GuangFancui(ISCAS) created SPARK-16217:
------------------------------------------
Summary: Support SELECT INTO statement
Key: SPARK-16217
URL: https://issues.apache.org/jira/browse/SPARK-16217
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.0.0
Reporter: GuangFancui(ISCAS)
The *SELECT INTO* statement selects data from one table and inserts it into a
new table as follows.
{code:sql}
SELECT column_name(s)
INTO newtable
FROM table1;
{code}
This statement is common in SQL but is not currently supported in Spark SQL.
We investigated Catalyst and found that this statement can be implemented
by extending the grammar and reusing the logical plan of *CREATE TABLE AS
SELECT*, as follows.
# Extend the grammar: add _INTO tableIdentifier_ to _SELECT ... FROM_ in the
_querySpecification_ rule in the SqlBase.g4 file.
!https://raw.githubusercontent.com/wuxianxingkong/storage/master/selectinto_g4.png!
For example, given the statement
{code:sql}
SELECT *
INTO NEW_TABLE
FROM OLD_TABLE
{code}
the resulting parse tree is:
!https://raw.githubusercontent.com/wuxianxingkong/storage/master/selectinto_tree.png!
It is open to discussion whether _INTO_ should also be added to the _TRANSFORM_
alternative of _querySpecification_.
# Add a new logical plan: _SelectIntoLogicalPlan_.
# Identify _SELECT INTO_ in the parser: modify the _withQuerySpecification_
function. If _INTO tableIdentifier_ is present, convert the query into a
_SelectIntoLogicalPlan_ using a custom helper function, _withSelectInto_.
# Conversion in Analyzer: Convert _SelectIntoLogicalPlan_ to
_CreateTableAsSelectCommand_ by adding a rule in the Analyzer.
*Hive support* must be enabled, since _CreateTableAsSelectCommand_ relies on it.
We have implemented and tested this approach and can submit pull requests if
there is interest.
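The parse-then-rewrite idea in the steps listed earlier can be illustrated with a small, self-contained Python model. This is only a sketch and not Spark code: a regular expression stands in for the SqlBase.g4 change, the classes SelectInto and CreateTableAsSelect are hypothetical stand-ins for _SelectIntoLogicalPlan_ and _CreateTableAsSelectCommand_, and SQLite is assumed purely as a convenient engine that can run the rewritten statement.

```python
import re
import sqlite3
from dataclasses import dataclass


@dataclass
class Query:
    """A plain SELECT statement (toy stand-in for an ordinary logical plan)."""
    sql: str


@dataclass
class SelectInto:
    """Hypothetical stand-in for the proposed SelectIntoLogicalPlan."""
    table: str
    query: Query


@dataclass
class CreateTableAsSelect:
    """Hypothetical stand-in for CreateTableAsSelectCommand."""
    table: str
    query: Query


# Toy replacement for the grammar change: SELECT <cols> INTO <table> FROM <rest>.
# A real implementation would extend the ANTLR grammar instead of using a regex.
_SELECT_INTO = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+INTO\s+(?P<table>\w+)\s+"
    r"(?P<rest>FROM\s+.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse(sql):
    """Parser step: return a SelectInto node when an INTO clause is present."""
    m = _SELECT_INTO.match(sql)
    if m is None:
        return Query(sql.strip().rstrip(";"))
    inner = Query("SELECT {} {}".format(m.group("cols"), m.group("rest")))
    return SelectInto(m.group("table"), inner)


def rewrite(plan):
    """Analyzer-rule step: convert SelectInto to CreateTableAsSelect."""
    if isinstance(plan, SelectInto):
        return CreateTableAsSelect(plan.table, plan.query)
    return plan


def to_sql(plan):
    """Render a plan back to SQL text that the toy engine can execute."""
    if isinstance(plan, CreateTableAsSelect):
        return "CREATE TABLE {} AS {}".format(plan.table, plan.query.sql)
    return plan.sql


def demo():
    """Run the example statement from this issue against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OLD_TABLE (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO OLD_TABLE VALUES (?, ?)", [(1, "a"), (2, "b")])
    plan = rewrite(parse("SELECT *\nINTO NEW_TABLE\nFROM OLD_TABLE"))
    conn.execute(to_sql(plan))
    return conn.execute("SELECT id, name FROM NEW_TABLE ORDER BY id").fetchall()
```

Calling demo() copies the two sample rows from OLD_TABLE into a newly created NEW_TABLE. Keeping the intermediate SelectInto node separate from the final command mirrors the proposal's split between the parser change and the Analyzer rule.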