Tejas Patil created SPARK-17741:
-----------------------------------
Summary: Grammar to parse top level and nested data fields separately
Key: SPARK-17741
URL: https://issues.apache.org/jira/browse/SPARK-17741
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Tejas Patil
Priority: Trivial
Based on discussion over the dev list:
{noformat}
Is there any reason why Spark SQL supports "<column name>" ":" "<data type>"
when specifying columns?
e.g. sql("CREATE TABLE t1 (column1:INT)") works fine.
Here is the relevant snippet in the grammar [0]:
```
colType
: identifier ':'? dataType (COMMENT STRING)?
;
```
I do not see MySQL [1], Hive [2], Presto [3], or PostgreSQL [4] supporting ":"
when specifying columns.
They all use a space as the delimiter.
[0] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4#L596
[1] http://dev.mysql.com/doc/refman/5.7/en/create-table.html
[2] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
[3] https://prestodb.io/docs/current/sql/create-table.html
[4] https://www.postgresql.org/docs/9.1/static/sql-createtable.html
{noformat}
Herman's response:
{noformat}
This is because we use the same rule to parse top level and nested data fields.
For example:
create table tbl_x(
id bigint,
nested struct<col1:string,col2:string>
)
This shows both syntaxes: a space between name and type at the top level, and ":" inside the struct.
We should split this rule into a top-level rule and a nested rule.
{noformat}
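To illustrate the proposed split, here is a minimal sketch of a toy parser with two separate rules. It is not Spark's parser and does not reflect the actual SqlBase.g4 change; the rule names (col_type, nested_col_type), the Parser class, and parse_column are all hypothetical, and only a bare identifier or struct<...> is recognized as a data type.

```python
import re

# Hypothetical toy grammar (an assumption for illustration, not SqlBase.g4):
#   col_type        : identifier data_type        (top level, no ':')
#   nested_col_type : identifier ':' data_type    (inside struct<...>)
#   data_type       : 'struct' '<' nested_col_type (',' nested_col_type)* '>'
#                   | identifier
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[:<>,])")


def tokenize(text):
    """Split a column definition into identifiers and punctuation."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    def identifier(self):
        token = self.take()
        if not (token[0].isalpha() or token[0] == "_"):
            raise ValueError(f"expected identifier, got {token!r}")
        return token

    def col_type(self):
        # Top-level rule: ':' is not accepted between name and type.
        name = self.identifier()
        return (name, self.data_type())

    def nested_col_type(self):
        # Nested rule: ':' is required between name and type.
        name = self.identifier()
        self.take(":")
        return (name, self.data_type())

    def data_type(self):
        name = self.identifier()
        if name.lower() != "struct":
            return name
        self.take("<")
        fields = [self.nested_col_type()]
        while self.peek() == ",":
            self.take(",")
            fields.append(self.nested_col_type())
        self.take(">")
        return ("struct", fields)


def parse_column(text):
    """Parse one top-level column definition, rejecting trailing tokens."""
    parser = Parser(text)
    column = parser.col_type()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return column
```

With the rules separated this way, parse_column("id bigint") and parse_column("nested struct<col1:string,col2:string>") both succeed, while parse_column("column1:INT") raises ValueError because the top-level rule no longer accepts ":".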