[
https://issues.apache.org/jira/browse/SPARK-31136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058333#comment-17058333
]
Jungtaek Lim commented on SPARK-31136:
--------------------------------------
This reminds me of my previous PR:
[https://github.com/apache/spark/pull/27107]
Please go through the comments in the PR again. I'm quoting the key point here:
{quote}The parts differentiating between two syntaxes are skewSpec, rowFormat,
and createFileFormat (using any of them would make create statement go into 2nd
syntax), and all of them are optional. We're not enforcing to specify it but
rely on the parser.
{quote}
I think the parser implementation around CREATE TABLE introduces an ambiguity
that is not documented anywhere. It wasn't ambiguous before because we required
a USING clause whenever the table was not a Hive table. Now the table uses
either the default provider or Hive, depending on which optional clauses are
present, which is non-trivial to reason about.
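To make the decision concrete, here is a rough sketch of the rule as I read it,
written in Python rather than as the actual parser grammar. The function name,
the hive_by_default flag, and the clause-name strings are illustrative
assumptions and not Spark APIs; "parquet" only stands in for whatever the
default data source is configured to be.

```python
from typing import Iterable, Optional

# Clauses that, per the quoted PR comment, push a statement into the
# second (Hive-style) syntax. All of them are optional.
HIVE_ONLY_CLAUSES = frozenset({"skewSpec", "rowFormat", "createFileFormat"})


def resolve_provider(
    clauses: Iterable[str],
    using: Optional[str] = None,
    hive_by_default: bool = False,   # hypothetical flag modelling the old behavior
    default_provider: str = "parquet",
) -> str:
    """Return the provider a CREATE TABLE statement would end up with."""
    # An explicit USING clause always names the provider directly.
    if using is not None:
        return using
    # Old behavior: without USING, the table is a Hive table.
    # New behavior: it is a Hive table only if a Hive-only clause is present.
    if hive_by_default or HIVE_ONLY_CLAUSES & set(clauses):
        return "hive"
    # Otherwise the statement silently falls through to the default provider.
    return default_provider
```

With hive_by_default=True, a bare CREATE TABLE t(a STRING) resolves to Hive;
with the flag off, the same statement resolves to the default provider unless
one of the three optional clauses happens to be present.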
I see this as a consequence of "not breaking old behavior". The parser rule has
become quite complicated in order to support the legacy config. Never breaking
anything will eventually leave us stuck.
> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -----------------------------------------------------------------------------
>
> Key: SPARK-31136
> URL: https://issues.apache.org/jira/browse/SPARK-31136
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Priority: Blocker
>
> We need to consider the behavior change of SPARK-30098.
> This is a placeholder to keep the discussion and the final decision.
> `CREATE TABLE` syntax changes its behavior silently.
> The following is one example of breaking existing user data pipelines.
> *Apache Spark 2.4.5*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> Time taken: 3.061 seconds
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Time taken: 0.383 seconds
> spark-sql> SELECT * FROM t LIMIT 1;
> # Apache Spark
> Time taken: 2.05 seconds, Fetched 1 row(s)
> {code}
> *Apache Spark 3.0.0-preview2*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> Time taken: 3.969 seconds
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Error in query: LOAD DATA is not supported for datasource tables:
> `default`.`t`;
> {code}