Personally, I think EXTERNAL is a special feature supported by Hive.
If Spark SQL wants to support it, it should be considered only for Hive 
compatibility. We should only unify `CREATE EXTERNAL TABLE` in the parser and 
report an error for unsupported data sources.

At 2020-10-06 22:06:28, "Wenchen Fan" <cloud0...@gmail.com> wrote:

Hi all,



I'd like to start a discussion thread about this topic, as it blocks an 
important feature that we target for Spark 3.1: unify the CREATE TABLE SQL 
syntax.


A bit more background for CREATE EXTERNAL TABLE: it's kind of a hidden feature 
in Spark for Hive compatibility.


When you write native CREATE TABLE syntax such as `CREATE EXTERNAL TABLE ... 
USING parquet`, the parser fails and tells you that EXTERNAL can't be specified.


When you write Hive CREATE TABLE syntax, EXTERNAL can be specified only if a 
LOCATION clause or path option is present. For example, `CREATE EXTERNAL TABLE 
... STORED AS parquet` is not allowed when there is no LOCATION clause or path 
option. This is not 100% Hive compatible.


As we are unifying the CREATE TABLE SQL syntax, one problem is how to deal with 
CREATE EXTERNAL TABLE. We can keep it as a hidden feature as it was, or we can 
officially support it.


Please let us know your thoughts:
1. As an end-user, what do you expect CREATE EXTERNAL TABLE to do? Have you 
used it in production before? For what use cases?
2. As a catalog developer, how would you implement EXTERNAL TABLE? It seems to 
me that it only makes sense for file sources, where the table directory can be 
managed. I'm not sure how to interpret EXTERNAL in catalogs like JDBC, 
Cassandra, etc.


For more details, please refer to the long discussion in 
https://github.com/apache/spark/pull/28026


Thanks,
Wenchen
