devinjdangelo opened a new issue, #9465:
URL: https://github.com/apache/arrow-datafusion/issues/9465

   ### Is your feature request related to a problem or challenge?
   
   Datafusions SQL dialect declares partitioned tables like this:
   
   ```sql
   CREATE EXTERNAL TABLE(partition varchar, trace_id varchar) 
   STORED AS parquet
   PARTITIONED BY (partition)
   LOCATION '/tmp/test/';
   ```
   
   Note that the `partition` column is declared with the other columns and 
again later in the `PARTITIONED BY` clause. Internally, Datafusion reorders 
table schemas so that partition columns come at the end, which is a common 
convention. This leads to confusing examples like #7892 and the following
   
   ```sql
   DataFusion CLI v36.0.0
   ❯ create external table test(partition varchar, trace_id varchar) stored as 
parquet partitioned by (partition) location '/tmp/test/';
   0 rows in set. Query took 0.001 seconds.
   
   ❯ insert into test values ('a','x'),('b','y'),('c','z');
   +-------+
   | count |
   +-------+
   | 3     |
   +-------+
   1 row in set. Query took 0.016 seconds.
   
   ❯ select * from test;
   +----------+-----------+
   | trace_id | partition |
   +----------+-----------+
   | a        | x         |
   | c        | z         |
   | b        | y         |
   +----------+-----------+
   3 rows in set. Query took 0.002 seconds.
   ```
   
   Since you declared the order as (partition varchar, trace_id varchar) you 
would expect this order to be respected when inserting data, but instead it is 
silently reordered so that the partition column comes at the end. 
   
   ### Describe the solution you'd like
   
   Rework `CREATE EXTERNAL TABLE` syntax to only allow partition by columns to 
be declared in the partitioned by clause. The above example then becomes:
   
   ```sql
   CREATE EXTERNAL TABLE(trace_id varchar) 
   STORED AS parquet
   PARTITIONED BY (partition varchar)
   LOCATION '/tmp/test/';
   ```
   
   This leaves much less room for confusion about the ordering of the columns 
when inserting values.
   
   ### Describe alternatives you've considered
   
   We could instead drop the convention of moving the partitioned by columns to 
the end of the schema and respect the ordering of columns that the user 
declares. 
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to