devinjdangelo opened a new issue, #9465:
URL: https://github.com/apache/arrow-datafusion/issues/9465
### Is your feature request related to a problem or challenge?
Datafusions SQL dialect declares partitioned tables like this:
```sql
CREATE EXTERNAL TABLE(partition varchar, trace_id varchar)
STORED AS parquet
PARTITIONED BY (partition)
LOCATION '/tmp/test/';
```
Note that the `partition` column is declared with the other columns and
again later in the `PARTITIONED BY` clause. Internally, Datafusion reorders
table schemas so that partition columns come at the end, which is a common
convention. This leads to confusing examples like #7892 and the following
```sql
DataFusion CLI v36.0.0
❯ create external table test(partition varchar, trace_id varchar) stored as
parquet partitioned by (partition) location '/tmp/test/';
0 rows in set. Query took 0.001 seconds.
❯ insert into test values ('a','x'),('b','y'),('c','z');
+-------+
| count |
+-------+
| 3 |
+-------+
1 row in set. Query took 0.016 seconds.
❯ select * from test;
+----------+-----------+
| trace_id | partition |
+----------+-----------+
| a | x |
| c | z |
| b | y |
+----------+-----------+
3 rows in set. Query took 0.002 seconds.
```
Since you declared the order as (partition varchar, trace_id varchar) you
would expect this order to be respected when inserting data, but instead it is
silently reordered so that the partition column comes at the end.
### Describe the solution you'd like
Rework `CREATE EXTERNAL TABLE` syntax to only allow partition by columns to
be declared in the partitioned by clause. The above example then becomes:
```sql
CREATE EXTERNAL TABLE(trace_id varchar)
STORED AS parquet
PARTITIONED BY (partition varchar)
LOCATION '/tmp/test/';
```
This leaves much less room for confusion about the ordering of the columns
when inserting values.
### Describe alternatives you've considered
We could instead drop the convention of moving the partitioned by columns to
the end of the schema and respect the ordering of columns that the user
declares.
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]