devinjdangelo commented on issue #7892:
URL:
https://github.com/apache/arrow-datafusion/issues/7892#issuecomment-1978707054
> I don't think SQL aligns on field name, but instead only on position
>
> I think the two cases are
>
> 1. `INSERT INTO bar SELECT ....` in which case the columns of the select
>    list should be inserted into the table columns by position. If there are
>    more columns in the table than in the select list, the remaining ones
>    should be null padded
> 2. `INSERT INTO bar(cols) SELECT ...` in which case the columns of the
>    select list should be inserted into the list of columns by position. If
>    there are more columns in the table than in the select list, the
>    remaining ones should be null padded

In that case, the existing insert logical planning and partitioned writes
are behaving correctly. The behavior shown in this issue's reproducer is
correct; it is confusing only because `CREATE EXTERNAL TABLE` reorders the
schema by moving the partition column to the end. I believe the following
sequence of commands shows the problem clearly.
```
DataFusion CLI v36.0.0
❯ create external table test(partition varchar, trace_id varchar) stored as parquet partitioned by (partition) location '/tmp/test/';
0 rows in set. Query took 0.001 seconds.
❯ insert into test values ('a','x'),('b','y'),('c','z');
+-------+
| count |
+-------+
| 3     |
+-------+
1 row in set. Query took 0.016 seconds.
❯ select * from test;
+----------+-----------+
| trace_id | partition |
+----------+-----------+
| a        | x         |
| c        | z         |
| b        | y         |
+----------+-----------+
3 rows in set. Query took 0.002 seconds.
❯ select * from 'input.parquet' limit 5;
+-----------+----------------------------------+
| partition | trace_id                         |
+-----------+----------------------------------+
| b         | 0000000000000000353b8c80d0183941 |
| b         | 0000000000000000353b8c80d0183941 |
| b         | 0000000000000000353b8c80d0183941 |
| b         | 0000000000000000353b8c80d0183941 |
| b         | 0000000000000000353b8c80d0183941 |
+-----------+----------------------------------+
5 rows in set. Query took 0.013 seconds.
```
Note that the table schema and the parquet schema are not aligned by
position due to ListingTable moving partition columns to the end.
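
To make the positional mismatch concrete, here is a minimal sketch in plain Python. It is only a toy model of the two behaviors described above (partition columns moved to the end of the table schema, and insert values assigned by position with null padding), not DataFusion's actual code. The function names `listing_table_schema` and `insert_by_position` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Optional, Sequence


def listing_table_schema(declared: Sequence[str],
                         partition_cols: Sequence[str]) -> List[str]:
    """Toy model: move partition columns to the end of the declared schema."""
    data_cols = [c for c in declared if c not in partition_cols]
    return data_cols + list(partition_cols)


def insert_by_position(table_schema: Sequence[str],
                       row: Sequence[Optional[str]]) -> Dict[str, Optional[str]]:
    """Toy model: assign values to table columns by position, null-padding the rest."""
    if len(row) > len(table_schema):
        raise ValueError("more values than table columns")
    padded = list(row) + [None] * (len(table_schema) - len(row))
    return dict(zip(table_schema, padded))


# Declared as test(partition, trace_id) partitioned by (partition).
schema = listing_table_schema(["partition", "trace_id"], ["partition"])
print(schema)  # ['trace_id', 'partition']

# insert into test values ('a', 'x'): 'a' lands in trace_id, 'x' in partition.
print(insert_by_position(schema, ("a", "x")))  # {'trace_id': 'a', 'partition': 'x'}

# A row in the parquet file's order (partition, trace_id) is also mapped by position.
parquet_row = ("b", "0000000000000000353b8c80d0183941")
print(insert_by_position(schema, parquet_row))
```

Under this model, a row read in the parquet file's column order would put the partition value `b` into `trace_id`, which is the confusing outcome from the reproducer.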