devinjdangelo commented on issue #7892:
URL: https://github.com/apache/arrow-datafusion/issues/7892#issuecomment-1978707054

   > I don't think SQL aligns on field name, but instead only on position
   > 
   > I think the two cases are
   > 
   >     INSERT INTO bar SELECT .... in which case the columns of the select 
list should be inserted into the table columns by position. If there are more 
columns in the table than in the select list, the remaining ones should be null 
padded
   >     INSERT INTO bar(cols) SELECT ... in which case the columns of the 
select list should be inserted into the list of columns by position. If there 
are more columns in the table than in the select list, the remaining ones 
should be null padded
   
   In that case, the existing insert logical planning and partitioned writes 
are behaving correctly. The reproducer in this issue shows correct behavior; it 
is confusing because `CREATE EXTERNAL TABLE` reorders the schema by moving the 
partition column to the end. I believe this sequence of commands shows the 
problem clearly. 
   
   ```
   DataFusion CLI v36.0.0
   ❯ create external table test(partition varchar, trace_id varchar) stored as 
parquet partitioned by (partition) location '/tmp/test/';
   0 rows in set. Query took 0.001 seconds.
   
   ❯ insert into test values ('a','x'),('b','y'),('c','z');
   +-------+
   | count |
   +-------+
   | 3     |
   +-------+
   1 row in set. Query took 0.016 seconds.
   
   ❯ select * from test;
   +----------+-----------+
   | trace_id | partition |
   +----------+-----------+
   | a        | x         |
   | c        | z         |
   | b        | y         |
   +----------+-----------+
   3 rows in set. Query took 0.002 seconds.
   
   ❯ select * from 'input.parquet' limit 5;
   +-----------+----------------------------------+
   | partition | trace_id                         |
   +-----------+----------------------------------+
   | b         | 0000000000000000353b8c80d0183941 |
   | b         | 0000000000000000353b8c80d0183941 |
   | b         | 0000000000000000353b8c80d0183941 |
   | b         | 0000000000000000353b8c80d0183941 |
   | b         | 0000000000000000353b8c80d0183941 |
   +-----------+----------------------------------+
   5 rows in set. Query took 0.013 seconds.
   ```
   
   Note that the table schema and the parquet schema are not aligned by 
position due to ListingTable moving partition columns to the end.
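
To make the mismatch concrete, here is a rough Python model of what the session above appears to do. It is only a sketch, not DataFusion code: `listing_table_schema` and `insert_by_position` are hypothetical helpers, and the sketch assumes partition columns are appended in `PARTITIONED BY` order and that inserts pair values with columns purely by position, null padding any missing trailing columns.

```python
def listing_table_schema(declared, partition_cols):
    """Hypothetical model: keep non-partition columns in declared order,
    then append the partition columns at the end."""
    non_partition = [c for c in declared if c not in partition_cols]
    return non_partition + list(partition_cols)


def insert_by_position(schema, row):
    """Hypothetical model: pair values with table columns by position and
    null-pad (None) any trailing columns the row does not supply."""
    if len(row) > len(schema):
        raise ValueError("more values than table columns")
    padded = list(row) + [None] * (len(schema) - len(row))
    return dict(zip(schema, padded))


# Columns as written in CREATE EXTERNAL TABLE test(partition, trace_id)
declared = ["partition", "trace_id"]
# Columns as the table exposes them after PARTITIONED BY (partition)
schema = listing_table_schema(declared, ["partition"])

# VALUES ('a','x'),('b','y'),('c','z') mapped by position onto the table schema
rows = [insert_by_position(schema, r) for r in [("a", "x"), ("b", "y"), ("c", "z")]]
for row in rows:
    print(row)
```

Under these assumptions the first value of each tuple lands in `trace_id` rather than `partition`, which matches the `select * from test` output above even though the `CREATE EXTERNAL TABLE` statement listed `partition` first.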
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.