devinjdangelo commented on issue #7892: URL: https://github.com/apache/arrow-datafusion/issues/7892#issuecomment-1976476717
I revisited this on the theory that #9276 fixed it as a side effect. I was wrong and it is still an issue.

```sql
❯ create external table test(partition varchar, trace_id varchar) stored as parquet partitioned by (partition) location '/tmp/test/';
0 rows in set. Query took 0.001 seconds.

❯ insert into test select * from 'input.parquet';
#(runs for a very long time and uses wrong column for partitioning)

❯ insert into test select trace_id, partition from 'input.parquet';
+----------+
| count    |
+----------+
| 15557151 |
+----------+
1 row in set. Query took 1.501 seconds.
```

As shown above, it seems that the order of the columns in the schema affects whether the result is correct. I think we will need to look into the logic which aligns the schema of the table vs. the stream of data which should be written to the table.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
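For illustration only, here is a minimal Python sketch of the kind of mismatch suspected in the comment: aligning the input stream to the table by position rather than by name. It is not DataFusion code. The helper names, the sample row, and both column orders are assumptions: the sink is assumed to expect `(trace_id, partition)`, which is the order that worked in the session, and the input stream is assumed to arrive as `(partition, trace_id)`.

```python
# Illustrative only: these helpers are hypothetical, not DataFusion APIs.

def align_by_position(sink_cols, row):
    """Pair the i-th input value with the i-th sink column, ignoring input names."""
    return dict(zip(sink_cols, row))


def align_by_name(sink_cols, input_cols, row):
    """Look each sink column up by name in the input row."""
    named = dict(zip(input_cols, row))
    return {col: named[col] for col in sink_cols}


# Assumed column orders (not confirmed by the issue).
sink_cols = ["trace_id", "partition"]
input_cols = ["partition", "trace_id"]
row = ("p0", "abc123")  # made-up sample values

by_position = align_by_position(sink_cols, row)
by_name = align_by_name(sink_cols, input_cols, row)

print(by_position)  # the partition column receives the trace id
print(by_name)      # each value lands in the column with the matching name
```

Under these assumptions, positional alignment would partition on the high-cardinality `trace_id` values, which would be consistent with the slow first insert. Whether this is the actual cause is not established here.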
