Re: [I] writing to partitioned table uses the wrong column as partition key [arrow-datafusion]

via GitHub Mon, 04 Mar 2024 04:29:16 -0800


devinjdangelo commented on issue #7892:
URL: 
https://github.com/apache/arrow-datafusion/issues/7892#issuecomment-1976476717


   I revisited this on the theory that #9276 fixed it as a side effect. I was 
wrong and it is still an issue.
   
   ```sql
   ❯ create external table test(partition varchar, trace_id varchar) stored as parquet partitioned by (partition) location '/tmp/test/';
   0 rows in set. Query took 0.001 seconds.
   
   ❯ insert into test select * from 'input.parquet';
   #(runs for a very long time and uses wrong column for partitioning)
   
   ❯ insert into test select trace_id, partition from 'input.parquet';
   +----------+
   | count    |
   +----------+
   | 15557151 |
   +----------+
   1 row in set. Query took 1.501 seconds.
   ```
   
   As shown above, the order of the columns in the inserted data seems to 
determine whether the result is correct. I think we will need to look into the 
logic that aligns the table's schema with the schema of the data stream being 
written to the table.
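
   To make the suspected mismatch concrete, here is a minimal sketch in plain 
Python (not DataFusion code) that contrasts pairing input columns with table 
columns by position versus by name. Everything in it is an assumption for 
illustration: the function names are hypothetical, the sample values are made 
up, `input.parquet` is assumed to store `partition` before `trace_id`, and the 
table's column order is inferred only from the fact that the working insert 
above supplies `trace_id` first and `partition` last.

```python
# Hypothetical illustration only; this is not how DataFusion is implemented.

# Assumed column order expected by the table, inferred from the INSERT that
# works: trace_id first, partition last.
TABLE_COLUMNS = ["trace_id", "partition"]
PARTITION_COLUMN = "partition"


def partition_key_by_position(input_columns, row):
    """Pair input values with table columns by position, ignoring names."""
    aligned = dict(zip(TABLE_COLUMNS, row))
    return aligned[PARTITION_COLUMN]


def partition_key_by_name(input_columns, row):
    """Pair input values with table columns by matching column names."""
    by_name = dict(zip(input_columns, row))
    aligned = {col: by_name[col] for col in TABLE_COLUMNS}
    return aligned[PARTITION_COLUMN]


# One row as `select *` might produce it, assuming the file stores
# (partition, trace_id) in that order. The values are made up.
input_columns = ["partition", "trace_id"]
row = ("2024-03-04", "trace-abc")

print(partition_key_by_position(input_columns, row))  # trace-abc
print(partition_key_by_name(input_columns, row))      # 2024-03-04
```

   Under these assumptions, positional pairing picks the `trace_id` value as 
the partition key, which would create roughly one partition per distinct trace 
and would be consistent with the very long runtime seen above. Name-based 
pairing picks the intended column regardless of the order in the query.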


