devinjdangelo commented on issue #1718: URL: https://github.com/apache/arrow-rs/issues/1718#issuecomment-1709219407
Another potential challenge with the concatenation route is that DataFusion writers preserve input ordering: if you run "COPY (select * from my_table order by my_col) to my_file.parquet", then my_file.parquet should be sorted according to the input query. If we write multiple Parquet files and concatenate them, the data would need to be deserialized and sorted again, which would defeat any possibility of a speedup.
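
To make the ordering concern concrete, here is a minimal sketch. It uses plain Python lists as stand-ins for Parquet files and assumes rows are handed to parallel writers round-robin; that distribution scheme and the helper names are hypothetical and are not how DataFusion or arrow-rs actually work. It only shows that concatenating independently written parts can lose the global order, and that restoring it means reading every row back and merging.

```python
import heapq


def split_round_robin(rows, n_writers):
    # Hypothetical distribution: row i goes to writer i % n_writers.
    return [rows[i::n_writers] for i in range(n_writers)]


def concat(parts):
    # Stand-in for concatenating the per-writer files in writer order.
    return [row for part in parts for row in part]


def is_sorted(rows):
    return all(a <= b for a, b in zip(rows, rows[1:]))


# Output of "select ... order by my_col", already sorted.
sorted_input = list(range(10))

parts = split_round_robin(sorted_input, n_writers=3)
concatenated = concat(parts)

# Each part is sorted on its own, but the concatenation is not.
print(all(is_sorted(p) for p in parts), is_sorted(concatenated))

# Recovering the order requires a k-way merge that touches every row,
# which corresponds to the deserialize-and-sort cost described above.
merged = list(heapq.merge(*parts))
print(merged == sorted_input)
```

Under these assumptions the concatenated result is out of order even though every individual part is sorted, so the merge step cannot be skipped.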
