[GitHub] [arrow-datafusion] Dandandan commented on issue #2928: bug: changing the number of partitions does not increase concurrency

GitBox Sat, 16 Jul 2022 10:21:26 -0700


Dandandan commented on issue #2928:
URL: 
https://github.com/apache/arrow-datafusion/issues/2928#issuecomment-1186244765


   When reading CSV, DataFusion reads each file sequentially. So the target 
partitions setting has limited effect on simple queries where reading the CSV 
takes most of the time. Also, that setting already defaults to the number of 
logical cores available on the system.
   For this query, I expect you will get faster results by splitting the CSV 
into 8 roughly equal smaller CSVs (and, if this is for more than testing, 
converting to Parquet directly too).
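
   A minimal sketch of the split, in case it helps. It is plain Python using 
only the standard library, not anything DataFusion provides. The function name 
`split_csv` and the `_partN.csv` file naming are made up for illustration, and 
it assumes the input has a single header row, which is repeated in every 
output file.

```python
import csv
import itertools
from pathlib import Path


def split_csv(src, out_dir, parts=8):
    """Split `src` into up to `parts` CSV files with near-equal row counts.

    Hypothetical helper: assumes the first row of `src` is a header.
    Returns the list of files written.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # First pass: count data rows. Parsing with csv (rather than counting
    # raw lines) keeps quoted fields that contain newlines intact.
    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        total = sum(1 for _ in reader)

    # The first `extra` files get one more row than the rest.
    base, extra = divmod(total, parts)

    outputs = []
    # Second pass: stream rows into the output files in order.
    with src.open(newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for part in range(parts):
            size = base + (1 if part < extra else 0)
            if size == 0:
                break  # fewer data rows than requested parts
            path = out_dir / f"{src.stem}_part{part}.csv"
            with path.open("w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(itertools.islice(reader, size))
            outputs.append(path)
    return outputs
```

   Calling `split_csv("data.csv", "data_parts", parts=8)` (both paths are 
placeholders) writes the pieces into one directory. Pointing the query at that 
directory instead of the single file should let each file be scanned by its 
own partition, though I have not measured this for your query.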


