Dandandan commented on issue #2928:
URL: 
https://github.com/apache/arrow-datafusion/issues/2928#issuecomment-1186244765

   When reading CSV, DataFusion reads the file sequentially. Setting the 
target partitions config therefore has limited effect on simple queries, 
because reading the CSV takes most of the time. Also, by default the setting 
already uses the number of logical cores available in the system.
   For this query, I expect you will get faster results by splitting the CSV 
into 8 equal smaller CSVs, so that the files can be read in parallel (and, if 
this is for more than testing, by converting to Parquet directly too).
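
   The split suggested above could be sketched roughly as follows. This is a 
minimal illustration, not part of DataFusion: the function name `split_csv`, 
the `part-N.csv` output names, and the round-robin distribution of rows are 
all assumptions made for the example. It also assumes the input has a single 
header row and that row order across the output files does not matter.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, parts=8):
    """Split one CSV into `parts` files of near-equal row count.

    Hypothetical helper for illustration: rows are dealt out round-robin,
    so the original row order is not preserved across the output files.
    Assumes the first row of `src` is a header, which is copied to each part.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part-{i}.csv" for i in range(parts)]
    files = [p.open("w", newline="") for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        with open(src, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            # Every part gets its own copy of the header row.
            for w in writers:
                w.writerow(header)
            # Single pass over the input; row i goes to file i mod parts.
            for i, row in enumerate(reader):
                writers[i % parts].writerow(row)
    finally:
        for f in files:
            f.close()
    return paths
```

   For example, `split_csv("data.csv", "data_parts", parts=8)` would write 
eight files under `data_parts/`. Whether this actually speeds up the query 
depends on the resulting files being scanned as separate partitions, which is 
worth verifying against the DataFusion version in use.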


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
