Dandandan commented on issue #2928: URL: https://github.com/apache/arrow-datafusion/issues/2928#issuecomment-1186244765
When reading CSV, DataFusion reads the file sequentially. So setting the config on target partitions has limited effect on simple queries, since reading the CSV takes most of the time. Also, by default that setting uses the number of logical cores available in the system. For this query, I expect you will get faster results by splitting the CSV into 8 equal smaller CSVs (and, if it's more than testing, by converting to Parquet directly too).
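
As a rough illustration of the splitting step, here is a minimal Python sketch using only the standard library. The function name `split_csv`, the `_partN.csv` naming, and the round-robin strategy are assumptions made for this example, not anything DataFusion provides. It also assumes the first row is a header, and it does not preserve the original row order across the output files.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, n_parts=8):
    """Split `src` into `n_parts` CSV files with near-equal row counts.

    Hypothetical helper: the header row is repeated in every output file,
    and data rows are dealt out round-robin, so the input is streamed once
    and never held in memory as a whole.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # One output file per part, e.g. data_part0.csv ... data_part7.csv
    paths = [out_dir / f"{src.stem}_part{i}.csv" for i in range(n_parts)]
    files = [p.open("w", newline="") for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        with src.open(newline="") as f:
            reader = csv.reader(f)
            header = next(reader, None)  # assumed to be a header row
            if header is not None:
                for w in writers:
                    w.writerow(header)
            # Row i goes to part i % n_parts, so sizes differ by at most one.
            for i, row in enumerate(reader):
                writers[i % n_parts].writerow(row)
    finally:
        for f in files:
            f.close()
    return paths
```

Calling `split_csv("data.csv", "data_split", n_parts=8)` would write eight files into `data_split/`. Whether the resulting directory can be registered as a single table depends on the DataFusion version in use, so treat that part as something to verify.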
