sitano opened a new issue, #2928:
URL: https://github.com/apache/arrow-datafusion/issues/2928

   **Describe the bug**
   
   Increasing the number of target partitions does not noticeably reduce query execution time.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Take a 10 GB CSV file.
   2. Run the CLI with 1 partition and execute the two statements below; this takes about 40 sec.
   3. `CREATE EXTERNAL TABLE test (...) STORED AS CSV WITH HEADER ROW LOCATION 'test.csv';`
   4. `SELECT SUM(total_amount) FROM test GROUP BY VendorID;`
   5. Run the same statements with 8 partitions (or 1000); on my CPU with 8 physical cores this takes 38 sec.
   
   **Expected behavior**
   
   Roughly linear scaling with the number of cores. With 8 partitions I would expect about 40 / 8 = 5 sec.
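
The expectation above can be put into numbers. The following is a minimal sketch, not part of DataFusion: the function names are made up for illustration, and the last function assumes the workload follows Amdahl's law, which is only a rough model for a CSV scan followed by an aggregation. It compares the ideal runtime with the observed one and estimates what fraction of the work actually ran in parallel.

```rust
/// Runtime expected under perfect linear scaling (hypothetical helper).
fn ideal_runtime_secs(single_partition_secs: f64, partitions: u32) -> f64 {
    single_partition_secs / f64::from(partitions)
}

/// Speedup actually observed: baseline runtime divided by parallel runtime.
fn observed_speedup(baseline_secs: f64, parallel_secs: f64) -> f64 {
    baseline_secs / parallel_secs
}

/// Amdahl's law, S = 1 / ((1 - p) + p / n), solved for the parallel
/// fraction p. Assumes partitions > 1 and that the model applies at all.
fn amdahl_parallel_fraction(speedup: f64, partitions: u32) -> f64 {
    let n = f64::from(partitions);
    (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)
}

fn main() {
    // Timings reported in this issue: 40 sec with 1 partition, 38 sec with 8.
    let (baseline, parallel, partitions) = (40.0, 38.0, 8);
    let speedup = observed_speedup(baseline, parallel);

    println!("ideal runtime: {:.1} s", ideal_runtime_secs(baseline, partitions));
    println!("observed speedup: {:.2}x", speedup);
    println!(
        "implied parallel fraction: {:.1}%",
        100.0 * amdahl_parallel_fraction(speedup, partitions)
    );
}
```

With the timings from this report, the sketch gives an ideal runtime of 5.0 s, an observed speedup of about 1.05x, and an implied parallel fraction of about 5.7%. Under the stated assumption, that would mean almost all of the query is running on a single thread.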
   
   **Additional context**
   ```
   let mut session_config = SessionConfig::new()
           .with_information_schema(true)
           .with_target_partitions(args.threads);
   ```
   
   My patch to the CLI may be wrong.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
