[GitHub] [arrow-datafusion] Dandandan commented on pull request #706: Rename concurrency to default_partitions

GitBox Sun, 11 Jul 2021 11:46:35 -0700


Dandandan commented on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877844585



   > > One concern I have is that the current config also sets the maximum
   > > number of threads used while reading Parquet files.
   >
   > Is this still true though? I know we were creating threads at one point in
   > time, but we are using Tokio/async now, so we are not creating threads.
   > Increasing the partition count will increase the number of async tasks that
   > we run in the thread pool but won't increase the number of threads.
   
   We now run the tasks with `spawn_blocking`, which still creates extra
threads to execute them on. Tokio allows up to 512(!) of these blocking threads
by default. Based on `max_concurrency` we still split the files across multiple
parallel readers, so increasing this value will considerably increase the
number of extra threads (and the amount of allocated data) we use, as far as I
can see.
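
   To make the concern concrete, here is a rough sketch of the bound being
described. It assumes each parallel reader occupies one `spawn_blocking`
thread for as long as it runs, and that the pool is capped at Tokio's default
`max_blocking_threads` of 512. The helper name and the one-thread-per-reader
model are illustrative assumptions, not DataFusion or Tokio APIs.

```rust
/// Tokio's documented default cap on blocking-pool threads.
const TOKIO_DEFAULT_MAX_BLOCKING_THREADS: usize = 512;

/// Hypothetical helper (not part of DataFusion or Tokio): a rough upper bound
/// on the extra threads created when every parallel reader runs inside
/// `spawn_blocking`. Assumes one blocking thread per active reader.
fn blocking_threads_upper_bound(parallel_readers: usize, max_blocking_threads: usize) -> usize {
    // The blocking pool grows on demand, but never beyond its configured cap.
    parallel_readers.min(max_blocking_threads)
}
```

Under this model, 8 parallel readers need at most 8 extra threads, while 2000
readers would be limited to 512 by the default cap. The real count also
depends on how long each reader blocks and on idle-thread reuse.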


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
