Dandandan commented on pull request #706: URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877844585
> > One concern I have is that the current config also sets the maximum number of threads when reading Parquet files.
>
> Is this still true though? I know we were creating threads at one point in time but we are using Tokio/async now, so we are not creating threads. Increasing partition count will increase the number of async tasks that we run in the thread pool but won't increase the number of threads.

We now run these tasks with `spawn_blocking`, which still creates extra threads to execute them on. By default that blocking pool is allowed to grow to a maximum of 512(!) threads. We also still split the files into multiple parallel readers based on `max_concurrency`, so as far as I can see, increasing this value considerably increases the number of extra threads (and allocated data) we use.
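
To make the concern concrete, here is a rough back-of-the-envelope sketch in plain Rust. It is not DataFusion or Tokio code: `parallel_readers` and `peak_blocking_threads` are hypothetical helpers, and the model assumes one reader per chunk of files, at most `max_concurrency` chunks, and one `spawn_blocking` thread held per reader while it runs. The only fact taken from Tokio is its default blocking-thread limit of 512.

```rust
/// Tokio's default upper limit for threads created by `spawn_blocking`.
const TOKIO_DEFAULT_MAX_BLOCKING_THREADS: usize = 512;

/// Hypothetical helper: how many parallel readers a scan ends up with,
/// assuming the files are split into at most `max_concurrency` chunks
/// and each chunk gets exactly one reader.
fn parallel_readers(num_files: usize, max_concurrency: usize) -> usize {
    num_files.min(max_concurrency)
}

/// Hypothetical helper: upper bound on the blocking threads alive at once,
/// assuming every reader occupies one blocking thread until it finishes.
/// Readers beyond the limit would have to wait for a free thread.
fn peak_blocking_threads(readers: usize, max_blocking_threads: usize) -> usize {
    readers.min(max_blocking_threads)
}
```

Under these assumptions, 1000 files with `max_concurrency` set to 16 gives `parallel_readers(1000, 16) == 16`, so about 16 extra threads on top of the async worker threads. Raising `max_concurrency` to 1024 for the same scan would hit the limit: `peak_blocking_threads(parallel_readers(1000, 1024), TOKIO_DEFAULT_MAX_BLOCKING_THREADS) == 512`.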
