[GitHub] [arrow-datafusion] hengfeiyang opened a new pull request, #7595: feat: Parallel collecting parquet files statistics #7573

via GitHub Mon, 18 Sep 2023 22:00:11 -0700


hengfeiyang opened a new pull request, #7595:
URL: https://github.com/apache/arrow-datafusion/pull/7595


   ## Which issue does this PR close?
   
   Implement #7573 
   
   ## Rationale for this change
   
   ## What changes are included in this PR?
   
   - [ ] Add option `execution.meta_fetch_concurrency`, defaulting to the number of CPUs
   - [ ] Replace `SCHEMA_INFERENCE_CONCURRENCY` with the `meta_fetch_concurrency` option
   - [ ] Collect Parquet file statistics in parallel
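
The idea behind the checklist above is to fetch per-file statistics with a bounded number of requests in flight, rather than one file at a time. The following is a minimal, standard-library-only sketch of that pattern and not the code in this PR: `FileStats`, `collect_stats_parallel`, and the `fetch` callback are hypothetical names, and scoped threads stand in for whatever async mechanism the real change uses. The `concurrency` argument plays the role of `meta_fetch_concurrency`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical per-file statistics; this is not DataFusion's `Statistics` type.
#[derive(Debug, Clone, PartialEq)]
pub struct FileStats {
    pub path: String,
    pub num_rows: usize,
}

/// Calls `fetch` once per file with at most `concurrency` calls running at the
/// same time, and returns the results in the same order as `files`.
pub fn collect_stats_parallel<F>(files: &[String], concurrency: usize, fetch: F) -> Vec<FileStats>
where
    F: Fn(&str) -> FileStats + Sync,
{
    // Never start zero workers, and never more workers than there are files.
    let workers = concurrency.max(1).min(files.len().max(1));
    // Shared cursor: each worker claims the next unprocessed file index.
    let next = AtomicUsize::new(0);
    // One slot per file so results keep the input order.
    let results: Mutex<Vec<Option<FileStats>>> = Mutex::new(vec![None; files.len()]);

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= files.len() {
                    break;
                }
                // The slow part (e.g. a remote metadata read) runs outside the lock.
                let stats = fetch(files[i].as_str());
                results.lock().unwrap()[i] = Some(stats);
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|slot| slot.expect("every index is claimed by exactly one worker"))
        .collect()
}
```

A caller would pass the file list, the configured concurrency, and a closure that reads one file's footer metadata. When each fetch is dominated by network latency, the total time should approach the single-file latency times `files.len() / concurrency`, which is why the benefit depends on how slow each request is.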
   
   ## Are these changes tested?
   
   Tested locally with a query over 60 Parquet files stored on S3. With
   statistics collected in parallel, the query ran about 30% faster. Note that
   requests from my local network to S3 have high latency, which likely
   contributes to the size of the gain.
   
   ## Are there any user-facing changes?
   


