geoffreyclaude opened a new pull request, #14918: URL: https://github.com/apache/datafusion/pull/14918
## Which issue does this PR close?

- Closes #14916.

## Rationale for this change

When scanning an exact list of remote Parquet files, the ListingTable was fetching file metadata (via `head` calls) sequentially. This was due to using `stream::iter(file_list).flatten()`, which processes each one-item stream in order. For remote blob stores, where each `head` call can take tens to hundreds of milliseconds, this sequential behavior significantly increased the time to create the physical plan.

## What changes are included in this PR?

This commit replaces the sequential flattening with concurrent merging using `futures::stream::select_all(file_list)`. With this change, the `head` requests are executed in parallel (up to the configured `meta_fetch_concurrency` limit), reducing latency when creating the physical plan.

## Are these changes tested?

Tests have been updated to ensure that metadata fetching occurs concurrently.

## Are there any user-facing changes?

No user-facing changes besides reducing the latency in this particular situation.