Tushar7012 commented on issue #19971: URL: https://github.com/apache/datafusion/issues/19971#issuecomment-3796243862
Thanks for the assignment and the pointer to PR #19969! I've reviewed your `infer_schema` parallelization approach and understand the pattern now - using tokio spawning directly within the function rather than a multi-layered solution.

For `list_files_for_scan`, I'll follow the same approach:

1. Spawn parallel tasks within the function to list files concurrently
2. Use `JoinSet` or similar to collect results
3. Keep the existing API surface unchanged

I'll also note that benchmarks may not capture the improvement well since the gain is primarily on cold start / first query (before caching kicks in).

I'll start working on a draft PR following this pattern. Let me know if there's anything specific I should be aware of for the `list_files_for_scan` case!
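For illustration, here is a minimal sketch of the "spawn inside the function, collect, keep the signature" shape described in the three steps. It is not DataFusion code: `MockStore` and `list_files_parallel` are hypothetical names, and to stay dependency-free it uses `std::thread::scope` as a stand-in for tokio tasks and `JoinSet`. The real change would presumably spawn async listing calls against the object store instead.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for an object store: maps a table path prefix
/// to the files found under it.
struct MockStore {
    files: HashMap<String, Vec<String>>,
}

impl MockStore {
    /// Lists the files under one prefix, or returns an error for an unknown prefix.
    fn list(&self, prefix: &str) -> Result<Vec<String>, String> {
        self.files
            .get(prefix)
            .cloned()
            .ok_or_else(|| format!("no such prefix: {prefix}"))
    }
}

/// Lists every prefix concurrently and returns the files in the order of
/// `prefixes`, so callers see the same result as a sequential loop.
fn list_files_parallel(store: &MockStore, prefixes: &[&str]) -> Result<Vec<String>, String> {
    let results: Vec<Result<Vec<String>, String>> = thread::scope(|scope| {
        // Step 1: spawn one listing task per prefix.
        let handles: Vec<_> = prefixes
            .iter()
            .map(|&prefix| scope.spawn(move || store.list(prefix)))
            .collect();

        // Step 2: collect results; joining in spawn order keeps them deterministic.
        handles
            .into_iter()
            .map(|handle| {
                handle
                    .join()
                    .unwrap_or_else(|_| Err("listing task panicked".to_string()))
            })
            .collect()
    });

    // Step 3: flatten into one list and surface the first error, if any.
    let mut all = Vec::new();
    for result in results {
        all.extend(result?);
    }
    Ok(all)
}
```

Joining handles in spawn order is a deliberate choice in this sketch. tokio's `JoinSet::join_next` yields tasks in completion order, so an async version would need to tag each task with its index if the output order matters to callers.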
