henrifroese opened a new pull request, #9655: URL: https://github.com/apache/arrow-datafusion/pull/9655
When discovering partitions for pruning with no partition columns specified, we call `list_all_files`, which uses the `list_files_cache` if it exists and is filled. With partition columns specified, before this change we recursively listed files in the object store to discover partitions. That happened on every request, and listing files in e.g. AWS S3 can be slow (especially with 100k+ files).

With this change, if the `list_files_cache` exists and is filled, we get all files from the cache and use them to discover partitions.

Closes #9654.
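
To illustrate the idea, here is a minimal, self-contained sketch of deriving partitions from an already-cached file list instead of listing the object store again. It is not the code in this PR and does not use DataFusion's types: `partition_values` and `discover_partitions` are hypothetical helpers, paths are plain strings, and a Hive-style `col=value` directory layout is assumed.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not a DataFusion API): extract the values of
/// `partition_cols` from a Hive-style path such as
/// `tbl/year=2023/month=01/a.parquet`.
/// Returns `None` if the path is outside `table_root` or its leading
/// directories do not match the partition columns in order.
fn partition_values(
    table_root: &str,
    path: &str,
    partition_cols: &[&str],
) -> Option<Vec<String>> {
    let root = table_root.trim_end_matches('/');
    // The path must sit strictly below the table root.
    let rest = path.strip_prefix(root)?.strip_prefix('/')?;
    let mut segments = rest.split('/');

    let mut values = Vec::with_capacity(partition_cols.len());
    for col in partition_cols {
        let segment = segments.next()?;
        let (name, value) = segment.split_once('=')?;
        if name != *col {
            return None;
        }
        values.push(value.to_string());
    }

    // At least one more segment (the file name) must remain.
    segments.next()?;
    Some(values)
}

/// Hypothetical helper: build the set of distinct partitions purely from
/// the cached file paths, with no calls to the object store.
fn discover_partitions(
    table_root: &str,
    cached_files: &[&str],
    partition_cols: &[&str],
) -> BTreeSet<Vec<String>> {
    cached_files
        .iter()
        .filter_map(|path| partition_values(table_root, path, partition_cols))
        .collect()
}
```

In this sketch, each cached path is parsed once in memory, so the cost of discovery scales with the size of the cached list rather than with the number of object-store list requests.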
