[PR] feat: Use file cache to list partitions if available [arrow-datafusion]

via GitHub Sun, 17 Mar 2024 11:16:20 -0700


henrifroese opened a new pull request, #9655:
URL: https://github.com/apache/arrow-datafusion/pull/9655


   When discovering partitions for pruning, if we specify no partition columns, 
we call `list_all_files`, which uses the `list_files_cache` if it exists and is 
filled.
   
   If we specify partition columns, before this change we recursively list 
files in the object store to discover partitions. That happens on every 
request, and listing files in an object store such as AWS S3 can be slow 
(especially with 100k+ files).
   
   With this change, if the `list_files_cache` exists and is filled, we take 
all files from the cache and use them to discover partitions.
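
   To illustrate the idea, here is a minimal sketch of deriving partitions 
from an already-cached file listing instead of re-listing the object store. 
It is not the code in this PR: `parse_partition_values` and 
`partitions_from_cached_files` are hypothetical helpers, paths are plain 
strings rather than object store types, and Hive-style `col=value` directory 
names are assumed.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: extract the values of `partition_cols` from a
/// Hive-style path such as `tbl/year=2024/month=01/file.parquet`.
/// Returns `None` if the path is not under `table_prefix` or does not
/// contain the expected `col=value` segments in order.
fn parse_partition_values(
    table_prefix: &str,
    file_path: &str,
    partition_cols: &[&str],
) -> Option<Vec<String>> {
    // Require the file to live under `<table_prefix>/`.
    let relative = file_path
        .strip_prefix(table_prefix.trim_end_matches('/'))?
        .strip_prefix('/')?;

    let mut segments = relative.split('/');
    let mut values = Vec::with_capacity(partition_cols.len());
    for col in partition_cols {
        // Each leading path segment must look like `<col>=<value>`.
        let (name, value) = segments.next()?.split_once('=')?;
        if name != *col {
            return None;
        }
        values.push(value.to_string());
    }
    Some(values)
}

/// Hypothetical helper: collect the distinct partitions found in a cached
/// listing. No object store request is made; only the cached paths are read.
fn partitions_from_cached_files(
    table_prefix: &str,
    cached_files: &[&str],
    partition_cols: &[&str],
) -> BTreeSet<Vec<String>> {
    cached_files
        .iter()
        .filter_map(|path| parse_partition_values(table_prefix, path, partition_cols))
        .collect()
}
```

   In this sketch, files that do not match the partition layout (for example 
a marker file directly under the table root) are skipped, and the cost is 
one pass over the cached paths rather than a recursive listing per request.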
   
   Closes #9654.


