Yuvraj-cyborg commented on code in PR #19298:
URL: https://github.com/apache/datafusion/pull/19298#discussion_r2615502085
##########
datafusion/execution/src/cache/cache_manager.rs:
##########
@@ -64,9 +64,21 @@ pub struct FileStatisticsCacheEntry {
/// command on the local filesystem. This operation can be expensive,
/// especially when done over remote object stores.
///
+/// The cache key is always the table's base path, ensuring a stable cache key.
+/// The `Extra` type is `Option<Path>`, representing an optional prefix filter
+/// (relative to the table base path) for partition-aware lookups.
+///
+/// When `get_with_extra(key, Some(prefix))` is called:
+/// - The cache entry for `key` (table base path) is fetched
+/// - Results are filtered to only include files matching `key/prefix`
+/// - Filtered results are returned without making a storage call
+///
+/// This enables efficient partition pruning: a single cached listing of the
+/// full table can serve queries for any partition subset.
+///
/// See [`crate::runtime_env::RuntimeEnv`] for more details.
pub trait ListFilesCache:
- CacheAccessor<Path, Arc<Vec<ObjectMeta>>, Extra = ObjectMeta>
+ CacheAccessor<Path, Arc<Vec<ObjectMeta>>, Extra = Option<Path>>
Review Comment:
I've addressed the previous suggestions, but I'm still unsure about this one,
for two reasons:
1. Empty-string behaviour.
2. In partition pruning, the absence of a filter (querying the full table) is
fundamentally different from filtering by a specific partition.
If neither of these is a concern, I can go ahead and remove the `Option<>`.
cc: @BlakeOrth
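
To make the two concerns concrete, here is a minimal, self-contained sketch of the lookup semantics described in the doc comment. It is not the code in this PR: `FileMeta`, `ListingCache`, and `sample_cache` are hypothetical stand-ins (plain `String` paths instead of `object_store::path::Path` and `ObjectMeta`), and the prefix matching is only an assumption about how the filter could behave.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for `ObjectMeta`; only the file location is modeled.
#[derive(Debug, Clone, PartialEq)]
pub struct FileMeta {
    pub location: String,
}

/// Hypothetical cache keyed by the table's base path.
pub struct ListingCache {
    entries: HashMap<String, Arc<Vec<FileMeta>>>,
}

impl ListingCache {
    /// `None` means "no filter": the full cached listing is returned.
    /// `Some(prefix)` keeps only files located under `key/prefix/`.
    /// No storage call is made in either case.
    pub fn get_with_extra(
        &self,
        key: &str,
        prefix: Option<&str>,
    ) -> Option<Arc<Vec<FileMeta>>> {
        let all = self.entries.get(key)?;
        match prefix {
            None => Some(Arc::clone(all)),
            Some(p) => {
                // The trailing '/' stops `year=2024` from matching `year=20245`.
                let full = format!(
                    "{}/{}/",
                    key.trim_end_matches('/'),
                    p.trim_matches('/')
                );
                let filtered: Vec<FileMeta> = all
                    .iter()
                    .filter(|m| m.location.starts_with(&full))
                    .cloned()
                    .collect();
                Some(Arc::new(filtered))
            }
        }
    }
}

/// Builds a small cache with one table and three files (illustrative data only).
pub fn sample_cache() -> ListingCache {
    let files = vec![
        FileMeta { location: "data/table/year=2023/a.parquet".to_string() },
        FileMeta { location: "data/table/year=2024/b.parquet".to_string() },
        FileMeta { location: "data/table/year=2024/c.parquet".to_string() },
    ];
    let mut entries = HashMap::new();
    entries.insert("data/table".to_string(), Arc::new(files));
    ListingCache { entries }
}
```

In this sketch, `get_with_extra("data/table", None)` returns all three files and `Some("year=2024")` returns two. `Some("")` builds the prefix `data/table//` and matches nothing, so an empty prefix is not equivalent to "no filter" here. Without the `Option<>`, the implementation would have to pick one meaning for the empty case explicitly.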
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]