adriangb opened a new pull request, #19597: URL: https://github.com/apache/datafusion/pull/19597
## Which issue does this PR close?

Part of #19433

## Rationale for this change

In preparation for ordering inference from Parquet metadata, we need to refactor the cache system to support storing both statistics AND ordering information together. The current cache API uses `Extra` type parameters and `get_with_extra`/`put_with_extra` methods which are awkward for this use case.

This PR simplifies the cache API to:

- Use `Path` as the cache key (instead of `ObjectMeta`)
- Embed the `ObjectMeta` in the cached value for validation
- Support storing ordering information alongside statistics

## What changes are included in this PR?

### Cache API changes (`cache_manager.rs`):

- Add `CachedFileMetadata` struct with `meta`, `statistics`, and `ordering` fields
- Refactor `FileStatisticsCache` trait to use `get(&Path)` / `put(&Path, CachedFileMetadata)`
- Add `has_ordering` field to `FileStatisticsCacheEntry`
- Add `CachedFileList` struct for list files cache
- Refactor `FileMetadataCache` trait to use `CachedFileMetadataEntry` and `Path` keys

### Cache implementation changes:

- Update `DefaultFileStatisticsCache` to use new trait methods
- Update `DefaultFilesMetadataCache` to use new trait methods
- Simplify list files cache implementation

### Callsite updates:

- Update `ListingTable::do_collect_statistics` to use new cache API pattern
- Update `DFParquetMetadata::fetch_metadata` to use new cache API
- Update `ListingTableUrl` to use new cache API

## Are these changes tested?

Yes, existing tests pass. The cache behavior is functionally equivalent, just with a cleaner API.

## Are there any user-facing changes?

Breaking change to `FileStatisticsCache` and `FileMetadataCache` traits.
Users with custom cache implementations will need to update them:

- `get(key)` now takes `&Path` and returns `Option<CachedFileMetadata>` / `Option<CachedFileMetadataEntry>`
- Callers must validate entries with `cached.is_valid_for(&object_meta)`
- `put(key, value)` now takes `&Path` and the cached struct directly

🤖 Generated with [Claude Code](https://claude.com/claude-code)
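
For anyone migrating a custom cache, the get / validate / put pattern described above could look roughly like the sketch below. It is an illustration only and uses the standard library alone: `Path`, `ObjectMeta`, and `CachedFileMetadata` here are simplified stand-ins rather than the real `object_store` / DataFusion types, the `String` statistics payload is a placeholder, and `MapCache`, `statistics_for`, the return type of `put`, and the exact rule inside `is_valid_for` are all assumptions that this PR description does not specify.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified stand-in for `object_store::path::Path` (assumption).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Path(pub String);

/// Simplified stand-in for `ObjectMeta`; the real struct has more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct ObjectMeta {
    pub location: Path,
    pub size: u64,
    pub last_modified: u64, // placeholder for a timestamp type
}

/// Cached value that embeds the `ObjectMeta` it was computed from.
#[derive(Clone, Debug, PartialEq)]
pub struct CachedFileMetadata {
    pub meta: ObjectMeta,
    pub statistics: String,            // placeholder for real statistics
    pub ordering: Option<Vec<String>>, // placeholder for ordering info
}

impl CachedFileMetadata {
    /// Assumed validation rule: the file is unchanged if size and
    /// modification time both match the current `ObjectMeta`.
    pub fn is_valid_for(&self, current: &ObjectMeta) -> bool {
        self.meta.size == current.size
            && self.meta.last_modified == current.last_modified
    }
}

/// Sketch of the trait shape: keyed by `&Path`, no `Extra` parameter.
/// Returning the previous entry from `put` is an assumption.
pub trait FileStatisticsCache {
    fn get(&self, key: &Path) -> Option<CachedFileMetadata>;
    fn put(&self, key: &Path, value: CachedFileMetadata) -> Option<CachedFileMetadata>;
}

/// Hypothetical custom implementation backed by a mutex-guarded map.
#[derive(Default)]
pub struct MapCache {
    entries: Mutex<HashMap<Path, CachedFileMetadata>>,
}

impl FileStatisticsCache for MapCache {
    fn get(&self, key: &Path) -> Option<CachedFileMetadata> {
        self.entries.lock().unwrap().get(key).cloned()
    }

    fn put(&self, key: &Path, value: CachedFileMetadata) -> Option<CachedFileMetadata> {
        self.entries.lock().unwrap().insert(key.clone(), value)
    }
}

/// Caller-side pattern: look up by path, validate against the current
/// `ObjectMeta`, and recompute plus overwrite when the entry is stale.
pub fn statistics_for(
    cache: &dyn FileStatisticsCache,
    object_meta: &ObjectMeta,
    compute: impl FnOnce() -> String,
) -> String {
    if let Some(cached) = cache.get(&object_meta.location) {
        if cached.is_valid_for(object_meta) {
            return cached.statistics;
        }
    }
    let statistics = compute();
    cache.put(
        &object_meta.location,
        CachedFileMetadata {
            meta: object_meta.clone(),
            statistics: statistics.clone(),
            ordering: None,
        },
    );
    statistics
}
```

Because the key is only the path, a stale entry is not evicted on lookup in this sketch; the caller detects it through `is_valid_for` and overwrites it with the next `put`.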
