nuno-faria commented on code in PR #22613: URL: https://github.com/apache/datafusion/pull/22613#discussion_r3355148068
########## docs/source/library-user-guide/upgrading/55.0.0.md: ########## @@ -97,6 +97,22 @@ as a supertrait: + pub trait QueryPlanner: Any + Debug ``` +[22613]: https://github.com/apache/datafusion/pull/22613 + +### Unify LRU memory-limiting caches into one generic cache + +The caches `DefaultFileMetadataCache`, `DefaultListFilesCache` and `DefaultFileStatisticsCache` +are merged into one generic implementation `DefaultCache`. Also the traits `CacheAccessor`, +`FileMetadataCache`, `ListFilesCache` and `FileStatisticsCache` are merged into the `Cache` trait. Review Comment: Since the most common thing to extend is probably `FileMetadataCache`/`ListFilesCache`/`FileStatisticsCache`, it might be better to show here what the new types look like: ```diff - pub trait FileStatisticsCache: CacheAccessor<TableScopedPath, CachedFileMetadata> - pub trait ListFilesCache: CacheAccessor<TableScopedPath, CachedFileList> - pub trait FileMetadataCache: CacheAccessor<Path, CachedFileMetadataEntry> + pub type FileStatisticsCache = dyn Cache<TableScopedPath, CachedFileMetadata>; + pub type ListFilesCache = dyn Cache<TableScopedPath, CachedFileList>; + pub type FileMetadataCache = dyn Cache<Path, CachedFileMetadataEntry>; ``` ########## datafusion/execution/src/cache/mod.rs: ########## @@ -46,7 +50,7 @@ pub use list_files_cache::TableScopedPath; /// 1. Call `get(key)` to check for cached value /// 2. If `Some(cached)`, validate with `cached.is_valid_for(¤t_meta)` /// 3. If invalid or missing, compute new value and call `put(key, new_value)` -pub trait CacheAccessor<K, V>: Send + Sync { +pub trait Cache<K: CacheKey, V: CacheValue>: Send + Sync { Review Comment: This doc needs to be updated. ########## datafusion/execution/src/cache/cache_manager.rs: ########## @@ -15,31 +15,31 @@ // specific language governing permissions and limitations // under the License. -use crate::cache::CacheAccessor; -use crate::cache::DefaultListFilesCache; -use crate::cache::file_statistics_cache::{ - DEFAULT_FILE_STATISTICS_MEMORY_LIMIT, DefaultFileStatisticsCache, - DefaultFilesMetadataCache, -}; -use crate::cache::list_files_cache::ListFilesEntry; -use crate::cache::list_files_cache::TableScopedPath; -use datafusion_common::TableReference; +use crate::cache::default_cache::DefaultCache; +pub use crate::cache::{Cache, CacheValue, TableScopedPath}; +use datafusion_common::HashMap; use datafusion_common::heap_size::{DFHeapSize, DFHeapSizeCtx}; -use datafusion_common::stats::Precision; use datafusion_common::{Result, Statistics}; use datafusion_physical_expr_common::sort_expr::LexOrdering; use object_store::ObjectMeta; use object_store::path::Path; use std::any::Any; -use std::collections::HashMap; use std::fmt::{Debug, Formatter}; use std::ops::Deref; use std::sync::Arc; use std::time::Duration; -pub use super::list_files_cache::{ - DEFAULT_LIST_FILES_CACHE_MEMORY_LIMIT, DEFAULT_LIST_FILES_CACHE_TTL, -}; +pub const DEFAULT_LIST_FILES_CACHE_MEMORY_LIMIT: usize = 1024 * 1024; // 1MiB + +pub const DEFAULT_LIST_FILES_CACHE_TTL: Option<Duration> = None; // Infinite + +pub const DEFAULT_FILE_STATISTICS_MEMORY_LIMIT: usize = 20 * 1024 * 1024; // 20MiB + +pub const DEFAULT_METADATA_CACHE_LIMIT: usize = 50 * 1024 * 1024; // 50M + +pub type FileStatisticsCache = dyn Cache<TableScopedPath, CachedFileMetadata>; +pub type ListFilesCache = dyn Cache<TableScopedPath, CachedFileList>; +pub type FileMetadataCache = dyn Cache<Path, CachedFileMetadataEntry>; Review Comment: I missed this in the first review, but we should add a doc for each type to explain what they are caching. I think we can reuse part of the docs used in the original traits. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
