martin-g commented on code in PR #19366: URL: https://github.com/apache/datafusion/pull/19366#discussion_r2627165217
########## docs/source/library-user-guide/upgrading.md: ########## @@ -45,15 +45,23 @@ directly on the `Field`. For example: In prior versions, `ListingTableProvider` would issue `LIST` commands to the underlying object store each time it needed to list files for a query. To improve performance, `ListingTableProvider` now caches the results of -`LIST` commands for the lifetime of the `ListingTableProvider` instance. +`LIST` commands for the lifetime of the `ListingTableProvider` instance or +until a cache entry expires. Note that by default the cache has no expiration time, so if files are added or removed from the underlying object store, the `ListingTableProvider` will not see those changes until the `ListingTableProvider` instance is dropped and recreated. -You will be able to configure the maximum cache size and cache expiration time via a configuration option: +You can configure the maximum cache size and cache entry expiration time via configuration options: -See <https://github.com/apache/datafusion/issues/19056> for more details. +`datafusion.runtime.list_files_cache_limit` +`datafusion.runtime.list_files_cache_ttl` Review Comment: ```suggestion * `datafusion.runtime.list_files_cache_ttl` ``` ########## docs/source/library-user-guide/upgrading.md: ########## @@ -45,15 +45,23 @@ directly on the `Field`. For example: In prior versions, `ListingTableProvider` would issue `LIST` commands to the underlying object store each time it needed to list files for a query. To improve performance, `ListingTableProvider` now caches the results of -`LIST` commands for the lifetime of the `ListingTableProvider` instance. +`LIST` commands for the lifetime of the `ListingTableProvider` instance or +until a cache entry expires. Note that by default the cache has no expiration time, so if files are added or removed from the underlying object store, the `ListingTableProvider` will not see those changes until the `ListingTableProvider` instance is dropped and recreated. -You will be able to configure the maximum cache size and cache expiration time via a configuration option: +You can configure the maximum cache size and cache entry expiration time via configuration options: -See <https://github.com/apache/datafusion/issues/19056> for more details. +`datafusion.runtime.list_files_cache_limit` Review Comment: ```suggestion * `datafusion.runtime.list_files_cache_limit` ``` ########## docs/source/library-user-guide/upgrading.md: ########## @@ -45,15 +45,23 @@ directly on the `Field`. For example: In prior versions, `ListingTableProvider` would issue `LIST` commands to the underlying object store each time it needed to list files for a query. To improve performance, `ListingTableProvider` now caches the results of -`LIST` commands for the lifetime of the `ListingTableProvider` instance. +`LIST` commands for the lifetime of the `ListingTableProvider` instance or +until a cache entry expires. Note that by default the cache has no expiration time, so if files are added or removed from the underlying object store, the `ListingTableProvider` will not see those changes until the `ListingTableProvider` instance is dropped and recreated. -You will be able to configure the maximum cache size and cache expiration time via a configuration option: +You can configure the maximum cache size and cache entry expiration time via configuration options: -See <https://github.com/apache/datafusion/issues/19056> for more details. +`datafusion.runtime.list_files_cache_limit` Review Comment: Those do not render well at the moment - https://github.com/BlakeOrth/datafusion/blob/73c667216ee2fcb85196694cc5a36633f9928c19/docs/source/library-user-guide/upgrading.md#listingtableprovider-now-caches-list-commands Also it would be good to mention the units of their values ########## datafusion/execution/src/cache/cache_manager.rs: ########## @@ -178,15 +178,21 @@ impl CacheManager { let file_statistic_cache = config.table_files_statistics_cache.as_ref().map(Arc::clone); - let list_files_cache = config - .list_files_cache - .as_ref() - .inspect(|c| { - // the cache memory limit or ttl might have changed, ensure they are updated - c.update_cache_limit(config.list_files_cache_limit); - c.update_cache_ttl(config.list_files_cache_ttl); - }) - .map(Arc::clone); + let list_files_cache = match &config.list_files_cache { + Some(lfc) if config.list_files_cache_limit > 0 => { + lfc.update_cache_limit(config.list_files_cache_limit); + lfc.update_cache_ttl(config.list_files_cache_ttl); + Some(Arc::clone(lfc)) + } + None if config.list_files_cache_limit > 0 => { + let lfc: Arc<dyn ListFilesCache> = Arc::new(DefaultListFilesCache::new( + config.list_files_cache_limit, + config.list_files_cache_ttl, + )); + Some(lfc) + } + _ => None, Review Comment: If the user has disabled caching with `list_files_cache_limit = "0K"` then `None` will be returned here, but in this case `get_list_files_cache_limit()` will return "1M" * https://github.com/BlakeOrth/datafusion/blob/73c667216ee2fcb85196694cc5a36633f9928c19/datafusion/execution/src/cache/cache_manager.rs#L229 * https://github.com/BlakeOrth/datafusion/blob/main/datafusion/execution/src/cache/list_files_cache.rs#L144 ########## docs/source/library-user-guide/upgrading.md: ########## @@ -45,15 +45,23 @@ directly on the `Field`. For example: In prior versions, `ListingTableProvider` would issue `LIST` commands to the underlying object store each time it needed to list files for a query. To improve performance, `ListingTableProvider` now caches the results of -`LIST` commands for the lifetime of the `ListingTableProvider` instance. +`LIST` commands for the lifetime of the `ListingTableProvider` instance or +until a cache entry expires. Note that by default the cache has no expiration time, so if files are added or removed from the underlying object store, the `ListingTableProvider` will not see those changes until the `ListingTableProvider` instance is dropped and recreated. -You will be able to configure the maximum cache size and cache expiration time via a configuration option: +You can configure the maximum cache size and cache entry expiration time via configuration options: -See <https://github.com/apache/datafusion/issues/19056> for more details. +`datafusion.runtime.list_files_cache_limit` +`datafusion.runtime.list_files_cache_ttl` + +Caching can be disable by setting the limit to 0: Review Comment: ```suggestion Caching can be disabled by setting the limit to 0: ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
