alchemist51 opened a new issue, #18362:
URL: https://github.com/apache/datafusion/issues/18362

   Currently in datafusion, the CacheManagerConfig has the metadata cache as an 
option
   
   ```
       /// Cache of file-embedded metadata, used to avoid reading it multiple 
times when processing a
       /// data file (e.g., Parquet footer and page metadata).
       /// If not provided, the [`CacheManager`] will create a 
[`DefaultFilesMetadataCache`].
       pub file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,
   ```
   
   Though as per the doc written above even if we set the file_metadata_cache 
as `None`, we will still end up with an DefaultFilesMetadataCache being created 
on it's own.
   
   I was thinking we can make the API a bit confusing and would like it to 
atleast honor the cases when we set it to None just like other Config options.
   
   For the same I'm thinking we can initialise the metadata cache object in the 
CacheManagerConfig in case of default method but in any other case where the 
user is overriding it, we should honor what's been set instead of creating a 
cache here in cache_manager:
   
   ```        let file_metadata_cache = config
               .file_metadata_cache
               .as_ref()
               .map(Arc::clone)
               .unwrap_or_else(|| {
                   
Arc::new(DefaultFilesMetadataCache::new(config.metadata_cache_limit))
               });
   ```
   
   Tagging @nuno-faria @alamb  who worked on the earlier PRs.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to