abhita commented on issue #18405:
URL: https://github.com/apache/datafusion/issues/18405#issuecomment-3519855274

   Thanks for the feedback, @alamb @nuno-faria 
   Regarding your suggestion to provide a custom cache manager - I’ve actually 
looked into this, and the issue is that with the current design, I would need 
to reimplement the entire cache accessor logic and all cache operations. This 
leads to significant code duplication.
   Following that approach, I implemented a custom `FileMetadataCache` with 
decoupled storage and eviction:
   ```rust
   pub struct MyCustomFileMetadataCache {
       inner_cache: DashMap<ObjectMeta, Arc<dyn FileMetadata>>,
       eviction_strategy: Arc<Mutex<Box<dyn MyEvictionStrategy>>>,
       memory_limit: usize,
   }
   pub trait MyEvictionStrategy: Send + Sync {
       fn on_access(&mut self, key: &ObjectMeta, size: usize);
       fn select_for_eviction(&self, target_size: usize) -> Vec<ObjectMeta>;
   }
   impl FileMetadataCache for MyCustomFileMetadataCache {
       fn get_with_extra(&self, k: &ObjectMeta, e: &ObjectMeta) -> 
Option<Arc<dyn FileMetadata>> {
           if let Some(entry) = self.inner_cache.get(k) {
               self.eviction_strategy.lock().unwrap().on_access(k, 
entry.memory_size());
               return Some(entry.clone());
           }
           None
       }
       // ... 9 more trait methods
   }
   // Configure it
   let config = CacheManagerConfig::default()
       
.with_file_metadata_cache(Some(Arc::new(MyCustomFileMetadataCache::new())));
   ```
   ## Why This Isn't a Clean Solution:
   __1. I'm Building Framework Infrastructure:__ I'm implementing the entire 
decoupling pattern (storage abstraction, eviction trait, memory management) 
that should exist in framework itself.
   __2. Maintenance Burden:__ Bug fixes and improvements to DataFusion's cache 
infrastructure don't benefit the custom implementation.
   __3. Not Scalable:__ Every user needing different eviction strategies must 
implement this same decoupling pattern independently.
   ## Again Coming back to the Proposed Solution:
   Moving the decoupling __into DataFusion__:
   ```rust
   // In DataFusion - written once, maintained once
   pub struct CustomFileMetadataCache {
       inner: DashMap<ObjectMeta, Arc<dyn FileMetadata>>,
       strategy: Arc<Mutex<Box<dyn EvictionStrategy>>>,
   }
   ```
    Users just configure the given eviction strategy. With this, the 
maintenance of Cache storage and retrieval lies within Datafusion thereby 
eliminating room for errors in case of : Cache Access, Memory Tracking, Metric 
Evaluation,etc
   ```rust
   let cache = CustomFileMetadataCache::new().with_strategy(LfuStrategy::new());
   ```
   __How this solves the issue:__
   - For different strategies I can configure the eviction policy without 
reimplementing cache logic
   - The cache storage code is written once and tested once
   - Bug fixes apply to all strategies automatically
   - Users wanting basic LRU continue using `DefaultFilesMetadataCache` 
unchanged
   - Allowing users to swap strategies without re-initializing the entire cache


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to