alamb commented on issue #15582:
URL: https://github.com/apache/datafusion/issues/15582#issuecomment-3128707387

   > |    100M   |  21.2953s  | 1.6018s | 13.2943x faster |
   
   13x faster 👏 
   
   <img width="894" height="894" alt="Image" src="https://github.com/user-attachments/assets/d1ed4083-e0ee-46fa-a23d-d90d9fa05d52" />
   
   I would love to help review / get this feature merged in
   
   > For the CacheManager to remain generic, I created a pub type FileMetadata 
= dyn Any + Send + Sync; to represent metadata, which essentially can end up 
storing anything. Unlike the other information stored (Statistics and 
ObjectMeta), there isn't a common type for embedded metadata. Is this the right 
approach, or should the CacheManager be aware of ParquetMetaData?
   
   I think keeping it generic makes the most sense because, as you say, there 
isn't a single type that clearly maps to all file formats
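
To make the generic approach concrete, here is a minimal sketch of a type-erased metadata cache with a typed lookup on top. Only the `FileMetadata` alias comes from the quoted proposal. `MetadataCache`, `FakeParquetMetadata`, `get_as`, and `example_roundtrip` are hypothetical names used for illustration and are not the DataFusion `CacheManager` API.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Type-erased metadata, as written in the quoted proposal.
pub type FileMetadata = dyn Any + Send + Sync;

/// Hypothetical stand-in for a format-specific type such as ParquetMetaData.
#[derive(Debug, PartialEq)]
pub struct FakeParquetMetadata {
    pub num_row_groups: usize,
}

/// Hypothetical cache keyed by file path (an assumption for this sketch).
#[derive(Default)]
pub struct MetadataCache {
    entries: RwLock<HashMap<String, Arc<FileMetadata>>>,
}

impl MetadataCache {
    /// Store metadata of any `Send + Sync + 'static` type under `path`.
    pub fn put(&self, path: &str, metadata: Arc<FileMetadata>) {
        self.entries
            .write()
            .unwrap()
            .insert(path.to_string(), metadata);
    }

    /// Fetch the type-erased entry, if present.
    pub fn get(&self, path: &str) -> Option<Arc<FileMetadata>> {
        self.entries.read().unwrap().get(path).cloned()
    }

    /// Typed lookup: `None` if the entry is missing or holds a different type.
    pub fn get_as<T: Any + Send + Sync>(&self, path: &str) -> Option<Arc<T>> {
        self.get(path)?.downcast::<T>().ok()
    }
}

/// Round trip: a format-specific reader stores its own type and reads it back.
pub fn example_roundtrip() -> Option<usize> {
    let cache = MetadataCache::default();
    cache.put(
        "data.parquet",
        Arc::new(FakeParquetMetadata { num_row_groups: 3 }),
    );
    cache
        .get_as::<FakeParquetMetadata>("data.parquet")
        .map(|m| m.num_row_groups)
}
```

In this sketch the cache never names a concrete metadata type. Only the format-specific reader downcasts, so a type mismatch shows up as a cache miss rather than a panic.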
   
   > Unlike the other CacheManager parameters, which I believe are exclusively 
user-provided, I think it would make sense for the metadata cache be populated 
with a DefaultFilesMetadataCache, so its easier to enable caching just with 
ParquetReadOptions or set .... Does this make sense?
   
   It sounds like it, but I don't fully understand the proposal, so I may be 
missing something
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

