alamb commented on issue #15582: URL: https://github.com/apache/datafusion/issues/15582#issuecomment-3128707387
> | 100M | 21.2953s | 1.6018s | 13.2943x faster |

13x faster 👏

<img width="894" height="894" alt="Image" src="https://github.com/user-attachments/assets/d1ed4083-e0ee-46fa-a23d-d90d9fa05d52" />

I would love to help review / get this feature merged in.

> For the CacheManager to remain generic, I created a pub type FileMetadata = dyn Any + Send + Sync; to represent metadata, which essentially can end up storing anything. Unlike the other information stored (Statistics and ObjectMeta), there isn't a common type for embedded metadata. Is this the right approach, or should the CacheManager be aware of ParquetMetaData?

I think keeping it generic makes the most sense because, as you say, there isn't a single type that clearly maps to all file formats.

> Unlike the other CacheManager parameters, which I believe are exclusively user-provided, I think it would make sense for the metadata cache to be populated with a DefaultFilesMetadataCache, so it's easier to enable caching just with ParquetReadOptions or set .... Does this make sense?

It sounds like it does, but I don't fully understand the proposal, so I may be missing something.
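
To make the generic approach concrete, here is a minimal sketch of how a type-erased metadata cache could work. Only the `FileMetadata` alias comes from the quoted proposal. `SketchMetadataCache`, `FakeParquetMetadata`, and `cached_num_rows` are hypothetical names invented for illustration and are not DataFusion's actual `CacheManager` API.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Type-erased metadata, as written in the quoted proposal.
pub type FileMetadata = dyn Any + Send + Sync;

/// Hypothetical stand-in for a format-specific type such as ParquetMetaData.
#[derive(Debug, PartialEq)]
pub struct FakeParquetMetadata {
    pub num_rows: usize,
}

/// Hypothetical cache keyed by file path (an assumption for this sketch).
/// It knows nothing about any particular file format.
#[derive(Default)]
pub struct SketchMetadataCache {
    entries: Mutex<HashMap<String, Arc<FileMetadata>>>,
}

impl SketchMetadataCache {
    /// Store metadata of any `Any + Send + Sync` type under `path`.
    pub fn put(&self, path: &str, metadata: Arc<FileMetadata>) {
        self.entries
            .lock()
            .unwrap()
            .insert(path.to_string(), metadata);
    }

    /// Return a shared handle to the cached metadata, if present.
    pub fn get(&self, path: &str) -> Option<Arc<FileMetadata>> {
        self.entries.lock().unwrap().get(path).cloned()
    }
}

/// A format-specific reader downcasts the erased value back to its own type.
/// Returns `None` on a cache miss or when the entry has a different type.
pub fn cached_num_rows(cache: &SketchMetadataCache, path: &str) -> Option<usize> {
    let erased = cache.get(path)?;
    let meta = erased.downcast::<FakeParquetMetadata>().ok()?;
    Some(meta.num_rows)
}
```

The trade-off in this sketch is that the cache stays format-agnostic, while each consumer pays for a runtime downcast and has to handle the case where the entry holds a different type.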