alamb opened a new issue, #17091: URL: https://github.com/apache/datafusion/issues/17091
### Is your feature request related to a problem or challenge?

We are adding a parquet metadata cache to ListingTable 🎉 (thanks @nuno-faria, @jonathanc-n, and @shehabgamin).

It turns out it is somewhat tricky to get right, and it is not always clear what is going on. Especially tricky is the case where the metadata is sometimes cached with page indexes and sometimes without them; for example, see this PR:
- https://github.com/apache/datafusion/pull/17022

### Describe the solution you'd like

I would like some way to see the contents of the cache with basic statistics.

### Describe alternatives you've considered

I suggest a twofold approach:
1. Add APIs to the `DefaultFileMetadataCache` itself
2. Add a function in `datafusion-cli` that uses those APIs to show the cache state

This two-pronged approach would
1. Help debug the workings of the cache with datafusion-cli
2. Ensure the APIs on the cache can be used to build useful introspection tools
3. Offer an example of how to build such a thing for others

An example might look like

```sql
select * from
```

And the output might look like

| path | e_tag | size_bytes | page_index | hits |
|--------|--------|--------|--------|--------|
| /foo/bar | | 1234 | t | 12 |
| /foo/baz | xdef | 3781 | t | 1 |

...

I think we could model its implementation on the `parquet_metadata` function:
- https://datafusion.apache.org/user-guide/cli/usage.html#parquet-metadata
- https://github.com/apache/datafusion/blob/173989cc2fb55c30cd174b520754812ea408e00b/datafusion-cli/src/functions.rs#L320

### Additional context

_No response_
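To make the twofold approach above more concrete, here is a minimal, standard-library-only sketch of what an introspection API plus a table renderer could look like. Everything in it is hypothetical: `ToyMetadataCache`, `CacheEntryInfo`, `list_entries`, and `render_table` are illustrative names, not the actual `DefaultFileMetadataCache` API or any existing `datafusion-cli` function. The columns simply mirror the example output above.

```rust
use std::collections::BTreeMap;

/// One row of the hypothetical introspection output, mirroring the columns
/// proposed in the issue: path, e_tag, size_bytes, page_index, hits.
#[derive(Debug, Clone, PartialEq)]
pub struct CacheEntryInfo {
    pub path: String,
    pub e_tag: Option<String>,
    pub size_bytes: usize,
    pub page_index: bool,
    pub hits: usize,
}

/// Toy stand-in for a metadata cache (an assumption for illustration only;
/// this is NOT the real `DefaultFileMetadataCache`).
#[derive(Default)]
pub struct ToyMetadataCache {
    // A BTreeMap keeps entries ordered by path, so listings are deterministic.
    entries: BTreeMap<String, CacheEntryInfo>,
}

impl ToyMetadataCache {
    /// Insert (or replace) the entry for `path`, resetting its hit count.
    pub fn put(&mut self, path: &str, e_tag: Option<&str>, size_bytes: usize, page_index: bool) {
        self.entries.insert(
            path.to_string(),
            CacheEntryInfo {
                path: path.to_string(),
                e_tag: e_tag.map(str::to_string),
                size_bytes,
                page_index,
                hits: 0,
            },
        );
    }

    /// Look up `path`; a successful lookup increments the entry's hit counter.
    pub fn get(&mut self, path: &str) -> Option<&CacheEntryInfo> {
        let entry = self.entries.get_mut(path)?;
        entry.hits += 1;
        Some(&*entry)
    }

    /// Hypothetical introspection API: a snapshot of every entry, ordered by path.
    pub fn list_entries(&self) -> Vec<CacheEntryInfo> {
        self.entries.values().cloned().collect()
    }
}

/// Render a snapshot as a pipe-delimited table, as a CLI function might.
pub fn render_table(entries: &[CacheEntryInfo]) -> String {
    let mut out = String::from("| path | e_tag | size_bytes | page_index | hits |\n");
    for e in entries {
        out.push_str(&format!(
            "| {} | {} | {} | {} | {} |\n",
            e.path,
            e.e_tag.as_deref().unwrap_or(""),
            e.size_bytes,
            if e.page_index { "t" } else { "f" },
            e.hits
        ));
    }
    out
}
```

Returning an owned snapshot from `list_entries` (rather than references into the cache) is one possible design choice: it would let a CLI table function build its output without holding a borrow or lock on the cache while rendering.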