alamb commented on issue #15582:
URL: https://github.com/apache/datafusion/issues/15582#issuecomment-3005802982

   > As a DataFusion user, I’m wondering if the Parquet footer could be cached. [This blog mentioned footer caching](https://blog.xiangpeng.systems/posts/caching-datafusion/#parquet-metadata-cache). It is a bit confusing to me.
   > 
   > (I’m also involved in promoting metadata caching in cuDF, which is why I mentioned this issue there as well.)
   
   DataFusion (and `datafusion-cli`) doesn't do footer caching by default -- 
you can see this here:
   - https://github.com/apache/datafusion/issues/15582
   - https://github.com/apache/datafusion/issues/16365
   
   
   You can do it yourself, though, and most high-performance integrations (like InfluxDB and 
pydantic) absolutely do. An example of doing it is here:
   
https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/advanced_parquet_index.rs
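
To make the idea concrete, here is a rough, standard-library-only sketch of what "caching the footer" means: decode the footer once per file and hand out a shared copy afterwards. This is not DataFusion's API and not what the linked example does line for line. `FooterMetadata`, `FooterCache`, and `get_or_load` are hypothetical names made up for illustration, and the `load` closure stands in for whatever actually reads and decodes the footer bytes.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for decoded Parquet footer metadata.
/// A real integration would store the metadata type of its Parquet reader.
#[derive(Debug, PartialEq)]
pub struct FooterMetadata {
    pub num_rows: u64,
    pub num_row_groups: usize,
}

/// Hypothetical in-memory cache keyed by file path.
pub struct FooterCache {
    entries: Mutex<HashMap<String, Arc<FooterMetadata>>>,
}

impl FooterCache {
    pub fn new() -> Self {
        Self {
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the cached footer for `path`, calling `load` only on a miss.
    /// The lock is held while loading, which keeps the sketch simple but
    /// serializes concurrent loads.
    pub fn get_or_load<F>(&self, path: &str, load: F) -> Arc<FooterMetadata>
    where
        F: FnOnce() -> FooterMetadata,
    {
        let mut entries = self.entries.lock().unwrap();
        if let Some(hit) = entries.get(path) {
            // Cache hit: share the already decoded footer.
            return Arc::clone(hit);
        }
        // Cache miss: decode once and remember the result.
        let loaded = Arc::new(load());
        entries.insert(path.to_string(), Arc::clone(&loaded));
        loaded
    }

    /// Number of files whose footer is currently cached.
    pub fn len(&self) -> usize {
        self.entries.lock().unwrap().len()
    }
}
```

Calling `get_or_load` twice with the same path runs the loader only the first time; the second call returns the same `Arc`. A real cache would also need invalidation (for example, keying on file size or modification time as well) and probably a memory limit, both of which this sketch leaves out.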
   
   I am thinking that adding footer caching to datafusion-cli might be a 
nice way to improve performance and make it easier to see how to add these 
features.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

