mkleen opened a new pull request, #20047:
URL: https://github.com/apache/datafusion/pull/20047

   ## Which issue does this PR close?
   
   This change introduces a default `FileStatisticsCache` implementation for `ListingTable` with a size limit, implementing the following steps from https://github.com/apache/datafusion/issues/19052#issuecomment-3603796097:
   
   - Add heap size estimation for file statistics and for the data types used in caching. This is a temporary measure until https://github.com/apache/datafusion/pull/19599 and https://github.com/apache/arrow-rs/pull/9138 are resolved.
   
   - Redesign `DefaultFileStatisticsCache` to use an `LruQueue`, following https://github.com/apache/datafusion/pull/18855
   
   - Introduce a size limit on `DefaultFileStatisticsCache`
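
The three steps above can be sketched in plain Rust as a byte-bounded LRU cache. This is a minimal sketch, not the code in this PR: `HeapSize`, `FileStats`, and `SizeLimitedLruCache` are hypothetical stand-ins for DataFusion's `Statistics`, its `LruQueue`, and `DefaultFileStatisticsCache`. Recency is tracked here with a counter-keyed `BTreeMap` rather than a linked list, and sizes are estimates, not exact allocator accounting.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical trait: estimated number of bytes a value owns on the heap.
trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for String {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

/// Hypothetical, simplified per-file statistics (stand-in for DataFusion's `Statistics`).
#[derive(Clone, Debug, PartialEq)]
struct FileStats {
    num_rows: usize,
    column_names: Vec<String>,
}

impl HeapSize for FileStats {
    fn heap_size(&self) -> usize {
        // The Vec's own buffer plus the buffers owned by each String in it.
        self.column_names.capacity() * std::mem::size_of::<String>()
            + self.column_names.iter().map(|s| s.heap_size()).sum::<usize>()
    }
}

/// Hypothetical LRU cache bounded by the estimated bytes of its entries.
struct SizeLimitedLruCache {
    limit: usize,                                      // budget in bytes
    used: usize,                                       // estimated bytes currently held
    tick: u64,                                         // logical clock for recency
    entries: HashMap<String, (u64, usize, FileStats)>, // path -> (last use, size, stats)
    recency: BTreeMap<u64, String>,                    // last use -> path, oldest first
}

impl SizeLimitedLruCache {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0, tick: 0, entries: HashMap::new(), recency: BTreeMap::new() }
    }

    /// Estimated footprint of one entry: key bytes, inline struct, heap buffers.
    fn entry_size(path: &str, stats: &FileStats) -> usize {
        path.len() + std::mem::size_of::<FileStats>() + stats.heap_size()
    }

    /// Looks up an entry and marks it as most recently used.
    fn get(&mut self, path: &str) -> Option<&FileStats> {
        self.tick += 1;
        let tick = self.tick;
        let entry = self.entries.get_mut(path)?;
        self.recency.remove(&entry.0);
        self.recency.insert(tick, path.to_string());
        entry.0 = tick;
        Some(&entry.2)
    }

    /// Inserts an entry, then evicts least-recently-used entries until under the limit.
    fn put(&mut self, path: String, stats: FileStats) {
        self.remove(&path);
        let size = Self::entry_size(&path, &stats);
        if size > self.limit {
            return; // An entry larger than the whole budget is never cached.
        }
        self.tick += 1;
        self.recency.insert(self.tick, path.clone());
        self.entries.insert(path, (self.tick, size, stats));
        self.used += size;
        while self.used > self.limit {
            let (_, oldest) = self.recency.pop_first().expect("over limit implies non-empty");
            let (_, evicted, _) = self.entries.remove(&oldest).expect("maps stay in sync");
            self.used -= evicted;
        }
    }

    fn remove(&mut self, path: &str) -> Option<FileStats> {
        let (tick, size, stats) = self.entries.remove(path)?;
        self.recency.remove(&tick);
        self.used -= size;
        Some(stats)
    }
}
```

Because eviction runs on every insert and the newest entry always fits, the estimated footprint stays at or below the limit after each `put`.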
   
   This update also moves `FileStatisticsCache` creation into `CacheManager`, making it session-scoped and shared across statements and listing tables.
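
A minimal sketch of what session-scoped sharing means, assuming the cache sits behind an `Arc<Mutex<_>>`: every table built from the same hypothetical `CacheManager` clones one `Arc`, so statistics computed through one table are reused by the others. The types below are simplified, hypothetical stand-ins and do not mirror DataFusion's actual signatures.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a file statistics cache: file path -> row count.
type StatsCache = Mutex<HashMap<String, usize>>;

/// Hypothetical session-level owner of the cache (stand-in for `CacheManager`).
struct CacheManager {
    file_statistics: Arc<StatsCache>,
}

/// Hypothetical listing table that shares the session's cache instead of owning one.
struct ListingTable {
    stats_cache: Arc<StatsCache>,
}

impl ListingTable {
    fn new(manager: &CacheManager) -> Self {
        Self { stats_cache: Arc::clone(&manager.file_statistics) }
    }

    /// Returns cached statistics, running `compute` (e.g. a footer read) only on a miss.
    fn row_count(&self, path: &str, compute: impl FnOnce() -> usize) -> usize {
        let mut cache = self.stats_cache.lock().unwrap();
        *cache.entry(path.to_string()).or_insert_with(compute)
    }
}
```

With a per-table cache, the second table below would have to recompute the statistics; with the shared cache it gets the first table's result.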
   
   Closes https://github.com/apache/datafusion/issues/19217, 
https://github.com/apache/datafusion/issues/19052
   
   ## Rationale for this change
   
   See above.
   
   ## What changes are included in this PR?
   
   See above.
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   Yes: a new runtime setting, `datafusion.runtime.file_statistics.cache_limit`, controls the size limit of the file statistics cache.
   

