Ted-Jiang commented on issue #9404:
URL: 
https://github.com/apache/arrow-datafusion/issues/9404#issuecomment-1987530880

   @alamb Sorry for the late response.
   > @Ted-Jiang is there some way to test via configuration setting if caching the per-file metadata would help these queries?
   
   There are two kinds of cache in DataFusion: `ListFilesCache` and `FileStatisticsCache`.
   
   1.  `ListFilesCache`: Before I changed this to the `session` level, I think it was already `on` at the global level (so there is no config setting in DataFusion 🤣). It reuses the result of listing files under the same path.
   2.  `FileStatisticsCache`: this cache is used for `CBO`, and it stays off all the time. I added this cache in our internal system to cache `FileStatistics` for join selection.
   
   > If it turns out to make a difference, maybe we could provide a simple LRU type cache by default in DataFusion

   This sounds great, something like `linked_hash_map`.
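
To make the LRU idea concrete, here is a minimal sketch using only the Rust standard library. `LruStatsCache` and `FileStats` are hypothetical names made up for this example; they are not DataFusion APIs, and the real statistics type carries more than two fields. It assumes the cache is keyed by file path and evicts the least recently used entry once a fixed capacity is reached.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical per-file statistics (illustration only, not the DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    num_rows: usize,
    total_byte_size: usize,
}

/// Hypothetical LRU cache keyed by file path.
struct LruStatsCache {
    capacity: usize,
    entries: HashMap<String, FileStats>,
    /// Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl LruStatsCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self {
            capacity,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Move `path` to the most-recently-used position (linear scan).
    fn touch(&mut self, path: &str) {
        if let Some(pos) = self.order.iter().position(|p| p.as_str() == path) {
            self.order.remove(pos);
        }
        self.order.push_back(path.to_string());
    }

    /// Return a copy of the cached statistics and mark the entry as recently used.
    fn get(&mut self, path: &str) -> Option<FileStats> {
        let stats = self.entries.get(path).cloned()?;
        self.touch(path);
        Some(stats)
    }

    /// Insert or update an entry, evicting the least recently used one when full.
    fn put(&mut self, path: &str, stats: FileStats) {
        if !self.entries.contains_key(path) && self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(path.to_string(), stats);
        self.touch(path);
    }
}

fn main() {
    let mut cache = LruStatsCache::new(2);
    cache.put("a.parquet", FileStats { num_rows: 10, total_byte_size: 100 });
    cache.put("b.parquet", FileStats { num_rows: 20, total_byte_size: 200 });

    // Reading `a.parquet` makes it the most recently used entry.
    assert_eq!(cache.get("a.parquet").map(|s| s.num_rows), Some(10));

    // The cache is full, so inserting a third file evicts `b.parquet`.
    cache.put("c.parquet", FileStats { num_rows: 30, total_byte_size: 300 });
    assert!(cache.get("b.parquet").is_none());
    assert_eq!(cache.get("c.parquet").map(|s| s.total_byte_size), Some(300));
}
```

The `VecDeque` scan makes each access linear in the number of entries, which keeps the sketch short; a linked hash map such as the `linked_hash_map` crate mentioned above would give constant-time reordering instead.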


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

