Ted-Jiang commented on issue #9404: URL: https://github.com/apache/arrow-datafusion/issues/9404#issuecomment-1987530880
@alamb Sorry for the late response.

> @Ted-Jiang is there some way to test via configuration setting if caching the per-file metadata would help these queries?

There are two kinds of cache in DataFusion, `ListFilesCache` and `FileStatisticsCache`:

1. `ListFilesCache`: Before I changed this to `session` level, I think it was already `on` at the global level (so there is no config setting in DF 🤣). It reuses the result of listing files under the same path.
2. `FileStatisticsCache`: This cache is used for `CBO`; it stays off all the time. I added this cache in our internal system to cache FileStatistics for join selection.

> If it turns out to make a difference, maybe we could provide a simple LRU type cache by default in DataFusion

This sounds great, something like `linked_hash_map`.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
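
For reference, the "simple LRU type cache" idea from the comment could be sketched roughly as below. This is only an illustration using the standard library, not DataFusion's cache API: `LruCache` and `FileStats` are hypothetical names, and the recency list is a plain `VecDeque` with an O(n) scan, where a crate such as `linked_hash_map` would give O(1) reordering.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical per-file statistics (an assumption for this sketch,
/// not DataFusion's own statistics type).
#[derive(Debug, Clone, PartialEq)]
pub struct FileStats {
    pub num_rows: usize,
    pub total_bytes: usize,
}

/// Minimal LRU cache: the least recently used key sits at the front of `order`.
pub struct LruCache<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>,
}

impl<K: Hash + Eq + Clone, V> LruCache<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            map: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Move `key` to the most-recently-used position (O(n) scan).
    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.clone());
    }

    /// Look up `key`; a hit also marks the entry as most recently used.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    /// Insert or update `key`; evicts the least recently used entry
    /// when a new key arrives and the cache is already full.
    pub fn put(&mut self, key: K, value: V) {
        if !self.map.contains_key(&key) && self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.touch(&key);
        self.map.insert(key, value);
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

Keying such a cache by file path (for example `LruCache<String, FileStats>`) bounds memory use while keeping statistics for recently queried files. A real implementation would probably also need to invalidate entries when a file's size or modification time changes.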
