liukun4515 commented on issue #7556:
URL: https://github.com/apache/arrow-datafusion/issues/7556#issuecomment-1720412409

   > I think keeping a metadata cache on the RuntimeEnv is reasonable as long as
   > 
   > 1. There is a way to extend / disable the default behavior (as there is
   >    with the DiskManager and MemoryPool).
   > 2. The default implementation in DataFusion is simple
   > 
   > The rationale for something simple built in but a configurable API is that
   > the exact caching strategy is likely to vary tremendously from system to
   > system (for example, if there is a local file based parquet cache, storing
   > metadata in memory might not make sense, or how to do cache eviction or
   > enforce limits, etc).
   > 
   
   This suggestion is very important.
   
   > Therefore it is unlikely that anything in DataFusion will cover all
   > use cases, so what is built in should be simple and allow users to add
   > whatever specific caching policy they want
   > 
   > Does that make sense @Ted-Jiang ?
   
   @alamb 
   Does InfluxDB IOx have a `file statistics cache` or a `list files cache`?
   How does InfluxDB IOx avoid having a node visit remote storage when
   generating the execution plan?
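
   To make the quoted suggestion concrete, here is a minimal sketch of what
   "simple built in, but configurable" could look like. All names in it
   (`FileMeta`, `FileMetaCache`, `InMemoryCache`, `NoCache`, `Runtime`, and the
   example path) are hypothetical and are not DataFusion APIs. It only
   illustrates the shape: a trait as the extension point, an unbounded
   in-memory map as the simple default, and a no-op implementation to disable
   caching.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical per-file metadata; not a DataFusion type.
#[derive(Clone, Debug, PartialEq)]
pub struct FileMeta {
    pub size_bytes: u64,
    pub num_rows: u64,
}

/// Hypothetical extension point: users supply their own caching policy.
pub trait FileMetaCache: Send + Sync {
    fn get(&self, path: &str) -> Option<FileMeta>;
    fn put(&self, path: &str, meta: FileMeta);
}

/// Simple default: an unbounded in-memory map with no eviction.
#[derive(Default)]
pub struct InMemoryCache {
    entries: Mutex<HashMap<String, FileMeta>>,
}

impl FileMetaCache for InMemoryCache {
    fn get(&self, path: &str) -> Option<FileMeta> {
        self.entries.lock().unwrap().get(path).cloned()
    }
    fn put(&self, path: &str, meta: FileMeta) {
        self.entries.lock().unwrap().insert(path.to_string(), meta);
    }
}

/// Disables caching: every lookup is a miss and nothing is stored.
pub struct NoCache;

impl FileMetaCache for NoCache {
    fn get(&self, _path: &str) -> Option<FileMeta> {
        None
    }
    fn put(&self, _path: &str, _meta: FileMeta) {}
}

/// Stand-in for a runtime environment that owns the configured cache.
pub struct Runtime {
    cache: Arc<dyn FileMetaCache>,
}

impl Runtime {
    pub fn new(cache: Arc<dyn FileMetaCache>) -> Self {
        Self { cache }
    }

    /// Returns metadata for `path`, calling `fetch_remote` only on a miss.
    pub fn file_meta(&self, path: &str, fetch_remote: impl FnOnce() -> FileMeta) -> FileMeta {
        if let Some(hit) = self.cache.get(path) {
            return hit;
        }
        let meta = fetch_remote();
        self.cache.put(path, meta.clone());
        meta
    }
}

/// Plans the same file three times and counts simulated remote fetches.
fn count_remote_calls(rt: &Runtime) -> u32 {
    let mut remote_calls = 0;
    for _ in 0..3 {
        rt.file_meta("s3://bucket/a.parquet", || {
            remote_calls += 1;
            FileMeta { size_bytes: 1024, num_rows: 10 }
        });
    }
    remote_calls
}

fn main() {
    let cached = Runtime::new(Arc::new(InMemoryCache::default()));
    let uncached = Runtime::new(Arc::new(NoCache));
    println!("default cache: {} remote call(s)", count_remote_calls(&cached));
    println!("cache disabled: {} remote call(s)", count_remote_calls(&uncached));
}
```

   With this shape, eviction, size limits, or a policy that defers to a local
   on-disk parquet cache would each be just another `FileMetaCache`
   implementation, without any change to the planning code.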
   
   
   
   

