alamb opened a new issue, #19056:
URL: https://github.com/apache/datafusion/issues/19056

   ### Is your feature request related to a problem or challenge?
   
   - part of https://github.com/apache/datafusion/issues/17214
   - follow on to https://github.com/apache/datafusion/pull/18855
   
   @BlakeOrth added a cache to avoid re-listing all files, which is great, and 
it includes a max size and a TTL (time to live) for the entries.
   
   The default TTL is infinite (to ensure stability). However, this is likely 
not what all users want, so it would be useful to be able to change these 
parameters, similarly to how the other runtime options can be configured.
   
   
   
   
   
   
   ### Describe the solution you'd like
   
   
   I think what we should do (as a follow-on PR) is to add runtime 
configuration settings for the max cache size and its TTL in 
https://datafusion.apache.org/user-guide/configs.html#runtime-configuration-settings
   
   
   This would mean supporting
   
   ```sql
   -- set list files cache limit to 5MB
   SET datafusion.runtime.list_files_cache_limit = '5M';
   -- set time to live for each entry to 1 minute 30 seconds
   SET datafusion.runtime.list_files_cache_ttl = '1m30s'; -- would it be better like `1:30`?
   ```
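
To make the `'1m30s'` format question concrete, here is a minimal sketch of how such a TTL string could be turned into a `std::time::Duration`. The helper name `parse_ttl` and the accepted units (`h`, `m`, `s`) are assumptions for illustration only; this is not an existing DataFusion function, and the final syntax is still open.

```rust
use std::time::Duration;

/// Hypothetical helper (not an existing DataFusion function): parses a TTL
/// such as "1m30s" into a `Duration`. Assumed unit suffixes: h, m, s.
fn parse_ttl(input: &str) -> Result<Duration, String> {
    let too_large = || format!("duration '{input}' is too large");
    let mut total_secs: u64 = 0;
    let mut digits = String::new();
    let mut saw_component = false;

    for c in input.trim().chars() {
        if c.is_ascii_digit() {
            digits.push(c);
            continue;
        }
        // A unit letter must be preceded by at least one digit.
        if digits.is_empty() {
            return Err(format!("expected a number before '{c}' in '{input}'"));
        }
        let value = digits
            .parse::<u64>()
            .map_err(|e| format!("invalid number '{digits}': {e}"))?;
        let multiplier: u64 = match c {
            'h' => 3600,
            'm' => 60,
            's' => 1,
            other => return Err(format!("unknown unit '{other}' in '{input}'")),
        };
        // Accumulate with overflow checks instead of wrapping silently.
        let secs = value.checked_mul(multiplier).ok_or_else(too_large)?;
        total_secs = total_secs.checked_add(secs).ok_or_else(too_large)?;
        digits.clear();
        saw_component = true;
    }

    // Reject empty input and trailing digits with no unit (for example "90").
    if !digits.is_empty() || !saw_component {
        return Err(format!("'{input}' is not of the form '1h2m3s'"));
    }
    Ok(Duration::from_secs(total_secs))
}
```

With this sketch, `parse_ttl("1m30s")` yields a 90 second duration, while `1:30` is rejected because `:` is not one of the assumed units. Supporting the `1:30` form would need a separate branch.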
   
   
   
   ### Describe alternatives you've considered
   
   I suggest adding two new runtime configuration options, following the model 
of `metadata_cache_limit`:
   1. `list_files_cache_limit` -- size of cache
   2. `list_files_cache_ttl` -- ttl duration of entries
   
   That would mean roughly adding support here (and elsewhere in that file):
   
   
https://github.com/apache/datafusion/blob/838e1dea832e3cd8585498ba12216e1ad9f584a4/datafusion/core/src/execution/context/mod.rs#L1160-L1163
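
As a rough illustration of the kind of key dispatch involved, the sketch below handles the two proposed option names. Everything here is hypothetical: `ListFilesCacheSettings`, `parse_size_limit`, and `set_runtime_option` are stand-ins written for this example, not the code at the link above, and the assumption that `K`/`M`/`G` are powers of 1024 would need to be checked against how `metadata_cache_limit` is parsed. The key is assumed to arrive with the `datafusion.runtime.` prefix already removed.

```rust
/// Hypothetical stand-in for the list-files cache settings; the struct and
/// field names are illustrative only.
#[derive(Debug, Default, PartialEq)]
struct ListFilesCacheSettings {
    /// Maximum cache size in bytes; `None` keeps the built-in default.
    limit_bytes: Option<usize>,
    /// Raw TTL string as supplied by the user; `None` means entries never
    /// expire. Parsing the TTL is left out of this sketch.
    ttl: Option<String>,
}

/// Parses sizes such as "5M" into bytes, assuming K/M/G are powers of 1024.
fn parse_size_limit(value: &str) -> Result<usize, String> {
    let value = value.trim();
    let (number, multiplier): (&str, usize) = if let Some(n) = value.strip_suffix('K') {
        (n, 1024)
    } else if let Some(n) = value.strip_suffix('M') {
        (n, 1024 * 1024)
    } else if let Some(n) = value.strip_suffix('G') {
        (n, 1024 * 1024 * 1024)
    } else {
        return Err(format!("'{value}' must end in K, M, or G"));
    };
    let number = number
        .parse::<usize>()
        .map_err(|e| format!("invalid size '{value}': {e}"))?;
    number
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size '{value}' is too large"))
}

/// Applies one runtime option assignment to the settings. `key` is assumed
/// to have the `datafusion.runtime.` prefix already stripped.
fn set_runtime_option(
    settings: &mut ListFilesCacheSettings,
    key: &str,
    value: &str,
) -> Result<(), String> {
    match key {
        "list_files_cache_limit" => settings.limit_bytes = Some(parse_size_limit(value)?),
        "list_files_cache_ttl" => settings.ttl = Some(value.to_string()),
        other => return Err(format!("unknown runtime option '{other}'")),
    }
    Ok(())
}
```

Unknown keys and malformed sizes return an error rather than being ignored, so a typo in a `SET` statement would surface immediately.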
   
   
   And add tests like:
   
https://github.com/apache/datafusion/blob/c8d26ba012471e6aece9430642d6a8a923bc344c/datafusion/sqllogictest/test_files/set_variable.slt#L314-L316
   
   And then add a note to the upgrade guide.
   
   
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

