alamb commented on issue #18470:
URL: https://github.com/apache/datafusion/issues/18470#issuecomment-3863046575

   Here is another (self-serving) paper that discusses the use of caches: [The 
Five-Minute Rule for the Cloud: Caching in Analytics 
Systems](https://vldb.org/cidrdb/papers/2025/p4-duwe.pdf)
   
   > If we have the prefetch, we can reduce the dependency of the cache layer 
and save more resources.
   
   In general I agree this is true
   
   The challenge I worry about is that if you aren't careful, prefetching 
results in buffering more than necessary (i.e. fetching faster than the CPU can 
consume the data), thus using more memory than needed.
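   
   As a rough illustration of that concern (a minimal sketch, not how 
DataFusion implements prefetching), one way to stop a prefetcher from running 
ahead of the consumer is a bounded queue. The names `fetch_chunk`, 
`consume_with_bounded_prefetch`, and `max_buffered` are hypothetical, and 
`fetch_chunk` just stands in for a real IO request:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical stand-in for one IO request (e.g. a ranged read).
/// It returns a fixed 1 KiB buffer so the sketch is self-contained.
fn fetch_chunk(i: usize) -> Vec<u8> {
    vec![i as u8; 1024]
}

/// Fetch `num_chunks` chunks on a background thread while the caller
/// consumes them. At most `max_buffered` fetched chunks wait in the queue.
/// Returns the total number of bytes consumed.
fn consume_with_bounded_prefetch(num_chunks: usize, max_buffered: usize) -> usize {
    // A bounded channel: this capacity is the prefetch budget.
    let (tx, rx) = sync_channel::<Vec<u8>>(max_buffered);

    let producer = thread::spawn(move || {
        for i in 0..num_chunks {
            // `send` blocks while the queue is full, so the fetcher cannot
            // get arbitrarily far ahead of the consumer (backpressure).
            if tx.send(fetch_chunk(i)).is_err() {
                break; // the consumer hung up; stop fetching
            }
        }
        // `tx` is dropped here, which ends the consumer's loop below.
    });

    let mut total = 0;
    for chunk in rx {
        // Stand-in for the CPU work done on each chunk.
        total += chunk.len();
    }

    producer.join().expect("prefetch thread panicked");
    total
}
```

   With `max_buffered = 2`, for example, memory held by fetched-but-unconsumed 
data stays at a few chunks (the queued ones, plus one the fetcher may be 
waiting to send and one being processed) no matter how fast the IO is. 
Choosing that bound is exactly the part that depends on the IO 
characteristics.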
   
   So the optimal prefetching strategy depends heavily on what your IO looks 
like -- for example, the pattern is quite different between an object store and 
"local disk" -- and it differs again depending on whether the local disk is an 
actual SSD or a cloud "EBS"-like block store.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

