alamb commented on issue #18470: URL: https://github.com/apache/datafusion/issues/18470#issuecomment-3863046575
Here is another (self-serving) paper that discusses the use of caches: [The Five-Minute Rule for the Cloud: Caching in Analytics Systems](https://vldb.org/cidrdb/papers/2025/p4-duwe.pdf)

> If we have the prefetch, we can reduce the dependency of the cache layer and save more resources.

In general I agree this is true.

The challenge I worry about is that, if you aren't careful, prefetching ends up buffering more than necessary (i.e. fetching faster than the CPU can consume the data) and thus uses more memory than needed.

So the optimal prefetching strategy depends heavily on what your IO looks like. For example, the pattern for an object store is quite different from the pattern for "local disk", and it differs again depending on whether the local disk is an actual SSD or a cloud block store like EBS.
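To make the buffering concern concrete, here is a minimal sketch of prefetching with a hard cap on how far the fetcher may run ahead of the consumer. It is not DataFusion code: `BoundedPrefetcher`, `max_buffered`, and `peak_buffered` are hypothetical names chosen for illustration, and the "IO" is just a Python iterator standing in for object store or disk reads.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


class BoundedPrefetcher:
    """Fetch chunks on a background thread, holding at most `max_buffered`
    chunks in the queue (plus one in flight on the producer side).

    Hypothetical illustration only; errors raised by the source simply
    end the stream early in this sketch.
    """

    def __init__(self, source, max_buffered):
        # queue.Queue treats maxsize <= 0 as "unbounded", which is exactly
        # the behaviour this sketch is meant to avoid.
        if max_buffered < 1:
            raise ValueError("max_buffered must be at least 1")
        self._source = source
        self._queue = queue.Queue(maxsize=max_buffered)
        self.peak_buffered = 0  # largest queue depth observed by the producer
        threading.Thread(target=self._fill, daemon=True).start()

    def _fill(self):
        try:
            for chunk in self._source:
                # put() blocks while the queue is full, so the fetcher
                # cannot outrun the consumer by more than the cap.
                self._queue.put(chunk)
                self.peak_buffered = max(self.peak_buffered, self._queue.qsize())
        finally:
            self._queue.put(_DONE)

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is _DONE:
                return
            yield item


if __name__ == "__main__":
    chunks = (bytes(1024) for _ in range(100))  # stand-in for remote reads
    prefetcher = BoundedPrefetcher(chunks, max_buffered=4)
    total = sum(len(c) for c in prefetcher)
    print(total, prefetcher.peak_buffered)
```

The cap is the tuning knob: presumably a high-latency object store wants a deeper buffer to hide round trips, while a fast local SSD can get by with a shallow one, which is why a single fixed value is unlikely to suit every IO backend.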
