westonpace commented on PR #13800: URL: https://github.com/apache/datafusion/pull/13800#issuecomment-2548948638
> What's cache-then-plan approach? (The linked page doesn't include "cache"). How did we solve cold cache problem?

@findepi Perhaps I should avoid using the word cache. This is not a long lived multi-query cache. This is a single query cache meant to be thrown away after the query has completed. It is a very short-lived cache that is designed to avoid repeated lookups during multiple planning passes. Every query is still a "cold" query. It would be possible to create another longer-lived caching layer on top of this but I am not trying to solve that problem at the moment.

> I'm thinking if we use the cached tables should we have a tests for that? I mean that cached tables should reflect the most recent catalog state, if the table added/modified/dropped it should be reflected in the caches

@comphead There is no concern for cache eviction / staleness here because this cache should not be kept longer than a single query. There is some possibility for a catalog change to happen in between reference lookups (`resolve`) and query execution. However, this will always be possible when using a remote catalog. The query execution should return an error from the remote endpoint saying "no database/schema/table found" or "query does not match schema". I'm not sure we can avoid this without some kind of synchronization mechanism with a remote catalog and I don't think there has been much work in that regard (but I admittedly haven't examined the APIs in great depth).
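
To make the "single query cache" idea concrete, here is a minimal sketch in plain Rust. It is not DataFusion's API and not the code in this PR: `RemoteCatalog`, `QueryScopedCache`, and `plan_query` are hypothetical names, the catalog is an in-memory map standing in for a remote service, and everything is synchronous for simplicity. The only point is that the cache is created per query, absorbs repeated lookups across planning passes, and is dropped when planning for that query ends.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for a remote catalog. It counts lookups so the
/// effect of the per-query cache can be observed.
struct RemoteCatalog {
    tables: HashMap<String, String>, // table name -> schema description
    lookups: RefCell<usize>,
}

impl RemoteCatalog {
    fn new(tables: HashMap<String, String>) -> Self {
        Self { tables, lookups: RefCell::new(0) }
    }

    /// Simulates one round trip to the remote catalog.
    fn lookup(&self, name: &str) -> Option<String> {
        *self.lookups.borrow_mut() += 1;
        self.tables.get(name).cloned()
    }
}

/// Hypothetical cache that lives for exactly one query.
struct QueryScopedCache<'a> {
    catalog: &'a RemoteCatalog,
    resolved: HashMap<String, Option<String>>,
}

impl<'a> QueryScopedCache<'a> {
    fn new(catalog: &'a RemoteCatalog) -> Self {
        Self { catalog, resolved: HashMap::new() }
    }

    /// Resolves a table reference, contacting the catalog at most once
    /// per name for the lifetime of this cache (misses are remembered too).
    fn resolve(&mut self, name: &str) -> Option<String> {
        if let Some(hit) = self.resolved.get(name) {
            return hit.clone();
        }
        let found = self.catalog.lookup(name);
        self.resolved.insert(name.to_string(), found.clone());
        found
    }
}

/// Runs several planning passes over the same table references.
/// The cache is created here and dropped on return, so nothing
/// carries over to the next query.
fn plan_query(
    catalog: &RemoteCatalog,
    referenced: &[&str],
    passes: usize,
) -> Vec<Option<String>> {
    let mut cache = QueryScopedCache::new(catalog);
    let mut last = Vec::new();
    for _ in 0..passes {
        last = referenced.iter().map(|t| cache.resolve(t)).collect();
    }
    last
}
```

In this sketch, planning a query that references two tables over three passes costs two catalog lookups rather than six, and a second call to `plan_query` starts from an empty cache again, which is why every query is still "cold" and why staleness across queries does not arise.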
