findepi commented on PR #13800: URL: https://github.com/apache/datafusion/pull/13800#issuecomment-2550580487
> Perhaps I should avoid using the word cache. This is not a long lived multi-query cache. This is a single query cache meant to be thrown away after the query has completed @westonpace thanks for explaining. I think the use of cache is justified in this context and easier to understand than eg 'working set'. I agree this is important to have a notion of query-level information for two reasons. Performance is the obvious one: we should not repeatedly compute info we already knew. Second is correctness (consistency). If a query eg self-joins an Iceberg table T, the table may need to be read twice, but the reads should come from the same snapshot of T. So we agree on the need for this. The question is who's responsible for providing this consistency. Is this a catalog or table provider (eg it should self-wrap in ResolvedCatalogProvider), or is it the engine itself (then the question is how exactly this is impl'd) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
