findepi commented on PR #13800:
URL: https://github.com/apache/datafusion/pull/13800#issuecomment-2550580487

   > Perhaps I should avoid using the word cache. This is not a long-lived multi-query cache. This is a single-query cache meant to be thrown away after the query has completed.
   
   @westonpace 
   Thanks for explaining. I think the use of "cache" is justified in this context and easier to understand than e.g. "working set". I agree it is important to have a notion of query-level information, for two reasons. Performance is the obvious one: we should not repeatedly compute information we already have. The second is correctness (consistency). If a query self-joins an Iceberg table T, for example, the table may need to be read twice, but both reads should come from the same snapshot of T.
   
   So we agree on the need for this.
   The question is who is responsible for providing this consistency. Is it the catalog or table provider (e.g. it should wrap itself in a ResolvedCatalogProvider), or is it the engine itself (in which case the question is how exactly this is implemented)?
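
To make the first option concrete, here is a minimal sketch of a provider-side, query-scoped wrapper. The first lookup of a table pins its snapshot, and every later lookup in the same query returns the pinned snapshot, so both sides of a self-join agree. Every name here is hypothetical and chosen for illustration only: `SnapshotCatalog`, `SnapshotId`, `QueryScopedCatalog` and `AdvancingCatalog`. None of it is DataFusion's API or the `ResolvedCatalogProvider` from this PR. The mock catalog also assumes that each call to the underlying catalog may observe a newer snapshot because of concurrent commits.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical id of an immutable table snapshot (e.g. an Iceberg snapshot id).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SnapshotId(pub u64);

/// Hypothetical catalog interface: returns the table's *current* snapshot,
/// which may differ between calls if writers commit concurrently.
pub trait SnapshotCatalog {
    fn current_snapshot(&self, table: &str) -> Option<SnapshotId>;
}

/// Query-scoped wrapper: create one per query and drop it when the query
/// completes. It is not a long-lived, multi-query cache.
pub struct QueryScopedCatalog<'a, C: SnapshotCatalog> {
    inner: &'a C,
    // Snapshot pinned by the first lookup of each table in this query.
    pinned: Mutex<HashMap<String, SnapshotId>>,
}

impl<'a, C: SnapshotCatalog> QueryScopedCatalog<'a, C> {
    pub fn new(inner: &'a C) -> Self {
        Self { inner, pinned: Mutex::new(HashMap::new()) }
    }

    /// Returns the pinned snapshot, resolving and pinning it on first use.
    pub fn snapshot(&self, table: &str) -> Option<SnapshotId> {
        let mut pinned = self.pinned.lock().unwrap();
        if let Some(id) = pinned.get(table) {
            return Some(*id);
        }
        let id = self.inner.current_snapshot(table)?;
        pinned.insert(table.to_string(), id);
        Some(id)
    }
}

/// Mock catalog that simulates a concurrent writer: every call sees a newer snapshot.
struct AdvancingCatalog {
    next: AtomicU64,
}

impl SnapshotCatalog for AdvancingCatalog {
    fn current_snapshot(&self, _table: &str) -> Option<SnapshotId> {
        Some(SnapshotId(self.next.fetch_add(1, Ordering::SeqCst)))
    }
}

fn main() {
    let catalog = AdvancingCatalog { next: AtomicU64::new(1) };
    // Without the wrapper, two reads of the same table can see different snapshots.
    assert_ne!(catalog.current_snapshot("t"), catalog.current_snapshot("t"));

    // With one wrapper per query, both sides of a self-join see the same snapshot.
    let query = QueryScopedCatalog::new(&catalog);
    let left = query.snapshot("t");
    let right = query.snapshot("t");
    assert_eq!(left, right);
    println!("left={:?} right={:?}", left, right); // left=Some(SnapshotId(3)) right=Some(SnapshotId(3))
}
```

The `Mutex` lets one wrapper be shared by whatever threads plan or execute parts of the same query. In the engine-side alternative, the engine would own an equivalent per-query map in its own query state, and providers would not need to wrap themselves.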


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
