[GitHub] [iceberg] pvary commented on issue #2319: Caching Tables in SparkCatalog via CachingCatalog by default leads to stale data

GitBox Thu, 11 Mar 2021 22:51:21 -0800


pvary commented on issue #2319:
URL: https://github.com/apache/iceberg/issues/2319#issuecomment-797277486



   > Also, it's inconsistent with how Hive and Presto handle Iceberg tables; 
but also how Spark handles queries to non-Iceberg tables.
   
   Hive should also use the same snapshot of the table at the query level, but 
a refresh is expected between sessions and transactions (currently, queries). 
Since Hive query execution spans multiple JVMs, we have to find our own way of 
snapshotting tables. We have already started working on this (see BaseTable 
serialization).
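
   To make the query-level behavior concrete, here is a minimal toy sketch of 
pinning a snapshot once per query and picking up the new one only when the next 
query starts. ToyTable and QueryScope are hypothetical names for illustration, 
not Iceberg or Hive classes, and this says nothing about how the BaseTable 
serialization work is actually implemented.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: ToyTable and QueryScope are hypothetical, not Iceberg classes.
final class SnapshotPinningSketch {

    /** Stands in for a shared table whose current snapshot advances on every commit. */
    static final class ToyTable {
        private final AtomicLong currentSnapshotId = new AtomicLong(1);

        long currentSnapshotId() {
            return currentSnapshotId.get();
        }

        /** Simulates a commit by another writer; returns the new snapshot id. */
        long commit() {
            return currentSnapshotId.incrementAndGet();
        }
    }

    /** Pins the snapshot once, when the query starts; every read in the query reuses it. */
    static final class QueryScope {
        private final long pinnedSnapshotId;

        QueryScope(ToyTable table) {
            this.pinnedSnapshotId = table.currentSnapshotId();
        }

        long snapshotForRead() {
            return pinnedSnapshotId;
        }
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();

        // First query: both sides of a self-join read through the same scope.
        QueryScope first = new QueryScope(table);
        long leftSide = first.snapshotForRead();
        table.commit(); // a concurrent commit lands mid-query
        long rightSide = first.snapshotForRead();
        System.out.println("self-join sides match: " + (leftSide == rightSide));

        // Next query: a fresh scope picks up the committed snapshot.
        QueryScope second = new QueryScope(table);
        System.out.println("next query sees new snapshot: "
            + (second.snapshotForRead() > leftSide));
    }
}
```

   In this toy model the refresh boundary is simply the construction of a new 
QueryScope; where that boundary sits in a real engine (session, transaction, or 
query) is exactly the design question discussed here.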
   
   > I agree, this would solve caching for saving resources. However, this does 
not address the self-join concerns mentioned before, since they rely on looking 
at the same snapshot.
   
   I think the current CachingCatalog is too specific for general use, but it 
still has its own use cases. Also, since this is a released feature, some users 
might depend on its specific behavior. I would suggest creating a new one 
alongside it; when it is ready, we can decide whether to deprecate the old one.
   
   What do you think?




