rdblue commented on pull request #2659: URL: https://github.com/apache/iceberg/pull/2659#issuecomment-861698721
> Is this the only place in iceberg that a table would be cached?

The table will be referenced by Spark plans as well. I think the problem was that those plans weren't being invalidated when you ran `REFRESH TABLE t`, because the catalog's `invalidateTable` method calls `refresh` on the table reference that it loads. So if a table was cleared from the cache, the existing references in Spark would no longer be updated by the catalog's `invalidateTable` call.

That seems like a Spark problem rather than a catalog problem to me, which is why I think we should revisit this decision. Shouldn't Spark invalidate cached plans that reference a table when `REFRESH TABLE` runs, rather than assuming that the catalog can do it? We may also want to purposely keep a table's state separate while it is referenced by a cached plan.

@aokolnychyi, what did we decide was the "correct" behavior when a query is cached?
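To make the failure mode concrete, here is a minimal toy sketch (not Iceberg's actual `CachingCatalog` or `Table` classes, just hypothetical stand-ins) of why `invalidateTable` only helps references that are still in the cache: refreshing the cached instance updates every holder of that shared reference, but once the entry is evicted, a later `invalidateTable` has nothing to refresh and stale references held by cached Spark plans are left behind.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table whose metadata can be reloaded.
class Table {
    int version = 1;

    void refresh() {
        version++; // simulate reloading the latest metadata
    }
}

// Hypothetical stand-in for a caching catalog that shares Table instances.
class CachingCatalog {
    private final Map<String, Table> cache = new HashMap<>();

    Table loadTable(String name) {
        // Every caller (e.g. a Spark plan) gets the same cached instance.
        return cache.computeIfAbsent(name, n -> new Table());
    }

    void invalidateTable(String name) {
        Table cached = cache.get(name);
        if (cached != null) {
            // Refreshing the shared instance updates all existing references.
            cached.refresh();
        }
        // If the entry was already evicted, references held elsewhere
        // (such as in cached Spark plans) are never refreshed.
    }

    void evict(String name) {
        cache.remove(name);
    }
}

public class Demo {
    public static void main(String[] args) {
        CachingCatalog catalog = new CachingCatalog();
        Table planRef = catalog.loadTable("t"); // reference held by a Spark plan

        catalog.invalidateTable("t");
        System.out.println(planRef.version); // shared instance was refreshed

        catalog.evict("t");          // table cleared from the cache
        catalog.invalidateTable("t"); // no-op: nothing left to refresh
        System.out.println(planRef.version); // planRef is now stale
    }
}
```

The sketch shows why clearing a cache entry and invalidating a table are not interchangeable: after eviction, only Spark itself still knows about the plan's reference, which is the argument for Spark invalidating cached plans on `REFRESH TABLE` instead of relying on the catalog.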
