rdblue commented on a change in pull request #1801:
URL: https://github.com/apache/iceberg/pull/1801#discussion_r529034436
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/procedures/BaseProcedure.java
##########
@@ -56,7 +65,9 @@ protected BaseProcedure(TableCatalog tableCatalog) {
       T result = func.apply(icebergTable);
-      refreshSparkCache(ident, sparkTable);
+      if (refreshSparkCache) {
+        refreshSparkCache(ident, sparkTable);
+      }
Review comment:
For this PR, I think it is better to invalidate the cache.
The more general question is one for Spark, but I think the answer is
that Spark does not care about external changes; it only cares about
consistency with its own operations. If a table is loaded, then all queries
and cached DataFrames should give consistent results. If Spark updates,
refreshes, or invalidates a table, then it should invalidate the associated
caches. Otherwise, Spark simply uses the current state as of when the table
was loaded, which is a good thing: Spark can't be expected to invalidate a
cache every time anything external changes.
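
For reference, a minimal sketch of what invalidation could look like for a
DSv2 relation. This is not the code in this PR: the helper name
`invalidateSparkCache` is hypothetical, and it goes through Spark's internal
`CacheManager`, whose signatures may differ across Spark versions.

```java
import scala.Option;

import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableCatalog;
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation;

class CacheInvalidationSketch {
  // Hypothetical helper: drops cached plans that reference the table instead
  // of eagerly recomputing them. CacheManager is Spark-internal, so treat
  // this as a sketch rather than a stable API.
  static void invalidateSparkCache(SparkSession spark, TableCatalog catalog,
                                   Identifier ident, Table sparkTable) {
    DataSourceV2Relation relation =
        DataSourceV2Relation.create(sparkTable, Option.apply(catalog), Option.apply(ident));
    // cascade = true also drops cached plans built on top of this relation;
    // blocking = false allows the uncache to complete asynchronously
    spark.sharedState().cacheManager().uncacheQuery(spark, relation, true, false);
  }
}
```

Compared to `recacheByPlan`, which eagerly recomputes the cached data against
the new table state, dropping the cache entry defers that cost until the data
is actually queried again.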