rdblue commented on a change in pull request #1801:
URL: https://github.com/apache/iceberg/pull/1801#discussion_r529034436
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/procedures/BaseProcedure.java
##########
@@ -56,7 +65,9 @@ protected BaseProcedure(TableCatalog tableCatalog) {
       T result = func.apply(icebergTable);
-      refreshSparkCache(ident, sparkTable);
+      if (refreshSparkCache) {
+        refreshSparkCache(ident, sparkTable);
+      }
Review comment:
For this PR, I think it is better to invalidate the cache.
The more general question is one for Spark, but I think the answer is
that Spark does not care about external changes; it only cares about
consistency with its own operations. If a table is loaded, then all queries
and cached DataFrames should give consistent results. If Spark updates,
refreshes, or invalidates a table, then it should invalidate the associated
caches. Otherwise, Spark simply uses the current state as of when the table
was loaded, which is a good thing: Spark can't be expected to invalidate a
cache every time anything external changes.
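
For reference, a minimal sketch of what invalidation could look like for a
DSv2 relation. This is not the code in this PR: the helper name
`invalidateSparkCache` is hypothetical, and it goes through Spark's internal
`CacheManager`, whose signatures may differ across Spark versions.

```java
import scala.Option;

import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableCatalog;
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation;

class CacheInvalidationSketch {
  // Hypothetical helper: drops cached plans that reference the table instead
  // of eagerly recomputing them. CacheManager is Spark-internal, so treat
  // this as a sketch rather than a stable API.
  static void invalidateSparkCache(SparkSession spark, TableCatalog catalog,
                                   Identifier ident, Table sparkTable) {
    DataSourceV2Relation relation =
        DataSourceV2Relation.create(sparkTable, Option.apply(catalog), Option.apply(ident));
    // cascade = true also drops cached plans built on top of this relation;
    // blocking = false allows the uncache to complete asynchronously
    spark.sharedState().cacheManager().uncacheQuery(spark, relation, true, false);
  }
}
```

Compared to `recacheByPlan`, which eagerly recomputes the cached data against
the new table state, dropping the cache entry defers that cost until the data
is actually queried again.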