RussellSpitzer commented on a change in pull request #3861:
URL: https://github.com/apache/iceberg/pull/3861#discussion_r780660136



##########
File path: 
spark/v3.0/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
##########
@@ -123,11 +123,10 @@ public Table loadTable(Identifier ident) throws 
NoSuchTableException {
 
   @Override
   public void invalidateTable(Identifier ident) {
-    if (icebergCatalog.tableExists(ident)) {
-      icebergCatalog.invalidateTable(ident);
-    } else {
-      getSessionCatalog().invalidateTable(ident);
-    }
+    // We do not need to check whether the table exists and whether
+    // it is an Iceberg table to reduce remote service requests.
+    icebergCatalog.invalidateTable(ident);

Review comment:
       So we always have to do at least one catalog load here. Since we have a 
caching catalog this is at most a single remote lookup but in most 
circumstances should probably be zero because the reference is cached.
   
   The code for the SparkCatalog 'invalidatetable' method starts by doing a 
'load' operation, so either that call will use a cached reference or do a 
remote lookup. If the 'exists' call was remote than it the table reference 
would be cached and the invalidate call would use that cache. If the exists 
call wasn't remote it also would be cached that means the reference was cached 
already and invalidate should also use that cache. So the number of remote 
lookups should be at most one both with this patch and without.
   
   This change now also always calls the underlying session catalogs invalidate 
method and I'm not sure what the consequences of that so I would probably 
recommend leaving this as is unless we have some evidence this is causing an 
issue.
   
   I think if we want to remove the remote calls we have to first go to the 
invalidateTables method and do some mods there.
   
   First, check whether the catalog is caching or not. For non caching or no 
cache timeout catalogs we can return a noop. For tables backed by a catalog 
with a cache we would probably need to add a dropCache function to the Iceberg 
cachingcatalog api or something like that so we don't have to rely on calling 
load.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to