RussellSpitzer commented on a change in pull request #3861:
URL: https://github.com/apache/iceberg/pull/3861#discussion_r780660136
##########
File path:
spark/v3.0/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
##########
@@ -123,11 +123,10 @@ public Table loadTable(Identifier ident) throws
NoSuchTableException {
@Override
public void invalidateTable(Identifier ident) {
- if (icebergCatalog.tableExists(ident)) {
- icebergCatalog.invalidateTable(ident);
- } else {
- getSessionCatalog().invalidateTable(ident);
- }
+ // We do not need to check whether the table exists and whether
+ // it is an Iceberg table to reduce remote service requests.
+ icebergCatalog.invalidateTable(ident);
Review comment:
So we always have to do at least one catalog load here. Since we have a
caching catalog this is at most a single remote lookup but in most
circumstances should probably be zero because the reference is cached.
The code for the SparkCatalog 'invalidatetable' method starts by doing a
'load' operation, so either that call will use a cached reference or do a
remote lookup. If the 'exists' call was remote than it the table reference
would be cached and the invalidate call would use that cache. If the exists
call wasn't remote it also would be cached that means the reference was cached
already and invalidate should also use that cache. So the number of remote
lookups should be at most one both with this patch and without.
This change now also always calls the underlying session catalogs invalidate
method and I'm not sure what the consequences of that so I would probably
recommend leaving this as is unless we have some evidence this is causing an
issue.
I think if we want to remove the remote calls we have to first go to the
invalidateTables method and do some mods there.
First, check whether the catalog is caching or not. For non caching or no
cache timeout catalogs we can return a noop. For tables backed by a catalog
with a cache we would probably need to add a dropCache function to the Iceberg
cachingcatalog api or something like that so we don't have to rely on calling
load.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]