gaborkaszab commented on code in PR #14398:
URL: https://github.com/apache/iceberg/pull/14398#discussion_r2687127212


##########
core/src/main/java/org/apache/iceberg/rest/RESTCatalog.java:
##########
@@ -96,6 +97,11 @@ public void initialize(String name, Map<String, String> 
props) {
     sessionCatalog.initialize(name, props);
   }
 
+  @VisibleForTesting
+  RESTSessionCatalog sessionCatalog() {

Review Comment:
   It could work as protected too, but I didn't want to expose this to whoever 
is deriving from this class. I thought package private + `@VisibleForTesting` 
is the way to express that this is exposed only for testing.
   
   I changed this as you suggested.



##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -424,8 +517,26 @@ public Table loadTable(SessionContext context, 
TableIdentifier identifier) {
     MetadataTableType metadataType;
     LoadTableResponse response;
     TableIdentifier loadedIdent;
+
+    Map<String, String> responseHeaders = Maps.newHashMap();
+    TableSupplierWithETag cachedTable =
+        tableCache.getIfPresent(SessionIdTableId.of(context.sessionId(), 
identifier));
+
     try {
-      response = loadInternal(context, identifier, snapshotMode);
+      response =
+          loadInternal(
+              context,
+              identifier,
+              snapshotMode,
+              headersForLoadTable(cachedTable),
+              responseHeaders::putAll);
+
+      if (response == null) {
+        Preconditions.checkNotNull(cachedTable, "Invalid  load table response: 
null");

Review Comment:
   Done



##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -277,9 +316,46 @@ public void initialize(String name, Map<String, String> 
unresolved) {
             mergedProps,
             RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
             RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED_DEFAULT);
+
+    this.tableCache = tableCacheBuilder(mergedProps).build();
+
     super.initialize(name, mergedProps);
   }
 
+  @VisibleForTesting

Review Comment:
   Sure. It is used in `TestableRESTSessionCatalog` that is in the test/ 
folder, hence I thought that `@VisibleForTesting` makes sense.
   I changed this to protected now.



##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -424,8 +517,26 @@ public Table loadTable(SessionContext context, 
TableIdentifier identifier) {
     MetadataTableType metadataType;
     LoadTableResponse response;
     TableIdentifier loadedIdent;
+
+    Map<String, String> responseHeaders = Maps.newHashMap();
+    TableSupplierWithETag cachedTable =
+        tableCache.getIfPresent(SessionIdTableId.of(context.sessionId(), 
identifier));

Review Comment:
   No need to have different ETags for different snapshot mode. This works now 
without that and also without adding the snapshot mode into the cache key. When 
the catalog is in REFS mode then the cache only contains the REFS mode of the 
table. When clients do the on-demand lazy snapshot loading, it doesn't go 
through the table cache, it is done via a `TableMetadata.snapshotSupplier` 
calling `loadInternal` directly avoiding the cache.
   
   This works, however each client getting the table from cache have to load 
all snapshots on-demand themselves. I deliberately decided to not implement 
that with this initial PR to keep the code as simple as possible, and to make 
reviewing easier.
   I prefer to implement freshness-aware loading for on-demand snapshot loading 
as a follow-up PR. LMK what you think!
   



##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -277,9 +316,46 @@ public void initialize(String name, Map<String, String> 
unresolved) {
             mergedProps,
             RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
             RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED_DEFAULT);
+
+    this.tableCache = tableCacheBuilder(mergedProps).build();
+
     super.initialize(name, mergedProps);
   }
 
+  @VisibleForTesting
+  Caffeine<Object, Object> tableCacheBuilder(Map<String, String> props) {
+    long expireAfterWriteMS =
+        PropertyUtil.propertyAsLong(
+            props,
+            RESTCatalogProperties.TABLE_CACHE_EXPIRE_AFTER_WRITE_MS,
+            RESTCatalogProperties.TABLE_CACHE_EXPIRE_AFTER_WRITE_MS_DEFAULT);
+    Preconditions.checkArgument(

Review Comment:
   I use the max number of entries for this. It accepts zero and it means the 
cache is turned off. The documentation for `maxSize()` says:
   "When size is zero, elements will be evicted immediately after being loaded 
into the cache. This can be useful in testing, or to disable caching 
temporarily without a code change."
   
   While the doc for `expireAfterWrite()` doesn't explicitly say this even 
though it could be used for this purpose. It mentions the following though:
   "Expired entries are cleaned up as part of the routine maintenance"
   
   This gave me the impression that tweaking the expiration to zero eventually 
evicts the entries, while setting maxSize to zero does this immediately. Hence 
I went for the latter. Let me know if you think expiration interval param is 
more suitable than max size param for this.



##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -277,9 +316,46 @@ public void initialize(String name, Map<String, String> 
unresolved) {
             mergedProps,
             RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
             RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED_DEFAULT);
+
+    this.tableCache = tableCacheBuilder(mergedProps).build();

Review Comment:
   I wanted to create a way where I can override the method in 
`TestableRESTSessionCatalog` to do everything what this method does plus adding 
a ticker. I can change this function to return not the builder but the built 
cache instead, but then the override has to contain all the code that the base 
version has, and whenever we change how the cache is built here, we probably 
have to change the same in `TestableRESTSessionCatalog` too.



##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -435,14 +546,34 @@ public Table loadTable(SessionContext context, 
TableIdentifier identifier) {
         // attempt to load a metadata table using the identifier's namespace 
as the base table
         TableIdentifier baseIdent = 
TableIdentifier.of(identifier.namespace().levels());
         try {
-          response = loadInternal(context, baseIdent, snapshotMode);
+          responseHeaders.clear();
+          cachedTable =
+              tableCache.getIfPresent(SessionIdTableId.of(context.sessionId(), 
baseIdent));
+
+          response =
+              loadInternal(
+                  context,
+                  baseIdent,
+                  snapshotMode,
+                  headersForLoadTable(cachedTable),
+                  responseHeaders::putAll);
+
+          if (response == null) {
+            Preconditions.checkNotNull(cachedTable, "Invalid load table 
response: null");
+
+            return MetadataTableUtils.createMetadataTableInstance(
+                cachedTable.tableSupplier().get(), metadataType);
+          }
+
           loadedIdent = baseIdent;
         } catch (NoSuchTableException ignored) {
           // the base table does not exist
+          invalidateTable(context, baseIdent);
           throw original;
         }
       } else {
         // name is not a metadata table
+        invalidateTable(context, identifier);

Review Comment:
   Technically it's feasible that we get here and the table exists in the 
cache. See `TestRESTCatalog.testLoadTableInvalidatesCache`. Steps:
   1) We load the table. This populates the cache
   2) With another catalog instance (maybe through another engine for instance) 
we drop the table. The table will remain in the cache in this catalog.
   3) Another load table will fail because the table doesn't exist anymore, 
while the table still in the cache. I think it makes sense to invalidate the 
table in this case.



##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -277,9 +316,46 @@ public void initialize(String name, Map<String, String> 
unresolved) {
             mergedProps,
             RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
             RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED_DEFAULT);
+
+    this.tableCache = tableCacheBuilder(mergedProps).build();
+
     super.initialize(name, mergedProps);
   }
 
+  @VisibleForTesting
+  Caffeine<Object, Object> tableCacheBuilder(Map<String, String> props) {
+    long expireAfterWriteMS =
+        PropertyUtil.propertyAsLong(
+            props,
+            RESTCatalogProperties.TABLE_CACHE_EXPIRE_AFTER_WRITE_MS,
+            RESTCatalogProperties.TABLE_CACHE_EXPIRE_AFTER_WRITE_MS_DEFAULT);
+    Preconditions.checkArgument(
+        expireAfterWriteMS > 0, "Invalid expire after write: zero or 
negative");
+
+    long numEntries =
+        PropertyUtil.propertyAsLong(
+            props,
+            RESTCatalogProperties.TABLE_CACHE_MAX_ENTRIES,
+            RESTCatalogProperties.TABLE_CACHE_MAX_ENTRIES_DEFAULT);
+    Preconditions.checkArgument(numEntries >= 0, "Invalid max entries: 
negative");
+
+    Caffeine<Object, Object> builder =
+        Caffeine.newBuilder()
+            .maximumSize(numEntries)
+            .expireAfterWrite(Duration.ofMillis(expireAfterWriteMS))
+            .removalListener(
+                (compositeKey, table, cause) ->
+                    LOG.debug("Evicted {} from table cache ({})", 
compositeKey, cause))
+            .recordStats();
+
+    return builder;
+  }
+
+  @VisibleForTesting

Review Comment:
   Sure, done.
   I'm not sure I have to change `SessionIdTableId` and `TableSupplierWithETag` 
also to protected because now they are private but exposed via a protected 
method. Builds, but IDE has some complaints.



##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -277,9 +316,46 @@ public void initialize(String name, Map<String, String> 
unresolved) {
             mergedProps,
             RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
             RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED_DEFAULT);
+
+    this.tableCache = tableCacheBuilder(mergedProps).build();
+
     super.initialize(name, mergedProps);
   }
 
+  @VisibleForTesting
+  Caffeine<Object, Object> tableCacheBuilder(Map<String, String> props) {

Review Comment:
   Yes, this returns a builder and not the built cache. The naming comes from 
Caffeine and I also find it misleading :)
   I explain above why I return a builder instead of the built cache.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to