Re: [PR] Rest: Allow refresh() to load refs only metadata [iceberg]

via GitHub Tue, 23 Sep 2025 07:01:06 -0700


gaborkaszab commented on code in PR #14166:
URL: https://github.com/apache/iceberg/pull/14166#discussion_r2372349980



##########
core/src/main/java/org/apache/iceberg/rest/RESTTableOperations.java:
##########
@@ -68,7 +70,36 @@ enum UpdateType {
       FileIO io,
       TableMetadata current,
       Set<Endpoint> endpoints) {
-    this(client, path, headers, io, UpdateType.SIMPLE, Lists.newArrayList(), 
current, endpoints);
+    this(
+        client,
+        path,
+        headers,
+        io,
+        UpdateType.SIMPLE,
+        Lists.newArrayList(),
+        current,
+        endpoints,
+        ImmutableMap.of());
+  }
+
+  RESTTableOperations(

Review Comment:
   See my other comment, but I feel we could extend the existing constructors 
with snapshotMode query param and not introduce more variants of the 
constructors.



##########
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##########
@@ -887,6 +887,103 @@ public void testTableCredential(String oauth2ServerUri) {
         oauth2ServerUri);
   }
 
+  @Test
+  public void testTableRefreshLoadsRefsOnly() {
+    RESTCatalogAdapter adapter = Mockito.spy(new 
RESTCatalogAdapter(backendCatalog));
+
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) 
-> adapter);
+    catalog.initialize(
+        "test",
+        ImmutableMap.of(
+            CatalogProperties.URI,
+            "ignored",
+            CatalogProperties.FILE_IO_IMPL,
+            "org.apache.iceberg.inmemory.InMemoryFileIO",
+            // default loading to refs only
+            RESTCatalogProperties.SNAPSHOT_LOADING_MODE,
+            SnapshotMode.REFS.name()));
+
+    if (requiresNamespaceCreate()) {
+      catalog.createNamespace(TABLE.namespace());
+    }
+
+    // Create a table with multiple snapshots
+    Table table = catalog.createTable(TABLE, SCHEMA);
+    table
+        .newFastAppend()
+        .appendFile(
+            DataFiles.builder(PartitionSpec.unpartitioned())
+                .withPath("/path/to/data-a.parquet")
+                .withFileSizeInBytes(10)
+                .withRecordCount(2)
+                .build())
+        .commit();
+
+    table
+        .newFastAppend()
+        .appendFile(
+            DataFiles.builder(PartitionSpec.unpartitioned())
+                .withPath("/path/to/data-b.parquet")
+                .withFileSizeInBytes(10)
+                .withRecordCount(2)
+                .build())
+        .commit();
+
+    ResourcePaths paths = 
ResourcePaths.forCatalogProperties(Maps.newHashMap());
+
+    // Respond with only referenced snapshots
+    Answer<?> refsAnswer =

Review Comment:
   with [this PR](https://github.com/apache/iceberg/pull/14060) this part of 
the test won't be needed to simulate refs only answer.



##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -456,7 +456,8 @@ public Table loadTable(SessionContext context, 
TableIdentifier identifier) {
             Map::of,
             tableFileIO(context, tableConf, response.credentials()),
             tableMetadata,
-            endpoints);
+            endpoints,
+            snapshotModeToParam(snapshotMode));

Review Comment:
   I see other occasions where RESTTableOperations is constructed in 
RESTSessionCatalog, like create(), createTransaction(), and others. 
`snapshotMode` is already present there too. Do you think it makes sense to use 
the new constructor there too?
   Also, if there are no other calls, then the old constructor of 
RESTTableOperations could be removed.



##########
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##########
@@ -887,6 +887,103 @@ public void testTableCredential(String oauth2ServerUri) {
         oauth2ServerUri);
   }
 
+  @Test
+  public void testTableRefreshLoadsRefsOnly() {
+    RESTCatalogAdapter adapter = Mockito.spy(new 
RESTCatalogAdapter(backendCatalog));
+
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) 
-> adapter);
+    catalog.initialize(
+        "test",
+        ImmutableMap.of(
+            CatalogProperties.URI,
+            "ignored",
+            CatalogProperties.FILE_IO_IMPL,
+            "org.apache.iceberg.inmemory.InMemoryFileIO",
+            // default loading to refs only
+            RESTCatalogProperties.SNAPSHOT_LOADING_MODE,
+            SnapshotMode.REFS.name()));
+
+    if (requiresNamespaceCreate()) {
+      catalog.createNamespace(TABLE.namespace());
+    }
+
+    // Create a table with multiple snapshots
+    Table table = catalog.createTable(TABLE, SCHEMA);
+    table
+        .newFastAppend()
+        .appendFile(
+            DataFiles.builder(PartitionSpec.unpartitioned())
+                .withPath("/path/to/data-a.parquet")
+                .withFileSizeInBytes(10)
+                .withRecordCount(2)
+                .build())
+        .commit();
+
+    table
+        .newFastAppend()
+        .appendFile(
+            DataFiles.builder(PartitionSpec.unpartitioned())
+                .withPath("/path/to/data-b.parquet")
+                .withFileSizeInBytes(10)
+                .withRecordCount(2)
+                .build())
+        .commit();
+
+    ResourcePaths paths = 
ResourcePaths.forCatalogProperties(Maps.newHashMap());
+
+    // Respond with only referenced snapshots
+    Answer<?> refsAnswer =
+        invocation -> {
+          LoadTableResponse originalResponse = (LoadTableResponse) 
invocation.callRealMethod();
+          TableMetadata refsMetadata =
+              TableMetadata.buildFrom(originalResponse.tableMetadata())
+                  .suppressHistoricalSnapshots()
+                  .build();
+
+          // don't call snapshots() directly as that would cause to load all 
snapshots. Instead,
+          // make sure the snapshots field holds exactly 1 snapshot
+          assertThat(refsMetadata)
+              .extracting("snapshots")
+              .asInstanceOf(InstanceOfAssertFactories.list(Snapshot.class))
+              .hasSize(1);
+
+          return LoadTableResponse.builder()
+              .withTableMetadata(refsMetadata)
+              .addAllConfig(originalResponse.config())
+              .build();
+        };
+
+    Mockito.doAnswer(refsAnswer)
+        .when(adapter)
+        .execute(
+            reqMatcher(HTTPMethod.GET, paths.table(TABLE), Map.of(), 
Map.of("snapshots", "refs")),
+            eq(LoadTableResponse.class),
+            any(),
+            any());
+
+    Table refsTable = catalog.loadTable(TABLE);
+    refsTable.refresh();

Review Comment:
   I think the current implementation of lazy snapshot loading is inconsistent, 
probably not within the scope of this PR, but for the record:
   ```
   Table table = catalog.createTable(TABLE, SCHEMA);
   table.refresh();
   ```
   The above code loads the table twice as this test, however both of them 
loads the full snapshot list not respecting the `REFS` snapshot loading mode.
   This PR might want to solve this for the second load with passing the 
`snapshotMode` to `RESTTableOperations` all across `RESTSessionCatalog`.
   Adjusting the first load (for createTable() and all) would require a REST 
spec change. See my question about this [on 
Slack](https://apache-iceberg.slack.com/archives/C03LG1D563F/p1758629299891649).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Rest: Allow refresh() to load refs only metadata [iceberg]

Reply via email to