gaborkaszab commented on code in PR #14166: URL: https://github.com/apache/iceberg/pull/14166#discussion_r2372349980
########## core/src/main/java/org/apache/iceberg/rest/RESTTableOperations.java: ########## @@ -68,7 +70,36 @@ enum UpdateType { FileIO io, TableMetadata current, Set<Endpoint> endpoints) { - this(client, path, headers, io, UpdateType.SIMPLE, Lists.newArrayList(), current, endpoints); + this( + client, + path, + headers, + io, + UpdateType.SIMPLE, + Lists.newArrayList(), + current, + endpoints, + ImmutableMap.of()); + } + + RESTTableOperations( Review Comment: See my other comment, but I feel we could extend the existing constructors with snapshotMode query param and not introduce more variants of the constructors. ########## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ########## @@ -887,6 +887,103 @@ public void testTableCredential(String oauth2ServerUri) { oauth2ServerUri); } + @Test + public void testTableRefreshLoadsRefsOnly() { + RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog)); + + RESTCatalog catalog = + new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter); + catalog.initialize( + "test", + ImmutableMap.of( + CatalogProperties.URI, + "ignored", + CatalogProperties.FILE_IO_IMPL, + "org.apache.iceberg.inmemory.InMemoryFileIO", + // default loading to refs only + RESTCatalogProperties.SNAPSHOT_LOADING_MODE, + SnapshotMode.REFS.name())); + + if (requiresNamespaceCreate()) { + catalog.createNamespace(TABLE.namespace()); + } + + // Create a table with multiple snapshots + Table table = catalog.createTable(TABLE, SCHEMA); + table + .newFastAppend() + .appendFile( + DataFiles.builder(PartitionSpec.unpartitioned()) + .withPath("/path/to/data-a.parquet") + .withFileSizeInBytes(10) + .withRecordCount(2) + .build()) + .commit(); + + table + .newFastAppend() + .appendFile( + DataFiles.builder(PartitionSpec.unpartitioned()) + .withPath("/path/to/data-b.parquet") + .withFileSizeInBytes(10) + .withRecordCount(2) + .build()) + .commit(); + + ResourcePaths paths = ResourcePaths.forCatalogProperties(Maps.newHashMap()); + + // Respond with only referenced snapshots + Answer<?> refsAnswer = Review Comment: with [this PR](https://github.com/apache/iceberg/pull/14060) this part of the test won't be needed to simulate refs only answer. ########## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ########## @@ -456,7 +456,8 @@ public Table loadTable(SessionContext context, TableIdentifier identifier) { Map::of, tableFileIO(context, tableConf, response.credentials()), tableMetadata, - endpoints); + endpoints, + snapshotModeToParam(snapshotMode)); Review Comment: I see other occasions where RESTTableOperations is constructed in RESTSessionCatalog, like create(), createTransaction(), and others. `snapshotMode` is already present there too. Do you think it makes sense to use the new constructor there too? Also, if there are no other calls, then the old constructor of RESTTableOperations could be removed. ########## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ########## @@ -887,6 +887,103 @@ public void testTableCredential(String oauth2ServerUri) { oauth2ServerUri); } + @Test + public void testTableRefreshLoadsRefsOnly() { + RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog)); + + RESTCatalog catalog = + new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter); + catalog.initialize( + "test", + ImmutableMap.of( + CatalogProperties.URI, + "ignored", + CatalogProperties.FILE_IO_IMPL, + "org.apache.iceberg.inmemory.InMemoryFileIO", + // default loading to refs only + RESTCatalogProperties.SNAPSHOT_LOADING_MODE, + SnapshotMode.REFS.name())); + + if (requiresNamespaceCreate()) { + catalog.createNamespace(TABLE.namespace()); + } + + // Create a table with multiple snapshots + Table table = catalog.createTable(TABLE, SCHEMA); + table + .newFastAppend() + .appendFile( + DataFiles.builder(PartitionSpec.unpartitioned()) + .withPath("/path/to/data-a.parquet") + .withFileSizeInBytes(10) + .withRecordCount(2) + .build()) + .commit(); + + table + .newFastAppend() + .appendFile( + DataFiles.builder(PartitionSpec.unpartitioned()) + .withPath("/path/to/data-b.parquet") + .withFileSizeInBytes(10) + .withRecordCount(2) + .build()) + .commit(); + + ResourcePaths paths = ResourcePaths.forCatalogProperties(Maps.newHashMap()); + + // Respond with only referenced snapshots + Answer<?> refsAnswer = + invocation -> { + LoadTableResponse originalResponse = (LoadTableResponse) invocation.callRealMethod(); + TableMetadata refsMetadata = + TableMetadata.buildFrom(originalResponse.tableMetadata()) + .suppressHistoricalSnapshots() + .build(); + + // don't call snapshots() directly as that would cause to load all snapshots. Instead, + // make sure the snapshots field holds exactly 1 snapshot + assertThat(refsMetadata) + .extracting("snapshots") + .asInstanceOf(InstanceOfAssertFactories.list(Snapshot.class)) + .hasSize(1); + + return LoadTableResponse.builder() + .withTableMetadata(refsMetadata) + .addAllConfig(originalResponse.config()) + .build(); + }; + + Mockito.doAnswer(refsAnswer) + .when(adapter) + .execute( + reqMatcher(HTTPMethod.GET, paths.table(TABLE), Map.of(), Map.of("snapshots", "refs")), + eq(LoadTableResponse.class), + any(), + any()); + + Table refsTable = catalog.loadTable(TABLE); + refsTable.refresh(); Review Comment: I think the current implementation of lazy snapshot loading is inconsistent, probably not within the scope of this PR, but for the record: ``` Table table = catalog.createTable(TABLE, SCHEMA); table.refresh(); ``` The above code loads the table twice as this test, however both of them loads the full snapshot list not respecting the `REFS` snapshot loading mode. This PR might want to solve this for the second load with passing the `snapshotMode` to `RESTTableOperations` all across `RESTSessionCatalog`. Adjusting the first load (for createTable() and all) would require a REST spec change. See my question about this [on Slack](https://apache-iceberg.slack.com/archives/C03LG1D563F/p1758629299891649). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org