rdblue commented on code in PR #15448:
URL: https://github.com/apache/iceberg/pull/15448#discussion_r2897933899
##########
core/src/test/java/org/apache/iceberg/rest/TestRESTScanPlanning.java:
##########
@@ -953,4 +957,110 @@ public void serverSupportsPlanningButNotCancellation()
throws IOException {
// Verify no exception was thrown - cancelPlan returns false when endpoint
not supported
assertThat(cancelled).isFalse();
}
+
+ @ParameterizedTest
+ @EnumSource(PlanningMode.class)
+ void fileIOForRemotePlanningIsPropagated(
+ Function<TestPlanningBehavior.Builder, TestPlanningBehavior.Builder>
planMode) {
+ RESTCatalogAdapter adapter =
+ Mockito.spy(
+ new RESTCatalogAdapter(backendCatalog) {
+ @Override
+ public <T extends RESTResponse> T execute(
+ HTTPRequest request,
+ Class<T> responseType,
+ Consumer<ErrorResponse> errorHandler,
+ Consumer<Map<String, String>> responseHeaders,
+ ParserContext parserContext) {
+ T response =
+ super.execute(
+ request, responseType, errorHandler, responseHeaders,
parserContext);
+ return maybeAddStorageCredential(response);
+ }
+ });
+
+
adapter.setPlanningBehavior(planMode.apply(TestPlanningBehavior.builder()).build());
+
+ RESTCatalog catalog =
+ new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config)
-> adapter);
+ catalog.initialize(
+ "test",
+ ImmutableMap.of(
+ CatalogProperties.FILE_IO_IMPL,
+ "org.apache.iceberg.inmemory.InMemoryFileIO",
+ RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
+ "true"));
+
+ Table table = restTableFor(catalog, "file_io_propagation");
+ setParserContext(table);
+
+
assertThat(table.io().properties()).doesNotContainKey(RESTCatalogProperties.REST_SCAN_PLAN_ID);
+
+ TableScan tableScan = table.newScan();
+ assertThat(tableScan.io().properties())
+ .isSameAs(table.io().properties())
+ .doesNotContainKey(RESTCatalogProperties.REST_SCAN_PLAN_ID);
+ // make sure remote scan planning is called and FileIO gets the planId
+ assertThat(tableScan.planFiles()).hasSize(1);
+
assertThat(table.io().properties()).doesNotContainKey(RESTCatalogProperties.REST_SCAN_PLAN_ID);
+
assertThat(tableScan.io().properties()).containsKey(RESTCatalogProperties.REST_SCAN_PLAN_ID);
+ String planId =
tableScan.io().properties().get(RESTCatalogProperties.REST_SCAN_PLAN_ID);
+
+ TableScan newScan = table.newScan();
+ assertThat(newScan.io().properties())
+ .isSameAs(table.io().properties())
+ .doesNotContainKey(RESTCatalogProperties.REST_SCAN_PLAN_ID);
+ // make sure remote scan planning is called and FileIO gets the planId
+ assertThat(newScan.planFiles()).hasSize(1);
+
assertThat(table.io().properties()).doesNotContainKey(RESTCatalogProperties.REST_SCAN_PLAN_ID);
+
+ // make sure planIds are different for each scan
+
assertThat(newScan.io().properties()).containsKey(RESTCatalogProperties.REST_SCAN_PLAN_ID);
+
assertThat(newScan.io().properties().get(RESTCatalogProperties.REST_SCAN_PLAN_ID))
+ .isNotEqualTo(planId);
+ }
+
+ @SuppressWarnings("unchecked")
+ private <T extends RESTResponse> T maybeAddStorageCredential(T response) {
+ if (response instanceof PlanTableScanResponse resp
+ && PlanStatus.COMPLETED == resp.planStatus()) {
Review Comment:
I'm looking into when we can return a `FileIO` from the scan and I was
surprised to see that `storage-credentials` is returned for any
`CompletedPlanningResult` rather than `CompletedPlanningWithIDResult`. The one
with ID is used for `completed` responses from the plan endpoint, while the
generic result is returned from both plan and fetch endpoints.
Why can we return storage credentials when fetching tasks? Wouldn't it make
more sense to return credentials once per planning operation? That simplifies
when we know we have credentials.
That wouldn't solve the problem of needing to wait until after `planFiles`
is called to get the `FileIO`, but it would at least simplify the protocol so
we don't need to try to create a new `FileIO` after each call to fetch more
tasks. (@danielcweeks, any thoughts on this?)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]