morningman commented on PR #60498: URL: https://github.com/apache/doris/pull/60498#issuecomment-4392537483
# Improvement Suggestions for PR #60498 **PR**: [\[feat\](Iceberg) Rest & S3Table Support Iam-role](https://github.com/apache/doris/pull/60498) **Review Date**: 2026-05-06 --- The suggestions below are ordered by priority. The first three are worth landing before merge; the rest can ship as small follow-ups. ## 1. (medium) Validate region up front when an IAM role is configured `IcebergAwsAssumeRoleProperties.putAssumeRoleProperties` writes `aws.region` and `client.assume-role.region` directly from `s3Properties.getRegion()`. If the region is blank, the empty value is still written, and the failure surfaces later as an obscure AWS SDK error. Fail fast with a clear message instead: ```java public static void putAssumeRoleProperties(Map<String, String> target, S3Properties s3Properties) { if (StringUtils.isBlank(s3Properties.getS3IAMRole())) { return; } Preconditions.checkArgument( StringUtils.isNotBlank(s3Properties.getRegion()), "s3.region must be set when s3.role_arn is configured for Iceberg AWS clients"); ... } ``` Equivalently, the assertion can live in the `initNormalizeAndCheckProps` of `IcebergRestProperties` and `IcebergS3TablesMetaStoreProperties`, before the helper is invoked. ## 2. (medium) Reject the underscore variant of `iceberg.rest.external-id` `ICEBERG_REST_EXTERNAL_ID` is defined as `iceberg.rest.external-id` (hyphenated), but most other Doris properties β including `s3.external_id` β use underscores. Users will inevitably try `iceberg.rest.external_id`, and that key currently passes through silently because it is neither rejected nor recognized as an alias of `s3ExternalId`. Either reject both spellings: ```java private static final String ICEBERG_REST_EXTERNAL_ID = "iceberg.rest.external-id"; private static final String ICEBERG_REST_EXTERNAL_ID_UNDERSCORE = "iceberg.rest.external_id"; rejectUnsupportedAwsAssumeRoleProperty(ICEBERG_REST_EXTERNAL_ID); rejectUnsupportedAwsAssumeRoleProperty(ICEBERG_REST_EXTERNAL_ID_UNDERSCORE); ``` β¦or pick one canonical naming style and document it explicitly. Right now `iceberg.rest.role_arn` (underscore) and `iceberg.rest.external-id` (hyphen) coexist, which is confusing. ## 3. (medium) Reject `s3.role_arn` when signing-name is neither `glue` nor `s3tables` If a user mistypes `iceberg.rest.signing-name` (for example `s3table` instead of `s3tables`), `shouldUseS3PropertiesForRestCredentials()` returns false, and any configured `s3.role_arn` is silently dropped. The catalog then falls back to the default provider chain, which is an extremely difficult misconfiguration to debug. Surface the conflict explicitly in `buildRules()`: ```java if (StringUtils.isNotBlank(origProps.get(S3Properties.ROLE_ARN)) && !shouldUseS3PropertiesForRestCredentials()) { throw new IllegalArgumentException( "s3.role_arn is only supported when iceberg.rest.signing-name is " + "'glue' or 's3tables'. Got signing-name='" + icebergRestSigningName + "'"); } ``` ## 4. (low) Use a class reference instead of a string literal in `getV2ClassName(mode)` The new single-argument overload returns a literal string for the `DEFAULT` case: ```java case DEFAULT: return "software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider"; ``` while every other branch uses `XxxClass.class.getName()`. The literal will silently rot if the AWS SDK ever moves the class. Prefer: ```java case DEFAULT: return software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider.class.getName(); ``` This is consistent with the rest of the file and gets compile-time validation. ## 5. (low) Rename `putS3FileIOCredentialProperties` or split it The method name advertises "S3 FileIO credentials," but it actually does two things: it writes the standard `S3FileIOProperties.*` keys *and* layers AssumeRole / provider-chain configuration on top via the internal `CredentialType` switch. Either rename it to something like `populateS3FileIOAndCredentials`, or split it into `populateS3FileIO` + `populateCredentials` so callers that need only the FileIO portion can reuse it without inheriting the credential-resolution logic. ## 6. (low) Widen the try/catch in `IcebergS3TablesMetaStoreProperties.initCatalog` The new `S3TablesClient client = buildS3TablesClient(catalogProps);` line lives outside the existing try block, so an exception from there (`Region.of` rejecting an invalid region, `HttpClientProperties` parse errors, etc.) bypasses the helpful "Failed to initialize S3TablesCatalog" wrapper. Move the assignment inside the try: ```java try { S3TablesClient client = buildS3TablesClient(catalogProps); S3TablesCatalog catalog = new S3TablesCatalog(); catalog.initialize(catalogName, catalogProps, client); return catalog; } catch (Exception e) { throw new RuntimeException("Failed to initialize S3TablesCatalog for Iceberg. ...", e); } ``` ## 7. (low) Replace `assertTrue(false)` with `Assertions.fail` in the regression test `test_iceberg_s3tables_catalog_credentials_provider.groovy` uses `assertTrue(false)` as a "should have thrown" sentinel. When the test does fail, the message is just `expected true got false`, which is hard to interpret. Use: ```groovy try { sql """show databases;""" Assertions.fail("Expected exception when web identity is missing") } catch (Exception e) { assertTrue(e.getMessage().contains(expectedMessage)) } ``` ## 8. (suggested) Add the missing unit tests The current FE coverage is strong but leaves a few gaps: - **Session token round-trip**: add a `testGlueWithSessionToken` (or `s3tables` equivalent) in `IcebergRestPropertiesTest` that sets `iceberg.rest.session-token` and asserts the final `rest.session-token` value in the catalog properties. - **REST + IAM Role via `S3Properties`**: add `testGlueWithIamRoleViaS3Properties` that sets `s3.role_arn` (and optionally `s3.external_id`) with `signing-name=glue`, then asserts the catalog properties contain `client.assume-role.arn`, `client.assume-role.external-id`, and `client.factory=AssumeRoleAwsClientFactory`. - **`AbstractIcebergProperties.toS3FileIOProperties` AssumeRole branch**: build a storage list containing a single `S3Properties` with `s3.role_arn` set, call `toFileIOProperties`, and assert the resulting map contains `client.assume-role.arn` and friends. ## 9. (suggested) Expand the PR description / release note The current PR description is one line ("FYI #59893"). Reviewers and downstream users will benefit from explicit notes on: - **New properties**: `iceberg.rest.session-token`, `iceberg.rest.credentials_provider_type` (the latter exists since #59893 but its behavior changes here). - **Behavior changes**: REST `signing-name=glue` no longer requires `iceberg.rest.access-key-id` and `iceberg.rest.secret-access-key`; `s3.role_arn` and `s3.external_id` now drive STS AssumeRole; `iceberg.rest.role_arn` and `iceberg.rest.external-id` are now explicitly rejected. - **Removal**: `org.apache.doris.datasource.iceberg.s3tables.CustomAwsCredentialsProvider` is deleted. It was an internal helper, but the removal should still be called out in case anyone depends on it. --- ## Overall The direction, abstraction, and tests are all above average. The main remaining work is making error paths louder and more specific (items 1β3 above): validate region, reject the underscore variant of `external-id`, and refuse `s3.role_arn` when the signing-name is misconfigured. The remaining items are non-blocking polish and can ship as follow-ups. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
