Sbaia commented on PR #60498:
URL: https://github.com/apache/doris/pull/60498#issuecomment-3851713914

   Thanks for picking this up and building on #59893, @CalvinKirs.
   
   I want to clarify an important distinction about what this PR actually 
enables vs. what users on EKS/GKE typically need.
   
   **What this PR adds:** cross-account assume-role support 
(`AssumeRoleAwsClientFactory`), where a `role_arn` is explicitly provided in 
the catalog properties.
   
   **What this PR does NOT add:** support for IRSA (IAM Roles for Service 
Accounts) or, more generally, the AWS default credentials chain without any 
explicit credential configuration.
   
   The validation in `IcebergRestProperties` (lines 128-137) explicitly rejects 
the case where neither access-key/secret-key nor `role_arn` is provided:
   
   ```java
   boolean hasAccessKeyAndSecret = ...;
   boolean hasIamRole = ...;
   return !hasAccessKeyAndSecret && !hasIamRole;  // fails validation
   ```
   
   With IRSA (and future equivalents like GKE Workload Identity or ECS task 
roles), the pod already has credentials injected via the service account. The 
user doesn't have a `role_arn` to specify in the catalog DDL — the role is 
attached to the pod's ServiceAccount via the `eks.amazonaws.com/role-arn` 
annotation. The correct behavior is to let Iceberg use the AWS SDK's 
`DefaultCredentialsProvider`, which picks up the IRSA token automatically. This 
means accepting a catalog definition with **no credential properties at all**.
   
   **Regarding the assume-role path itself:** `AssumeRoleAwsClientFactory` uses 
the AWS SDK's default credentials chain internally to obtain the **base 
credentials** needed to call `sts:AssumeRole`. So who actually performs the 
AssumeRole call depends on the environment:
   
   - **EKS with IRSA configured:** the pod's ServiceAccount role (via web 
identity token) is used as the base identity to assume the target role. This 
requires a trust policy on the target role that allows the SA role to assume it.
   - **EKS without IRSA:** falls back to the EC2 instance profile (node role 
via IMDS). The node role needs `sts:AssumeRole` permission on the target role. 
This is less secure since all pods on the node share the same base identity.
   - **EC2 (non-Kubernetes):** the instance profile role is used directly.
   
   So the assume-role feature works, but it's solving a different use case 
(cross-account access) than what most EKS users need (direct IRSA credentials 
without specifying any role_arn).
   
   Our original PR #59893 takes a simpler approach: when no explicit 
credentials are provided, we just don't set any credential properties, allowing 
Iceberg/AWS SDK to resolve credentials through the default chain. This covers 
IRSA, instance profiles, environment variables, and any future pod identity 
mechanism — with zero configuration in the catalog DDL.
   
   I'd suggest either:
   1. Relaxing the validation to allow no credentials at all (supporting both 
IRSA and assume-role), or
   2. Keeping assume-role as an addition on top of #59893's default-chain 
support
   
   Happy to collaborate on merging the two approaches.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to