Hi Hadoop Development Team,
I hope this email finds you well. I'm reaching out regarding an enhancement
opportunity for the Azure ABFS WorkloadIdentityTokenProvider that would
significantly improve its extensibility for cloud-native deployments.

*Current Limitation*
The current WorkloadIdentityTokenProvider implementation works well for
file-based token scenarios, but it's tightly coupled to file system
operations and cannot be easily extended for alternative token sources.
Specifically:

   - The WorkloadIdentityTokenProvider is designed around reading tokens
   from files.
   - The provider instantiation logic in AbfsConfiguration
   <https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java#L1272-L1285>
   restricts provider creation, which rules out extension.

*Use Case: Kubernetes TokenRequest API*
In modern Kubernetes environments, the recommended approach is to use the
TokenRequest API to generate short-lived, on-demand service account tokens
rather than relying on projected volume mounts. This approach:

   - Eliminates token expiration issues with long-lived projected tokens
   - Provides better security through shorter token lifetimes
   - Follows Kubernetes security best practices
   - Supports dynamic token generation with custom audiences
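
To make the use case concrete, here is a rough sketch of how a TokenRequest
call could be assembled with the JDK 11+ HTTP client. The class name, method
names, and the in-cluster address https://kubernetes.default.svc are
assumptions for illustration; none of this is existing Hadoop code, and the
sketch only builds the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative helper only; all names here are hypothetical.
final class TokenRequestSketch {
  // Assumed in-cluster API server address (default service DNS name).
  static final String API_SERVER = "https://kubernetes.default.svc";

  // Path of the token subresource for a given service account.
  static String tokenRequestPath(String namespace, String serviceAccount) {
    return "/api/v1/namespaces/" + namespace
        + "/serviceaccounts/" + serviceAccount + "/token";
  }

  // Minimal TokenRequest body: one audience and a lifetime in seconds.
  static String tokenRequestBody(String audience, long expirationSeconds) {
    return "{\"apiVersion\":\"authentication.k8s.io/v1\","
        + "\"kind\":\"TokenRequest\","
        + "\"spec\":{\"audiences\":[\"" + audience + "\"],"
        + "\"expirationSeconds\":" + expirationSeconds + "}}";
  }

  // Builds (but does not send) the POST request.
  // callerToken is the credential that authenticates the caller to the API server.
  static HttpRequest buildRequest(String namespace, String serviceAccount,
      String audience, long expirationSeconds, String callerToken) {
    return HttpRequest.newBuilder(
            URI.create(API_SERVER + tokenRequestPath(namespace, serviceAccount)))
        .header("Authorization", "Bearer " + callerToken)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(
            tokenRequestBody(audience, expirationSeconds)))
        .build();
  }
}
```

Sending the request (with an HttpClient that trusts the cluster CA) and
reading the issued token from the status.token field of the response are left
out. The caller also needs RBAC permission to create serviceaccounts/token.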

*Proposed Enhancement*
I propose modifying WorkloadIdentityTokenProvider to accept a Supplier for
token retrieval instead of being hardcoded to file operations:

public class WorkloadIdentityTokenProvider extends AccessTokenProvider {
  private final Supplier<String> tokenSupplier;

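  // Current constructor for backward compatibility.
  // Note: readTokenFromFile must be static and must wrap IOException in an
  // unchecked exception, because Supplier.get() cannot throw checked exceptions.
  public WorkloadIdentityTokenProvider(String tokenFilePath) {
    this(() -> readTokenFromFile(tokenFilePath));
  }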

  // New constructor for extensibility
  public WorkloadIdentityTokenProvider(Supplier<String> tokenSupplier) {
    this.tokenSupplier = tokenSupplier;
  }

  @Override
  protected AzureADToken refreshToken() throws IOException {
    String token = tokenSupplier.get();
    // ... existing logic
  }
}
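
One detail the sketch above glosses over is that Supplier.get() cannot throw
IOException. A minimal, self-contained illustration of one way to handle that
follows. SupplierBackedTokenSource, fromFile, and currentAssertion are
hypothetical stand-ins, not the real Hadoop class or its methods.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical stand-in for the proposed provider; not the real Hadoop class.
class SupplierBackedTokenSource {
  private final Supplier<String> tokenSupplier;

  // Extension point: any token source can be plugged in.
  SupplierBackedTokenSource(Supplier<String> tokenSupplier) {
    this.tokenSupplier = Objects.requireNonNull(tokenSupplier, "tokenSupplier");
  }

  // File-based factory that preserves the existing behaviour.
  static SupplierBackedTokenSource fromFile(Path tokenFile) {
    return new SupplierBackedTokenSource(() -> {
      try {
        // Re-read the file on every call so rotated tokens are picked up.
        return new String(Files.readAllBytes(tokenFile), StandardCharsets.UTF_8).trim();
      } catch (IOException e) {
        // Supplier.get() cannot throw checked exceptions, so wrap it.
        throw new UncheckedIOException(e);
      }
    });
  }

  // Returns the current token, surfacing I/O failures as IOException again.
  String currentAssertion() throws IOException {
    try {
      String token = tokenSupplier.get();
      if (token == null || token.isEmpty()) {
        throw new IOException("Token supplier returned an empty token");
      }
      return token;
    } catch (UncheckedIOException e) {
      throw e.getCause(); // unwrap back to the original IOException
    }
  }
}
```

A Kubernetes-specific source would then be passed in as
new SupplierBackedTokenSource(() -> fetchTokenSomehow()), where
fetchTokenSomehow is a placeholder for the caller's own logic.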

*Comparison with Current Workarounds*
Currently, teams integrating with the Kubernetes TokenRequest API must
write their own CustomTokenProviderAdaptee implementations, which
duplicates logic that already exists in WorkloadIdentityTokenProvider.
WorkloadIdentityTokenProvider also appears to be built for this exact use
case: workload identity federation, in which the Kubernetes service account
(KSA) token serves as the pod's identity and is exchanged for an AAD token.


*Alternative Approaches*
If modifying the existing class isn't preferred, consider:

   - Changing the class check in AbfsConfiguration.getTokenProvider to use
   isAssignableFrom instead of equality, so that subclasses are accepted
   - Extracting a token acquisition interface that implementations can
   provide
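
To illustrate the first alternative, the difference between the two kinds of
class check can be shown with stand-in classes. BaseProvider, K8sProvider, and
ProviderClassCheck are hypothetical names; this is a sketch of the general
technique, not the actual code in AbfsConfiguration.

```java
// Hypothetical stand-ins for the provider class and a subclass of it.
class BaseProvider {}

class K8sProvider extends BaseProvider {}

final class ProviderClassCheck {
  // Equality check: only the exact class matches, subclasses are rejected.
  static boolean matchesByEquality(Class<?> configured) {
    return configured == BaseProvider.class;
  }

  // Assignability check: the class itself and any subclass match.
  static boolean matchesByAssignability(Class<?> configured) {
    return BaseProvider.class.isAssignableFrom(configured);
  }
}
```

With an equality check, a configured K8sProvider class is not recognized as a
BaseProvider; with isAssignableFrom it is, which is what would let a subclass
reuse the existing instantiation path.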
