Hi all,

I'd like to propose a SPIP for adding delegated credential propagation to
Spark on K8s, generalizing the existing delegation-token model to
OIDC/OAuth-based environments.

SPIP Doc:
https://docs.google.com/document/d/1usJKncCPMiyFUg7aIdpZ0HQsklXIHow_sU_6dfFMjN0/edit?usp=sharing

## Problem

Spark on K8s has no mechanism to take a user/session identity available at
the driver, exchange it for short-lived storage credentials, and propagate
those credentials to dynamically created executors. All jobs access cloud
storage as the pod's service account, making per-user authorization and
audit logging impossible without workarounds.

This is the same gap that Kerberos + delegation tokens close on
YARN + HDFS, but for K8s + cloud storage.

## Proposal

Introduce a CredentialProvider SPI and propagation mechanism that:

1. Reads an OIDC identity token from a configured file path on the driver
2. Exchanges it for short-lived service credentials (via STS or compatible)
3. Distributes only those credentials (not the raw token) to executors via a
new RPC
4. Automatically refreshes credentials for long-running jobs

The raw identity token never leaves the driver. Executors receive only
short-lived delegated service credentials - mirroring how Kerberos
propagates delegation tokens rather than the TGT.
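
As a rough illustration of steps 1-3, here is a minimal Java sketch of the driver-side flow. Every name in it (`CredentialProvider.exchange`, `DelegatedCredentials`, `StubStsProvider`, `DriverCredentialFlow`, the payload keys) is hypothetical and chosen for illustration only; the actual SPI is defined in the SPIP doc. The stub provider stands in for a real STS call so the example runs offline.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical driver-side flow; not the SPIP's actual classes. */
final class DriverCredentialFlow {
    /** Step 1: read the OIDC token from the configured file (driver only). */
    static String readIdentityToken(Path tokenFile) {
        try {
            return new String(Files.readAllBytes(tokenFile), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Steps 2-3: exchange the token and build the payload sent to executors. */
    static Map<String, String> executorPayload(String identityToken, CredentialProvider provider) {
        DelegatedCredentials creds = provider.exchange(identityToken);
        Map<String, String> payload = new HashMap<>(creds.values);
        payload.put("expiresAtEpochMs", Long.toString(creds.expiresAt.toEpochMilli()));
        return payload; // the raw identity token is intentionally not included
    }

    public static void main(String[] args) {
        Map<String, String> payload =
            executorPayload("header.claims.signature", new StubStsProvider());
        System.out.println(payload.keySet());
    }
}

/** Hypothetical: short-lived credentials that are safe to ship to executors. */
final class DelegatedCredentials {
    final Map<String, String> values;
    final Instant expiresAt;

    DelegatedCredentials(Map<String, String> values, Instant expiresAt) {
        this.values = Collections.unmodifiableMap(new HashMap<>(values));
        this.expiresAt = expiresAt;
    }
}

/** Hypothetical SPI shape: one exchange call per identity token. */
interface CredentialProvider {
    DelegatedCredentials exchange(String identityToken);
}

/** Stand-in for an STS call so the sketch runs without network access. */
final class StubStsProvider implements CredentialProvider {
    @Override
    public DelegatedCredentials exchange(String identityToken) {
        Map<String, String> values = new HashMap<>();
        values.put("accessKeyId", "AKIA-EXAMPLE");
        values.put("secretAccessKey", "example-secret");
        values.put("sessionToken", "session-" + Integer.toHexString(identityToken.hashCode()));
        return new DelegatedCredentials(values, Instant.now().plusSeconds(3600));
    }
}
```

The point of the sketch is the boundary: `executorPayload` is the only thing that would cross the RPC, and it is built solely from the provider's output, never from the token itself.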

The design mirrors the existing HadoopDelegationTokenManager /
UpdateDelegationTokens pattern, coexists with Kerberos, and is gated by
spark.security.credentials.enabled (default: false).
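
To make the refresh step (item 4) concrete, here is a minimal sketch of one possible scheduling rule: refresh after a fixed fraction of the credentials' remaining lifetime. The class and method names are hypothetical, and the 0.75 ratio is an assumption borrowed from the default of the existing spark.security.credentials.renewalRatio setting for delegation tokens; the SPIP may choose a different policy.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; illustrates ratio-based refresh timing only. */
final class RefreshSchedule {
    /**
     * Delay before the next refresh: a fraction of the remaining lifetime.
     * Already-expired credentials yield a zero delay (refresh immediately).
     */
    static Duration nextRefreshDelay(Instant now, Instant expiresAt, double ratio) {
        long remainingMs = Duration.between(now, expiresAt).toMillis();
        if (remainingMs <= 0) {
            return Duration.ZERO;
        }
        return Duration.ofMillis((long) (remainingMs * ratio));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T00:00:00Z");
        Instant expiry = now.plusSeconds(3600); // one-hour credentials
        System.out.println(nextRefreshDelay(now, expiry, 0.75)); // prints PT45M
    }
}
```

Refreshing well before expiry leaves time for the new credentials to reach executors before in-flight tasks see the old ones expire.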

A reference provider for S3/STS-compatible storage (AWS, MinIO, Ceph) is
included. Azure/GCP providers and Spark Connect integration are out of
scope, but the SPI is designed to accommodate them without changes.

Both workload-level SA tokens and per-user identity tokens are supported.
With per-user tokens, STS trust policies can enforce access control based
on the user's identity - enabling true per-user authorization that is
impossible with IRSA/Pod Identity alone.

## Key design decisions

- Core SPI is cloud-agnostic (no AWS/Azure/GCP SDK in core)
- Reference provider lives in connector/credential-aws
- Raw identity token stays on the driver; executors get only delegated
service credentials
- Works with any STS-compatible endpoint (not just AWS)
- @DeveloperApi annotation allows SPI evolution
- Platform-agnostic core, with K8s as the primary target

Full details, architecture diagram, and sequence diagram are in the design
document linked above. I welcome any feedback on the approach.

Thanks,
Kousuke
