Thanks Kousuke. Let's proceed independently. I'll get something ready while
we wait for the SPIP review.


Parth

On Sat, Jun 6, 2026 at 9:38 AM Kousuke Saruta <[email protected]> wrote:

> Hi Parth,
>
> Thank you for the thoughtful response. I think the incremental approach
> (your path 1) might be feasible. Our proposals are complementary and
> independent. They address different problems and can proceed in parallel
> without blocking each other.
>
> Your DirectTokenProvider unblocks non-Kerberos credential providers in the
> existing HadoopDelegationTokenManager mechanism. This solves the
> immediate gate problem for environments where pod-level identity is
> unavailable or insufficient.
>
> My SPIP introduces per-user/per-session identity propagation with a
> separate manager and RPC, targeting the case where executors need
> credentials derived from a user identity they cannot obtain themselves.
> Neither depends on the other landing first. They share no code paths
> (different manager, different RPC, and different SPI).
>
> Regarding the 2022 review feedback (sorry, I didn't know about that), the
> constraints have shifted since then. Per-user identity propagation and
> Spark Connect multi-tenancy require expressing per-session identity, but
> Spark's current use of UGI is process-wide, so per-session scoping would
> require fundamental changes to how Spark interacts with UGI. Appendix C of
> my SPIP doc rejects UGI as the container because of these new
> requirements.
>
> Regarding binary payloads in ServiceCredential, for the initial
> implementation, Base64 encoding within Map[String, String] is sufficient
> for the S3A use case. Since the SPI is annotated @DeveloperApi, we can add
> a byte[] field or richer payload type in a future release if concrete
> integrations require binary credentials. I'd prefer to keep the initial
> surface small and evolve based on real demand.
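
A minimal sketch of the Base64 approach mentioned above, assuming a simplified stand-in for ServiceCredential (the real class in the SPIP may look different). The key name `payload.base64` and the `BinaryPayload` helper are hypothetical and chosen only for illustration.

```scala
import java.util.Base64

// Hypothetical stand-in for the SPIP's ServiceCredential; only the
// Map[String, String] properties field is taken from the discussion.
final case class ServiceCredential(properties: Map[String, String])

object BinaryPayload {
  // Hypothetical property key under which the encoded bytes are stored.
  val Key = "payload.base64"

  // Store a binary credential as a Base64 string inside the properties map.
  def attach(cred: ServiceCredential, payload: Array[Byte]): ServiceCredential =
    cred.copy(properties =
      cred.properties + (Key -> Base64.getEncoder.encodeToString(payload)))

  // Recover the original bytes, if a payload was attached.
  def extract(cred: ServiceCredential): Option[Array[Byte]] =
    cred.properties.get(Key).map(s => Base64.getDecoder.decode(s))
}
```

String-valued properties such as access keys pass through unchanged, so a consumer that only needs them can ignore the extra key.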
>
> Best,
> Kousuke
>
> On Sat, Jun 6, 2026 at 2:40 AM Parth Chandra <[email protected]> wrote:
>
>>   Subject: Re: [DISCUSS] SPIP: OIDC Credential Propagation
>>
>>   Hi Kousuke,
>>
>>   Thanks for putting this together. As the author of the
>> original SPARK-38954 [3] and PR #37558 [4], I'm glad to see this problem
>> getting formal attention — it's been a real gap for cloud-native Spark
>> deployments.
>>
>>   I think we're aligned on the problem but differ on scope. Your proposal
>> addresses identity-aware credential propagation (per-user authorization,
>> audit trails, Spark Connect multi-tenancy). That's a compelling long-term
>> direction. The problem I was trying to solve in PR #37558 [4] is narrower:
>> enable non-Kerberos credential providers to participate in the existing
>> distribution mechanism, which is already provider-agnostic (as the Kafka
>> provider demonstrates) but gated on Kerberos activation.
>>
>>   After the review feedback on PR #37558 [4], specifically the direction
>> that we should use a single auth-agnostic manager and UGI as the container,
>> I've been working on a minimal approach (SPARK-27252 [1][2]): a
>> DirectTokenProvider sub-trait of the existing
>> HadoopDelegationTokenProvider, with routing logic inside the existing
>> HadoopDelegationTokenManager to call direct providers without doAs(). This
>> requires ~80 lines of changes to existing code, no new manager, no new RPC
>> message, and no new credential store.
>> It follows the review feedback from PR #37558 [4] exactly.
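
The routing idea described above could look roughly like the following. This is a self-contained sketch with simplified stand-in types, not the actual Spark or Hadoop classes: `Credentials`, `TokenProvider`, `TokenManager`, and the `doAsLoginUser` function are hypothetical simplifications, and only the name DirectTokenProvider comes from the proposal.

```scala
import scala.collection.mutable

// Simplified stand-in for a credential container (not Hadoop's Credentials).
final class Credentials {
  private val tokens = mutable.Map.empty[String, String]
  def addToken(alias: String, token: String): Unit = tokens.update(alias, token)
  def token(alias: String): Option[String] = tokens.get(alias)
}

// Simplified stand-in for the existing provider interface.
trait TokenProvider {
  def serviceName: String
  // Returns the next renewal time, if the token needs renewing.
  def obtainTokens(creds: Credentials): Option[Long]
}

// Marker sub-trait: providers that do not need a Kerberos login context.
trait DirectTokenProvider extends TokenProvider

// doAsLoginUser models running a block inside doAs(); assumed for this sketch.
final class TokenManager(
    providers: Seq[TokenProvider],
    doAsLoginUser: (() => Option[Long]) => Option[Long]) {

  // Direct providers are called as-is; all others go through doAsLoginUser.
  def obtainTokens(creds: Credentials): Option[Long] = {
    val renewals = providers.flatMap {
      case direct: DirectTokenProvider => direct.obtainTokens(creds)
      case other => doAsLoginUser(() => other.obtainTokens(creds))
    }
    // The earliest renewal time across providers drives the next refresh.
    if (renewals.isEmpty) None else Some(renewals.min)
  }
}
```

The point of the marker trait in this sketch is that the manager stays single and auth-agnostic: the only branch is whether a provider's call is wrapped in the login-user context.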
>>
>>   I see two paths forward and am happy with either:
>>
>>   1. *Incremental*: The minimal DirectTokenProvider change (SPARK-27252
>> [1][2]) lands first, unblocking the immediate use case (driver-mediated
>> credential refresh without Kerberos). Your UserCredentialManager and
>> identity-aware architecture can then build on top, or alongside, when
>> the broader scope (Spark Connect, per-user identity, multi-cloud) is ready.
>> The two aren't mutually exclusive.
>>   2. *Unified*: If the community prefers to solve the full identity
>> propagation problem in one shot, I'd be glad to collaborate on your
>> proposal. In that case I'd suggest we address the relationship to the 2022
>> review feedback explicitly — specifically the preference for a single
>> manager and UGI as a container. Your Appendix C rejects that direction; it
>> would strengthen the proposal to explain why the constraints have changed
>> (Spark Connect multi-tenancy, per-user identity requirements that UGI
>> cannot express).
>>
>>   One technical observation: your proposal's CredentialProvider.resolve()
>> returns a ServiceCredential with Map[String, String] properties. For the
>> S3A case this works well (access key, secret key, session token are
>> strings). But some credential systems return binary payloads (signed SAML
>> assertions, serialized protobuf tokens). Worth considering whether
>> Map[String, byte[]] or an opaque byte[] field alongside the properties map
>> would future-proof the SPI.
>>
>>   Happy to discuss further.
>>
>>   Best,
>>   Parth
>>
>>   [1] https://issues.apache.org/jira/browse/SPARK-57252
>>   [2]
>> https://docs.google.com/document/d/1PPqAoJAj48MdjMJNc7DlytXi745z-imFpVaFDnt18Xg/edit?tab=t.0#heading=h.21tncge82jbl
>>   [3] https://issues.apache.org/jira/browse/SPARK-38954
>>   [4] https://github.com/apache/spark/pull/37558
>>
