Thanks Kousuke. Let's proceed independently. I'll get something ready while we wait for the SPIP review.
Parth

On Sat, Jun 6, 2026 at 9:38 AM Kousuke Saruta <[email protected]> wrote:

> Hi Parth,
>
> Thank you for the thoughtful response. I think the incremental approach
> (your path 1) might be feasible. Our proposals are complementary and
> independent. They address different problems and can proceed in parallel
> without blocking each other.
> Your DirectTokenProvider unblocks non-Kerberos credential providers in the
> existing HadoopDelegationTokenManager mechanism. This solves the
> immediate gate problem for environments where pod-level identity is
> unavailable or insufficient.
> My SPIP introduces per-user/per-session identity propagation with a
> separate manager and RPC, targeting the case where executors need
> credentials derived from a user identity they cannot obtain themselves.
> Neither depends on the other landing first. They share no code paths
> (different manager, different RPC, and different SPI).
>
> Regarding the 2022 review feedback (sorry, I didn't know about that), the
> constraints have shifted since then. Per-user identity propagation and
> Spark Connect multi-tenancy require expressing per-session identity, but
> Spark's current use of UGI is process-wide, so per-session scoping would
> require fundamental changes to how Spark interacts with UGI. The
> rejection of UGI in Appendix C of my SPIP doc reflects these new
> requirements.
> Regarding binary payloads in ServiceCredential, for the initial
> implementation, Base64 encoding within Map[String, String] is sufficient
> for the S3A use case. Since the SPI is annotated @DeveloperApi, we can add
> a byte[] field or richer payload type in a future release if concrete
> integrations require binary credentials. I'd prefer to keep the initial
> surface small and evolve based on real demand.
>
> Best,
> Kousuke
>
> On Sat, Jun 6, 2026 at 2:40 Parth Chandra <[email protected]> wrote:
>
>> Subject: Re: [DISCUSS] SPIP: OIDC Credential Propagation
>>
>> Hi Kousuke,
>>
>> Thanks for putting this together. As the author of the
>> original SPARK-38954 [3] and PR #37558 [4], I'm glad to see this problem
>> getting formal attention; it's been a real gap for cloud-native Spark
>> deployments.
>>
>> I think we're aligned on the problem but differ on scope. Your proposal
>> addresses identity-aware credential propagation (per-user authorization,
>> audit trails, Spark Connect multi-tenancy). That's a compelling long-term
>> direction. The problem I was trying to solve in PR #37558 [4] is narrower:
>> enable non-Kerberos credential providers to participate in the existing
>> distribution mechanism, which is already provider-agnostic (as the Kafka
>> provider demonstrates) but gated on Kerberos activation.
>>
>> After the review feedback on PR #37558 [4] (specifically the direction
>> that we should use a single auth-agnostic manager and UGI as the
>> container), I've been working on a minimal approach, SPARK-27252 [1][2]:
>> a DirectTokenProvider sub-trait of the existing
>> HadoopDelegationTokenProvider, with routing logic inside the existing
>> HadoopDelegationTokenManager to call direct providers without doAs(). This
>> requires ~80 lines of changes to existing code, no new manager, no new
>> RPC message, and no new credential store.
>> It follows the review feedback from PR #37558 [4] exactly.
>>
>> I see two paths forward and am happy with either:
>>
>> 1. *Incremental*: The minimal DirectTokenProvider change SPARK-27252
>> [1][2] lands first, unblocking the immediate use case (driver-mediated
>> credential refresh without Kerberos). Your UserCredentialManager and
>> identity-aware architecture can then build on top, or alongside, when
>> the broader scope (Spark Connect, per-user identity, multi-cloud) is ready.
>> The two aren't mutually exclusive.
>> 2. *Unified*: If the community prefers to solve the full identity
>> propagation problem in one shot, I'd be glad to collaborate on your
>> proposal. In that case I'd suggest we address the relationship to the 2022
>> review feedback explicitly, specifically the preference for a single
>> manager and UGI as a container. Your Appendix C rejects that direction; it
>> would strengthen the proposal to explain why the constraints have changed
>> (Spark Connect multi-tenancy, per-user identity requirements that UGI
>> cannot express).
>>
>> One technical observation: your proposal's CredentialProvider.resolve()
>> returns a ServiceCredential with Map[String, String] properties. For the
>> S3A case this works well (access key, secret key, session token are
>> strings). But some credential systems return binary payloads (signed SAML
>> assertions, serialized protobuf tokens). Worth considering whether
>> Map[String, byte[]] or an opaque byte[] field alongside the properties map
>> would future-proof the SPI.
>>
>> Happy to discuss further.
>>
>> Best,
>> Parth
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-57252
>> [2]
>> https://docs.google.com/document/d/1PPqAoJAj48MdjMJNc7DlytXi745z-imFpVaFDnt18Xg/edit?tab=t.0#heading=h.21tncge82jbl
>> [3] https://issues.apache.org/jira/browse/SPARK-38954
>> [4] https://github.com/apache/spark/pull/37558
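
For reference on the binary-payload point discussed in the thread: a minimal sketch of how a binary credential could be carried in a string-valued properties map via Base64, as suggested for the initial implementation. The class name, method names, and the property key below are hypothetical and only illustrate the round trip; they are not part of the proposed SPI or of any Spark API.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class BinaryCredentialSketch {
    // Hypothetical property key, chosen only for this illustration.
    public static final String KEY = "credential.payload.base64";

    // Encode an opaque binary payload into a String-to-String properties map.
    public static Map<String, String> toProperties(byte[] payload) {
        Map<String, String> props = new HashMap<>();
        props.put(KEY, Base64.getEncoder().encodeToString(payload));
        return props;
    }

    // Recover the original bytes on the consuming side.
    public static byte[] fromProperties(Map<String, String> props) {
        return Base64.getDecoder().decode(props.get(KEY));
    }

    public static void main(String[] args) {
        // Bytes that are not valid text, standing in for a binary token.
        byte[] payload = {0x00, (byte) 0xFF, 0x10, (byte) 0x80};
        byte[] restored = fromProperties(toProperties(payload));
        System.out.println(Arrays.equals(payload, restored));
    }
}
```

The trade-off is roughly a one-third size increase for the encoded value, plus a convention that both sides must agree on for which keys hold Base64 data; a dedicated byte[] field would remove the need for that convention.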
