Subject: Re: [DISCUSS] SPIP: OIDC Credential Propagation

Hi Kousuke,

Thanks for putting this together. As the author of the original SPARK-38954 [3] and PR #37558 [4], I'm glad to see this problem getting formal attention; it's been a real gap for cloud-native Spark deployments.

I think we're aligned on the problem but differ on scope. Your proposal addresses identity-aware credential propagation (per-user authorization, audit trails, Spark Connect multi-tenancy). That's a compelling long-term direction. The problem I was trying to solve in PR #37558 [4] is narrower: enable non-Kerberos credential providers to participate in the existing distribution mechanism, which is already provider-agnostic (as the Kafka provider demonstrates) but gated on Kerberos activation.

After the review feedback on PR #37558 [4] (specifically the direction that we should use a single auth-agnostic manager and UGI as the container), I've been working on a minimal approach in SPARK-27252 [1][2]: a DirectTokenProvider sub-trait of the existing HadoopDelegationTokenProvider, with routing logic inside the existing HadoopDelegationTokenManager to call direct providers without doAs(). This requires ~80 lines of changes to existing code, no new manager, no new RPC message, and no new credential store. It follows the review feedback from PR #37558 [4] exactly.

I see two paths forward and am happy with either:

1. *Incremental*: The minimal DirectTokenProvider change (SPARK-27252 [1][2]) lands first, unblocking the immediate use case (driver-mediated credential refresh without Kerberos). Your UserCredentialManager and identity-aware architecture can then build on top of it, or alongside it, when the broader scope (Spark Connect, per-user identity, multi-cloud) is ready. The two aren't mutually exclusive.

2. *Unified*: If the community prefers to solve the full identity propagation problem in one shot, I'd be glad to collaborate on your proposal.
In that case I'd suggest we address the relationship to the 2022 review feedback explicitly, specifically the preference for a single manager and UGI as a container. Your Appendix C rejects that direction; it would strengthen the proposal to explain why the constraints have changed (Spark Connect multi-tenancy, per-user identity requirements that UGI cannot express).

One technical observation: your proposal's CredentialProvider.resolve() returns a ServiceCredential with Map[String, String] properties. For the S3A case this works well (access key, secret key, and session token are all strings). But some credential systems return binary payloads (signed SAML assertions, serialized protobuf tokens). It is worth considering whether Map[String, Array[Byte]], or an opaque Array[Byte] field alongside the properties map, would future-proof the SPI.

Happy to discuss further.

Best,
Parth

[1] https://issues.apache.org/jira/browse/SPARK-57252
[2] https://docs.google.com/document/d/1PPqAoJAj48MdjMJNc7DlytXi745z-imFpVaFDnt18Xg/edit?tab=t.0#heading=h.21tncge82jbl
[3] https://issues.apache.org/jira/browse/SPARK-38954
[4] https://github.com/apache/spark/pull/37558
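
P.S. For illustration, the DirectTokenProvider routing described above could be sketched roughly as follows. This is not the actual patch: Creds, TokenProvider, TokenManager, obtainAll and the kerberosEnabled flag are simplified, hypothetical stand-ins so the example compiles without Spark or Hadoop on the classpath, and the doAs() helper only records that the Kerberos path was taken.

```scala
import scala.collection.mutable

// Stand-in for Hadoop's Credentials: a bag of named, opaque tokens.
final class Creds {
  val tokens: mutable.Map[String, Array[Byte]] = mutable.Map.empty
}

// Stand-in for HadoopDelegationTokenProvider (simplified, not the real signature).
trait TokenProvider {
  def serviceName: String
  // Adds tokens to creds; returns the next renewal time in epoch millis, if any.
  def obtainTokens(creds: Creds): Option[Long]
}

// Marker sub-trait: the provider needs no Kerberos login context.
trait DirectTokenProvider extends TokenProvider

// Stand-in for the routing inside HadoopDelegationTokenManager.
class TokenManager(providers: Seq[TokenProvider], kerberosEnabled: Boolean) {
  // Records which services went through the simulated doAs() path.
  val doAsCalls: mutable.Buffer[String] = mutable.Buffer.empty

  // Simulates UserGroupInformation.doAs(); here it only logs the service and runs body.
  private def doAs[T](service: String)(body: => T): T = {
    doAsCalls += service
    body
  }

  // Returns the earliest renewal time across all providers that produced tokens.
  def obtainAll(creds: Creds): Option[Long] = {
    val renewals = providers.flatMap {
      // Direct providers are always called, and never inside doAs().
      case p: DirectTokenProvider => p.obtainTokens(creds)
      // Classic providers keep the existing Kerberos-gated path.
      case p if kerberosEnabled => doAs(p.serviceName)(p.obtainTokens(creds))
      // Without Kerberos, classic providers are skipped.
      case _ => None
    }
    if (renewals.isEmpty) None else Some(renewals.min)
  }
}
```

The point of the sketch is that the manager, the credential container and the distribution path stay the same; only the per-provider dispatch changes.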

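On the binary-payload point, one possible shape is an optional opaque byte field next to the string properties. The sketch below is only an assumption about how that might look: the field name binaryPayload, the property keys and the example values are hypothetical and are not taken from the proposal.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical shape: the proposal's string properties plus an optional opaque payload.
// Note that Array[Byte] compares by reference, so case-class equality would not
// compare payload contents.
final case class ServiceCredential(
    properties: Map[String, String],
    binaryPayload: Option[Array[Byte]] = None)

object ServiceCredentialExamples {
  // String-only case, e.g. S3A session credentials (placeholder values).
  val s3a: ServiceCredential = ServiceCredential(Map(
    "accessKey" -> "example-access-key",
    "secretKey" -> "example-secret-key",
    "sessionToken" -> "example-session-token"))

  // Binary case, e.g. a signed SAML assertion carried as raw bytes.
  val saml: ServiceCredential = ServiceCredential(
    properties = Map("format" -> "saml2-assertion"),
    binaryPayload = Some("<Assertion/>".getBytes(UTF_8)))
}
```

Keeping the payload optional means string-only providers such as S3A are unaffected, while binary providers avoid a Base64 round trip through the properties map.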