[
https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955528#comment-15955528
]
Ian Hummel commented on SPARK-5158:
-----------------------------------
At Bloomberg we've been working on a solution to this issue so we can access
kerberized HDFS clusters from standalone Spark installations we run on our
internal cloud infrastructure.
Folks who are interested can try out a patch at
https://github.com/themodernlife/spark/tree/spark-5158. It extends standalone
mode to support configuration related to {{--principal}} and {{--keytab}}.
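With the patch applied, a standalone-mode submission could pass the same flags YARN mode already accepts (the master URL, class, and paths below are placeholders, not from the patch):

```
spark-submit \
  --master spark://master:7077 \
  --principal user@EXAMPLE.COM \
  --keytab /path/to/user.keytab \
  --class com.example.MyApp \
  my-app.jar
```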
The main changes are:
- Refactor {{ConfigurableCredentialManager}} and related
{{CredentialProviders}} so that they are no longer tied to YARN
- Setup credential renewal/updating from within the
{{StandaloneSchedulerBackend}}
- Ensure executors/drivers are able to find initial tokens for contacting HDFS
and renew them at regular intervals
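The renewal step above could look roughly like the following sketch. This is not code from the patch; the helper name, the interval variable, and wiring into {{StandaloneSchedulerBackend}} are assumptions, but the Hadoop {{UserGroupInformation}} calls are real API:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical sketch: periodically re-login from the keytab so fresh HDFS
// delegation tokens can be obtained before the current ones expire.
// `principal`, `keytabPath`, and `renewIntervalMs` are placeholders.
val renewer = Executors.newSingleThreadScheduledExecutor()
val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabPath)

renewer.scheduleAtFixedRate(new Runnable {
  override def run(): Unit = {
    // Re-authenticate against the KDC if the TGT is close to expiry,
    // then fetch new delegation tokens and ship them to executors.
    ugi.checkTGTAndReloginFromKeytab()
    // obtainAndDistributeTokens(ugi)  // hypothetical helper
  }
}, renewIntervalMs, renewIntervalMs, TimeUnit.MILLISECONDS)
```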
The implementation does basically the same thing as the YARN backend. The
keytab is copied to driver/executors through an environment variable in the
{{ApplicationDescription}}. I might be wrong, but I'm assuming proper
{{spark.authenticate}} setup would ensure it's encrypted over-the-wire (can
anyone confirm?). Credentials on the executors and the driver (cluster mode)
are written to disk as whatever user the Spark daemon runs as. Open to
suggestions on whether it's worth tightening that up.
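For concreteness, the executor-side flow described above might be sketched like this (the environment variable name and file path are assumptions, and restricting the file to owner-only permissions is one possible way to tighten things up):

```scala
import java.nio.file.{Files, Paths}
import java.util.Base64
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical sketch: the keytab arrives base64-encoded in an environment
// variable (name assumed), is written to a local file readable only by the
// daemon user, and is then used to log in before any HDFS access.
val encoded    = sys.env("SPARK_KERBEROS_KEYTAB")      // assumed variable name
val keytabPath = Paths.get("container.keytab")          // assumed location
Files.write(keytabPath, Base64.getDecoder.decode(encoded))

// Owner-only read permissions before use.
keytabPath.toFile.setReadable(false, false)
keytabPath.toFile.setReadable(true, true)

UserGroupInformation.loginUserFromKeytab(principal, keytabPath.toString)
```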
Would appreciate any feedback from the community.
> Allow for keytab-based HDFS security in Standalone mode
> -------------------------------------------------------
>
> Key: SPARK-5158
> URL: https://issues.apache.org/jira/browse/SPARK-5158
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Patrick Wendell
> Assignee: Matthew Cheah
> Priority: Critical
>
> There have been a handful of patches for allowing access to Kerberized HDFS
> clusters in standalone mode. The main reason we haven't accepted these
> patches has been that they rely on insecure distribution of token files from
> the driver to the other components.
> As a simpler solution, I wonder if we should just provide a way to have the
> Spark driver and executors independently log in and acquire credentials using
> a keytab. This would work for users who have dedicated, single-tenant
> Spark clusters (i.e. they are willing to have a keytab on every machine
> running Spark for their application). It wouldn't address all possible
> deployment scenarios, but if it's simple I think it's worth considering.
> This would also work for Spark streaming jobs, which often run on dedicated
> hardware since they are long-running services.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]