[
https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516856#comment-16516856
]
Jelmer Kuperus commented on SPARK-5158:
---------------------------------------
I ended up with the following workaround, which at first glance seems to work:
1. Create a `.java.login.config` file in the home directory of the user running Spark, with
the following contents:
{noformat}
com.sun.security.jgss.krb5.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=true
  ticketCache="/tmp/krb5cc_0"
  keyTab="/path/to/my.keytab"
  principal="[email protected]";
};{noformat}
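By default the JVM looks for a JAAS configuration at `${user.home}/.java.login.config`. As an alternative (not part of the original workaround, just a common option), the location can be passed explicitly through the `java.security.auth.login.config` system property; the `/etc/spark/jaas.conf` path below is purely illustrative:
{noformat}
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/etc/spark/jaas.conf"
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/etc/spark/jaas.conf"{noformat}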
2. Place a krb5.conf file at /etc/krb5.conf.
3. Place your Hadoop configuration in /etc/hadoop/conf and, in `core-site.xml`, set:
* fs.defaultFS to webhdfs://your_hostname:14000/webhdfs/v1
* hadoop.security.authentication to kerberos
* hadoop.security.authorization to true
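Put together, the corresponding `core-site.xml` fragment would look roughly like this (the hostname is a placeholder; 14000 is the default HttpFS port):
{noformat}
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://your_hostname:14000/webhdfs/v1</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>{noformat}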
4. Make sure the Hadoop configuration is on Spark's classpath, e.g. the Spark process
command line should contain something like:
{noformat}
-cp /etc/spark/:/usr/share/spark/jars/*:/etc/hadoop/conf/{noformat}
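If editing the launch command directly is awkward, the same effect can presumably be achieved through spark-defaults.conf (an assumption, not part of the original comment):
{noformat}
spark.driver.extraClassPath   /etc/hadoop/conf/
spark.executor.extraClassPath /etc/hadoop/conf/{noformat}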
> Allow for keytab-based HDFS security in Standalone mode
> -------------------------------------------------------
>
> Key: SPARK-5158
> URL: https://issues.apache.org/jira/browse/SPARK-5158
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Patrick Wendell
> Assignee: Matthew Cheah
> Priority: Critical
>
> There have been a handful of patches for allowing access to Kerberized HDFS
> clusters in standalone mode. The main reason we haven't accepted these
> patches is that they rely on insecure distribution of token files from
> the driver to the other components.
> As a simpler solution, I wonder if we should just provide a way to have the
> Spark driver and executors independently log in and acquire credentials using
> a keytab. This would work for users who have dedicated, single-tenant
> Spark clusters (i.e. they are willing to have a keytab on every machine
> running Spark for their application). It wouldn't address all possible
> deployment scenarios, but if it's simple I think it's worth considering.
> This would also work for Spark streaming jobs, which often run on dedicated
> hardware since they are long-running services.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)