DCOS Spark 1.6.1 supports Kerberos. It'll be available in DCOS 1.7, to be released in a couple of weeks.
On Tue, Apr 12, 2016 at 9:57 PM, Tony Kinsley <tkinsle...@gmail.com> wrote:

> I have been working towards getting some Spark Streaming jobs to run in Mesos cluster mode (using Docker containers) and write data periodically to a secure HDFS cluster. Unfortunately this does not seem to be well supported in Spark at the moment (https://issues.apache.org/jira/browse/SPARK-12909). The problem seems to be that A) a principal and keytab passed in are only processed if the backend is YARN, and B) all the code for renewing tickets is implemented by the YARN backend.
>
> My first attempt to get around this was to build Docker containers with a custom entrypoint that runs a process manager, and to have cron running in each container to periodically run kinit. I was hoping this would work, since Spark can log in correctly if the TGT exists (at least in my tests of manually kinit'ing and running Spark in local mode). However, this hack will not work (currently, anyway) because the Mesos scheduler does not specify whether a shell should be used for the command. Mesos defaults to using the shell and then overrides the entrypoint of the Docker image with /bin/sh (https://issues.apache.org/jira/browse/MESOS-1770).
>
> Since I have not been able to come up with an acceptable workaround, I am looking into adding the functionality to Spark itself, but I wanted to check in to make sure I am not duplicating others' work, and to get some general advice on a good approach to the problem. I found an old email chain that discusses some of the challenges of authenticating correctly to the NameNodes (http://comments.gmane.org/gmane.comp.lang.scala.spark.user/14257).
>
> I've noticed that the YARN security settings are namespaced to be specific to YARN, yet some of the code seems fairly generic (AMDelegationTokenRenewer.scala and ExecutorDelegationTokenUpdater, for instance, although I'm not sure about the use of the YarnSparkHadoopUtils). It would seem to me that some of this code could be reused across the various cluster backends. That said, I am fairly new to working with Hadoop and Spark, and do not claim to understand the inner workings of YARN or Mesos, although I feel much more comfortable with Mesos.
>
> I would definitely appreciate some guidance, especially since ViaSat (my employer) and I would very much like to contribute whatever we get working back to Spark rather than maintain a fork.
>
> Tony

--
Michael Gummelt
Software Engineer
Mesosphere
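For context, the in-container cron + kinit hack described above can be approximated in-process with the standard Hadoop UGI API, which is roughly the piece of functionality that would need to live in a non-YARN backend. A minimal sketch, not existing Spark code; the object name and refresh interval are made up for illustration:

    import java.security.PrivilegedExceptionAction
    import java.util.concurrent.{Executors, TimeUnit}

    import org.apache.hadoop.security.UserGroupInformation

    // Hypothetical driver-side sketch: log in from a keytab and keep the
    // TGT fresh without any external kinit process.
    object KeytabReloginSketch {
      def main(args: Array[String]): Unit = {
        val Array(principal, keytabPath) = args.take(2)

        // Log in from the keytab once, inside the process.
        val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabPath)

        // Periodically refresh the TGT. checkTGTAndReloginFromKeytab is a
        // no-op unless the ticket is near expiry, so a coarse interval is fine.
        val scheduler = Executors.newSingleThreadScheduledExecutor()
        scheduler.scheduleAtFixedRate(new Runnable {
          override def run(): Unit = ugi.checkTGTAndReloginFromKeytab()
        }, 1, 1, TimeUnit.MINUTES)

        // All HDFS access must happen as the logged-in user.
        ugi.doAs(new PrivilegedExceptionAction[Unit] {
          override def run(): Unit = {
            // ... open a FileSystem, write the streaming output, etc.
          }
        })
      }
    }

Note that this only keeps the driver's TGT fresh; executors still need delegation tokens obtained and shipped to them, which is what the YARN-only renewal code handles today.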
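On the reuse question: the YARN-only pieces mentioned above (AMDelegationTokenRenewer, ExecutorDelegationTokenUpdater) reduce to a few responsibilities that are not inherently YARN-specific: obtaining tokens from a keytab, distributing them to executors, and scheduling renewal. A rough sketch of what a backend-agnostic interface might look like; the trait and method names are hypothetical, nothing like this exists in Spark:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.Credentials

    // Hypothetical interface; all names are invented for illustration.
    trait HadoopTokenRenewer {
      /** Log in from principal + keytab and obtain fresh HDFS delegation tokens. */
      def obtainTokens(principal: String, keytab: String, conf: Configuration): Credentials

      /** Ship refreshed tokens to running executors (e.g. over RPC or shared storage). */
      def distributeTokens(creds: Credentials): Unit

      /** Re-run the two steps above before the current tokens expire. */
      def scheduleRenewal(): Unit
    }

Under this kind of split, the YARN backend would keep its existing implementation, and a Mesos implementation would mainly differ in how refreshed tokens reach the executors.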