[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962440#comment-15962440 ]

Michael Gummelt commented on SPARK-16742:
-----------------------------------------

Hi [~vanzin],

[~ganger85] and Strat.io are pulling back their Mesos Kerberos implementation 
for now, and we at Mesosphere are about to submit a PR to upstream our 
implementation.  I have a few questions I'd like to run by you to make sure 
that PR goes smoothly.

1) I've been following your comments on this Spark Standalone Kerberos PR: 
https://github.com/apache/spark/pull/17530.  It looks like your concern is that 
in *cluster mode*, the keytab is written to a file on the host running the 
driver, and is owned by the user of the Spark Worker, which will be the same 
for each job.  So jobs submitted by multiple users will be able to read each 
other's keytabs.  In *client mode*, it looks like the delegation tokens are 
written to a file (HADOOP_TOKEN_FILE_LOCATION) on the host running the 
executor, which suffers from the same problem as the keytab in cluster mode.

The problem, then, is that a Kerberos-authenticated user submitting a job
would have no way of knowing that their credentials are being leaked to other
users.  Is this an accurate description of the issue?
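
To make the concern concrete: Unix file permissions isolate Unix users, not jobs. The minimal Python sketch below assumes two jobs that both run as the Worker's Unix user (simulated here by a single process; the file name {{job-a.keytab}} and its contents are made up). It shows that even an owner-only (0600) keytab stays readable by the other job. The same reasoning applies to a token file at {{HADOOP_TOKEN_FILE_LOCATION}}.

```python
import os
import stat
import tempfile


def write_owner_only(path: str, data: bytes) -> None:
    """Write data and restrict the file to its owner (mode 0600)."""
    with open(path, "wb") as f:
        f.write(data)
    os.chmod(path, 0o600)


with tempfile.TemporaryDirectory() as work_dir:
    # Job A's keytab, written by the Worker's Unix user (fake contents).
    keytab = os.path.join(work_dir, "job-a.keytab")
    write_owner_only(keytab, b"not-a-real-keytab")
    mode = stat.S_IMODE(os.stat(keytab).st_mode)

    # Job B runs as the same Unix user, so the owner-only mode does not stop it.
    job_b_can_read = os.access(keytab, os.R_OK)
    with open(keytab, "rb") as f:
        leaked = f.read()

print(oct(mode), job_b_can_read, leaked)
```

Tightening the mode further doesn't help, since the reader is the owner. Only running each job as a distinct Unix user, or not writing the secret to disk at all, closes the gap.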

2) I understand that YARN writes delegation tokens via 
{{amContainer.setTokens()}}, which ultimately results in the delegation token 
being written to a file owned by the submitting user.  However, since the 
"submitting user" is a Kerberos user, not a Unix user, I'm assuming that 
{{hadoop.security.auth_to_local}} is what maps the Kerberos user to the Unix 
user who runs the ApplicationMaster and owns that file.  Is that correct?
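
For reference, here is a simplified Python model of how {{hadoop.security.auth_to_local}} rules turn a Kerberos principal into a short local name. It only handles the {{RULE:[n:fmt](regex)s/pattern/replacement/}} form and {{DEFAULT}}. It is not Hadoop's own implementation, and the realm {{EXAMPLE.COM}} and the principals are made up for illustration.

```python
import re
from typing import List, Optional

# Matches the simplified form RULE:[n:fmt](regex)s/pattern/replacement/
RULE_RE = re.compile(r"RULE:\[(\d+):([^\]]+)\]\(([^)]*)\)s/([^/]*)/([^/]*)/$")


def expand(fmt: str, components: List[str], realm: str) -> str:
    """Substitute $0 with the realm and $1..$n with the principal's name components."""
    def repl(m: "re.Match") -> str:
        i = int(m.group(1))
        return realm if i == 0 else components[i - 1]
    return re.sub(r"\$(\d+)", repl, fmt)


def apply_rule(rule: str, principal: str, default_realm: str) -> Optional[str]:
    """Return the local name if this rule matches the principal, else None."""
    name, realm = principal.rsplit("@", 1)
    components = name.split("/")
    if rule == "DEFAULT":
        # DEFAULT keeps the first component, but only for the default realm.
        return components[0] if realm == default_realm else None
    m = RULE_RE.match(rule)
    if m is None:
        raise ValueError(f"unsupported rule: {rule}")
    n, fmt, regex, pattern, replacement = m.groups()
    if len(components) != int(n):
        return None
    candidate = expand(fmt, components, realm)
    if re.fullmatch(regex, candidate) is None:
        return None
    # Like sed without /g: replace only the first match.
    return re.sub(pattern, replacement, candidate, count=1)


def auth_to_local(principal: str, rules: List[str], default_realm: str) -> str:
    """Try rules in order; the first one that matches wins."""
    for rule in rules:
        local = apply_rule(rule, principal, default_realm)
        if local is not None:
            return local
    raise LookupError(f"no rule maps {principal}")


rules = [
    r"RULE:[2:$1@$0](spark@EXAMPLE\.COM)s/.*/spark/",
    r"RULE:[1:$1@$0](.*@EXAMPLE\.COM)s/@.*//",
    "DEFAULT",
]
print(auth_to_local("alice@EXAMPLE.COM", rules, "EXAMPLE.COM"))  # alice
print(auth_to_local("spark/host1.example.com@EXAMPLE.COM", rules, "EXAMPLE.COM"))  # spark
```

Rules are tried in order and the first match wins. {{DEFAULT}} only strips the realm when it equals the default realm, so a principal from another realm falls through unless an explicit rule covers it.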

To avoid the shared-file problem for delegation tokens, our Mesos
implementation currently has the Executor issue an RPC call to fetch the
delegation token from the driver.  As a result, there's no need for at-rest
encryption, and if in-motion encryption is in the user's threat model, they
can simply run Spark with SSL enabled.
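
Here is a minimal Python sketch of the fetch-over-RPC idea. It assumes a toy length-prefixed protocol over a local TCP socket rather than Spark's actual RPC layer. The driver keeps the serialized tokens in memory and hands them only to callers that present the application's shared secret (a stand-in for Spark's {{spark.authenticate}}). All function names, the secret, and the token bytes are hypothetical. The channel here is plaintext loopback, so a real deployment would still want encryption in transit.

```python
import hmac
import socket
import struct
import threading


def send_msg(sock: socket.socket, payload: bytes) -> None:
    # Toy framing: 4-byte big-endian length, then the payload.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def serve_tokens(server: socket.socket, tokens: bytes, secret: bytes, requests: int) -> None:
    """Driver side: answer token requests; the tokens never touch disk."""
    for _ in range(requests):
        conn, _addr = server.accept()
        with conn:
            presented = recv_msg(conn)
            # Constant-time comparison of the shared secret; reply empty on mismatch.
            ok = hmac.compare_digest(presented, secret)
            send_msg(conn, tokens if ok else b"")


def fetch_tokens(port: int, secret: bytes) -> bytes:
    """Executor side: ask the driver for the current delegation tokens."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        send_msg(sock, secret)
        return recv_msg(sock)


driver_tokens = b"serialized-delegation-tokens"  # placeholder bytes, not a real token
app_secret = b"app-secret"

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # let the OS pick a free port
server.listen()
port = server.getsockname()[1]

driver = threading.Thread(target=serve_tokens, args=(server, driver_tokens, app_secret, 2))
driver.start()
received = fetch_tokens(port, app_secret)       # executor with the right secret
rejected = fetch_tokens(port, b"wrong-secret")  # some other process on the host
driver.join()
server.close()
print(received, rejected)
```

Because the tokens only live in the driver's and executors' memory, there is no file for another job on the same host to read. The remaining exposure is the network hop, which is what running with SSL covers.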

We avoid the shared-file problem for keytabs entirely, because there's no need
to distribute the keytab, at least in client mode.  Unlike in YARN, the driver
and the equivalent of the "ApplicationMaster" in Mesos are one and the same:
both live in the {{spark-submit}} process.

We're probably going to punt on cluster mode for now, just for simplicity, but
we should be able to solve it there as well because, unlike standalone and much
like YARN, Mesos controls which user the driver runs as.

What do you think of the above approach?  If you see any blockers, I would very 
much appreciate teasing those out now rather than during the PR.  Thanks!

> Kerberos support for Spark on Mesos
> -----------------------------------
>
>                 Key: SPARK-16742
>                 URL: https://issues.apache.org/jira/browse/SPARK-16742
>             Project: Spark
>          Issue Type: New Feature
>          Components: Mesos
>            Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
