[ https://issues.apache.org/jira/browse/SPARK-20060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-20060:
-----------------------------
    Description: 
h1. Brief design

h2. Introductions
The basic issue for Standalone mode when visiting Kerberos-secured HDFS or 
other kerberized services is how to gather the delegation tokens on the driver 
side and deliver them to the executor side.

When we run Spark on YARN, we set the tokens in the container launch context so 
they are delivered automatically, and the long-running problem caused by token 
expiration was fixed in SPARK-14743 by writing the tokens to HDFS and updating 
the credential file as the tokens are renewed, over and over.

When we run Spark on Standalone, we currently have no implementation like 
YARN's to obtain and deliver those tokens.

h2. Implementations

Firstly, we move the implementation of SPARK-14743, which is currently 
YARN-only, into the core module. We use it to gather the credentials we need, 
and also to update and renew the credential files on HDFS.

Secondly, credential files on secured HDFS are not reachable for executors 
before they get the tokens. Here we add a sequence configuration 
`spark.deploy.credential.entities`: the driver populates it with 
`token.encodeToUrlString()` before launching the executors, and the executors 
fetch the credentials as a string sequence while fetching the driver-side Spark 
properties, then decode them back into tokens. Before setting up the 
`CoarseGrainedExecutorBackend`, we set the credentials on the current 
executor-side UGI.
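As a rough, self-contained illustration of the round trip (not the actual 
Spark/Hadoop code): Hadoop's `Token.encodeToUrlString()` serializes the token's 
bytes with URL-safe Base64, so a driver can place the string in the properties 
that executors fetch. The property key and the byte payload below are 
stand-ins for the real delegation token.

```java
import java.util.Base64;
import java.util.Properties;

public class TokenRoundTrip {
    // Driver side: serialize a token to a URL-safe string, analogous to
    // Hadoop's Token.encodeToUrlString() over the token's Writable bytes.
    static String encodeToUrlString(byte[] tokenBytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(tokenBytes);
    }

    // Executor side: recover the raw token bytes before the backend starts,
    // so they can be added to the executor's UGI credentials.
    static byte[] decodeFromUrlString(String encoded) {
        return Base64.getUrlDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] fakeToken = "hdfs-delegation-token-bytes".getBytes();

        // Driver: stash the encoded token under the (hypothetical) key
        // from the design above, in the properties sent to executors.
        Properties driverProps = new Properties();
        driverProps.setProperty("spark.deploy.credential.entities",
                                encodeToUrlString(fakeToken));

        // Executor: fetch the string with the driver-side properties
        // and decode it back into token bytes.
        String fetched = driverProps.getProperty("spark.deploy.credential.entities");
        byte[] recovered = decodeFromUrlString(fetched);
        System.out.println(new String(recovered));
    }
}
```

The URL-safe alphabet matters here because the encoded token travels inside 
plain string properties and must survive without further escaping.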



    was:For **Spark on non-YARN** modes on a kerberized HDFS, we don't obtain 
credentials from the Hive metastore, HDFS, etc., and just use the local 
kinit'ed user to connect to them. But if we specify the --proxy-user argument 
in a non-YARN mode such as local or standalone, after we simply use 
`UGI.createProxyUser` to get a proxy UGI as the effective user and wrap the 
code in doAs, the proxy UGI fails to talk to the Hive metastore because it has 
no credentials. Thus, we need to obtain credentials via the real user and add 
them to the proxy UGI.
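The fix described above can be modeled with a toy sketch. This is not the 
Hadoop `UserGroupInformation` API; the `Ugi` class, field names, and token 
strings below are illustrative stand-ins for the real-user/proxy-user 
credential copy.

```java
import java.util.HashSet;
import java.util.Set;

public class ProxyCredentialSketch {
    // Toy stand-in for a UGI: a user name plus a bag of credentials.
    static class Ugi {
        final String user;
        final Set<String> credentials = new HashSet<>();
        Ugi(String user) { this.user = user; }
    }

    public static void main(String[] args) {
        // The real user is authenticated via kinit and can fetch tokens.
        Ugi realUser = new Ugi("alice");
        realUser.credentials.add("HDFS_DELEGATION_TOKEN");
        realUser.credentials.add("HIVE_DELEGATION_TOKEN");

        // --proxy-user bob: a fresh proxy UGI starts with no credentials,
        // which is why doAs work fails against the metastore.
        Ugi proxyUser = new Ugi("bob");

        // The fix: copy the real user's tokens into the proxy UGI
        // before any doAs work runs.
        proxyUser.credentials.addAll(realUser.credentials);

        System.out.println(proxyUser.credentials.contains("HIVE_DELEGATION_TOKEN"));
    }
}
```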

    Component/s:     (was: Spark Submit)
                 Spark Core
     Issue Type: New Feature  (was: Bug)
        Summary: Support Standalone visiting secured HDFS   (was: Spark On 
Non-Yarn Mode with Kerberized HDFS ProxyUser Fails Talking to Hive MetaStore )

> Support Standalone visiting secured HDFS 
> -----------------------------------------
>
>                 Key: SPARK-20060
>                 URL: https://issues.apache.org/jira/browse/SPARK-20060
>             Project: Spark
>          Issue Type: New Feature
>          Components: Deploy, Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Kent Yao
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
