[
https://issues.apache.org/jira/browse/SPARK-20060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kent Yao updated SPARK-20060:
-----------------------------
Description:
h1. Brief design
h2. Introduction
The core problem for Standalone mode when accessing Kerberos-secured HDFS or
other kerberized services is how to gather the delegation tokens on the driver
side and deliver them to the executor side.
When we run Spark on Yarn, we set the tokens in the container launch context so
that they are delivered automatically. The problem of token expiration in
long-running applications was fixed in SPARK-14743 by writing the tokens to
HDFS, then repeatedly renewing them and updating the credential file.
When we run Spark on Standalone, we currently have no implementation comparable
to the Yarn one for obtaining and delivering those tokens.
h2. Implementation
First, we move the implementation of SPARK-14743, which is currently Yarn-only,
to the core module. We use it to gather the credentials we need, and also to
renew the tokens and update the credential files on HDFS.
Second, credential files on secured HDFS are not reachable by executors before
they get the tokens. To solve this we add a sequence configuration,
`spark.deploy.credential.entities`. Before launching the executors, the driver
stores the result of `token.encodeToUrlString()` for each token in it. The
executors read the credentials as a sequence of strings while fetching the
driver-side Spark properties, and then decode them back into tokens. Before
setting up the `CoarseGrainedExecutorBackend`, we add the credentials to the
current executor-side UGI.
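The hand-off described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed patch: the class `CredentialEntitiesSketch` and its methods are hypothetical, plain `java.util.Properties` stands in for the Spark properties, URL-safe Base64 stands in for Hadoop's `Token.encodeToUrlString()` / `Token.decodeFromUrlString()`, and the sequence configuration is assumed to be stored as one comma-separated string.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for the driver/executor token hand-off; not Spark or Hadoop code.
final class CredentialEntitiesSketch {
    // Configuration key named in the description.
    static final String KEY = "spark.deploy.credential.entities";

    // Stand-in for Token#encodeToUrlString(): URL-safe Base64 of the serialized token.
    static String encode(byte[] tokenBytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(tokenBytes);
    }

    // Stand-in for Token#decodeFromUrlString(String).
    static byte[] decode(String encoded) {
        return Base64.getUrlDecoder().decode(encoded);
    }

    // Driver side: publish every encoded token before the executors are launched.
    // URL-safe Base64 never contains a comma, so a comma is a safe separator (assumption).
    static void putTokens(Properties conf, List<byte[]> tokens) {
        List<String> encoded = new ArrayList<>();
        for (byte[] token : tokens) {
            encoded.add(encode(token));
        }
        conf.setProperty(KEY, String.join(",", encoded));
    }

    // Executor side: read the string sequence fetched with the driver properties
    // and decode each entry back into token bytes.
    static List<byte[]> fetchTokens(Properties conf) {
        List<byte[]> tokens = new ArrayList<>();
        String value = conf.getProperty(KEY, "");
        if (value.isEmpty()) {
            return tokens; // no credentials were published
        }
        for (String entry : value.split(",")) {
            tokens.add(decode(entry));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Placeholder byte strings; real entries would be serialized delegation tokens.
        putTokens(conf, Arrays.asList(
                "hdfs-token".getBytes(StandardCharsets.UTF_8),
                "hive-token".getBytes(StandardCharsets.UTF_8)));
        System.out.println("tokens recovered: " + fetchTokens(conf).size());
    }
}
```

In the real change the decoded tokens would then be added to the current executor-side UGI before the `CoarseGrainedExecutorBackend` is set up; that step needs the Hadoop classes and is left out of this sketch.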
was: For **Spark on non-Yarn** mode on a kerberized HDFS, we don't obtain
credentials from the Hive metastore, HDFS, etc., and just use the local kinited
user to connect to them. But if we specify the --proxy-user argument in a
non-Yarn mode, such as local or standalone, we simply use `UGI.createProxyUser`
to get a proxy UGI as the effective user and wrap the code in doAs, and the
proxy UGI then fails to talk to the Hive metastore because it has no
credentials. Thus, we need to obtain credentials via the real user and add them
to the proxy UGI.
Component/s: (was: Spark Submit)
Spark Core
Issue Type: New Feature (was: Bug)
Summary: Support Standalone visiting secured HDFS (was: Spark On
Non-Yarn Mode with Kerberized HDFS ProxyUser Fails Talking to Hive MetaStore )
> Support Standalone visiting secured HDFS
> -----------------------------------------
>
> Key: SPARK-20060
> URL: https://issues.apache.org/jira/browse/SPARK-20060
> Project: Spark
> Issue Type: New Feature
> Components: Deploy, Spark Core
> Affects Versions: 2.2.0
> Reporter: Kent Yao
>