[
https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969102#comment-15969102
]
Jason Lowe commented on MAPREDUCE-6876:
---------------------------------------
I believe the {{TokenCache.obtainTokensForNamenodes}} call is necessary. The
input format must obtain the tokens that the tasks need in order to access the
input splits, and this is how {{FileInputFormat}} accomplishes that.

The tokens are delegated to another process via job submission. In the line of
code called out in the description, the TokenCache receives the job
credentials as an argument. Those credentials are populated with tokens for
the namenodes involved in the input paths if they aren't already present.
Later the job submitter code passes the job credentials to the job. The job's
tasks in turn use the tokens in the credentials to authenticate with the
various filesystems that host the split data.

Fetching the tokens here looks redundant for a typical job, since the job
submission code already obtains a token for the job staging directory. That
staging directory is often on the same filesystem as the input data, so the
same token covers both. However, if the user specifies a path on a remote
filesystem as input, then without this code the job client will not know to
obtain tokens for that filesystem, and the tasks will ultimately fail to
authenticate with it.
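
To make the flow above concrete, here is a rough toy model in Java. It is only
a sketch: {{TokenSketch}}, {{JobCredentials}}, {{obtainTokens}} and the
hostnames are hypothetical stand-ins invented for illustration, not Hadoop's
actual {{Credentials}} or {{TokenCache}} classes. It shows why a token fetched
for the staging directory covers input on the same filesystem but not input on
a remote one.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Hadoop classes.
class TokenSketch {

    /** Stand-in for the job credentials: one token per filesystem. */
    static final class JobCredentials {
        private final Map<String, String> tokens = new LinkedHashMap<>();

        boolean hasTokenFor(String fs) { return tokens.containsKey(fs); }

        void add(String fs, String token) { tokens.put(fs, token); }

        int size() { return tokens.size(); }
    }

    /** Reduces a path to the filesystem hosting it, e.g. hdfs://nn1:8020. */
    static String filesystemOf(String path) {
        URI uri = URI.create(path);
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    /** Adds a token for each filesystem behind the paths, unless already present. */
    static void obtainTokens(JobCredentials creds, List<String> paths) {
        for (String path : paths) {
            String fs = filesystemOf(path);
            if (!creds.hasTokenFor(fs)) {
                creds.add(fs, "token-for-" + fs);
            }
        }
    }

    /** A task can read a split only if its credentials cover that filesystem. */
    static boolean taskCanRead(JobCredentials creds, String splitPath) {
        return creds.hasTokenFor(filesystemOf(splitPath));
    }

    public static void main(String[] args) {
        String staging = "hdfs://nn1:8020/user/alice/.staging";
        String localInput = "hdfs://nn1:8020/data/in";
        String remoteInput = "hdfs://remote-nn:8020/data/in";

        // Job submission fetches a token for the staging directory only.
        JobCredentials creds = new JobCredentials();
        obtainTokens(creds, Arrays.asList(staging));
        System.out.println("same fs readable:   " + taskCanRead(creds, localInput));
        System.out.println("remote fs readable: " + taskCanRead(creds, remoteInput));

        // The input format fetches tokens for every input path as well.
        obtainTokens(creds, Arrays.asList(localInput, remoteInput));
        System.out.println("remote fs readable: " + taskCanRead(creds, remoteInput));
    }
}
```

In this model the second {{obtainTokens}} call plays the role of the input
format: it is a no-op for the filesystem that already has a token and adds one
for the remote filesystem.
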
> FileInputFormat.listStatus should not fetch delegation tokens
> -------------------------------------------------------------
>
> Key: MAPREDUCE-6876
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6876
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Michael Gummelt
>
> {{FileInputFormat.listStatus}} fetches delegation tokens:
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213
> AFAICT, this is unnecessary. {{listStatus}} doesn't delegate those tokens to
> another process. This is causing issues described in the attached Spark
> Kerberos ticket, because {{TokenCache.obtainTokensForNamenodes}}, which is
> used to fetch the delegation tokens, assumes that certain MapReduce
> configuration variables are set, which isn't true in the Spark calling code.
> This is a separate problem, but nonetheless it wouldn't have arisen if
> {{listStatus}} weren't fetching delegation tokens.