[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969102#comment-15969102
 ] 

Jason Lowe commented on MAPREDUCE-6876:
---------------------------------------

I believe the TokenCache.obtainTokensForNamenodes code is necessary.  The input 
format must obtain the necessary tokens for the tasks to be able to access the 
input splits, and this is how FileInputFormat accomplishes that.

The tokens are delegated to another process as part of job submission.  In 
the code line that was called out above, the TokenCache is receiving the job 
credentials as an argument.  Those credentials will be populated with the 
tokens for the namenodes involved in the input paths if they aren't already 
present.  Later the job submitter code will pass the job credentials to the 
job.  The job's tasks in turn will use the tokens in the credentials to 
authenticate with the various filesystems that are hosting the split data.

The token fetch appears redundant in a typical job because the job submission 
code already obtains a token for the job staging directory.  That staging 
directory is often on the same filesystem as the input data, so the same token 
covers both.  However, if the user specifies a remote filesystem path as input, 
then without this code the job client will not know to obtain tokens for the 
remote filesystem, and the tasks will ultimately fail to authenticate with 
it.
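
To make the staging-directory versus remote-filesystem distinction concrete, 
here is a rough sketch in plain Java.  It is a toy model, not Hadoop code: 
ToyCredentials, obtainTokensFor, taskCanRead, and the nn-local/nn-remote hosts 
are hypothetical names invented for illustration, and real delegation tokens 
are of course not strings.  It only assumes that credentials hold one token 
per filesystem and that a token is added only when none is present yet.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Hadoop classes.
class TokenSketch {
    // Stand-in for the job credentials: one token per filesystem,
    // keyed by "scheme://authority".
    static final class ToyCredentials {
        final Map<String, String> tokens = new LinkedHashMap<>();
    }

    // Identifies the filesystem that hosts a path.
    static String fsKey(URI path) {
        return path.getScheme() + "://" + path.getAuthority();
    }

    // Adds a token for each filesystem referenced by the paths,
    // unless the credentials already hold one for that filesystem.
    static void obtainTokensFor(ToyCredentials creds, List<URI> paths) {
        for (URI p : paths) {
            creds.tokens.computeIfAbsent(fsKey(p), k -> "token-for-" + k);
        }
    }

    // A task can authenticate only if the credentials it was handed
    // contain a token for the filesystem hosting its split.
    static boolean taskCanRead(ToyCredentials creds, URI split) {
        return creds.tokens.containsKey(fsKey(split));
    }

    public static void main(String[] args) {
        URI staging = URI.create("hdfs://nn-local:8020/user/alice/.staging");
        URI localInput = URI.create("hdfs://nn-local:8020/data/in");
        URI remoteInput = URI.create("hdfs://nn-remote:8020/data/in");

        ToyCredentials creds = new ToyCredentials();

        // Job submission alone: only the staging filesystem gets a token.
        obtainTokensFor(creds, Arrays.asList(staging));
        System.out.println("local input readable:  " + taskCanRead(creds, localInput));
        System.out.println("remote input readable: " + taskCanRead(creds, remoteInput));

        // Input-format step: tokens for every input filesystem are added.
        obtainTokensFor(creds, Arrays.asList(localInput, remoteInput));
        System.out.println("remote input readable: " + taskCanRead(creds, remoteInput));
    }
}
```

In this model the input-format step is a no-op for the local input, since the 
staging token already covers that filesystem, but it is the only place where 
the remote filesystem's token gets added.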


> FileInputFormat.listStatus should not fetch delegation tokens
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-6876
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6876
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Michael Gummelt
>
> {{FileInputFormat.listStatus}} fetches delegation tokens: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213
> AFAICT, this is unnecessary.  {{listStatus}} doesn't delegate those tokens to 
> another process.  This is causing issues described in the attached Spark 
> Kerberos ticket, because {{TokenCache.obtainTokensForNamenodes}}, which is 
> used to fetch the delegation tokens, assumes that certain MapReduce 
> configuration variables are set, which isn't true in the Spark calling code.  
> This is a separate problem, but nonetheless it wouldn't have arisen if 
> {{listStatus}} weren't fetching delegation tokens.



