[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596332#comment-13596332
 ] 

Jason Lowe commented on MAPREDUCE-5042:
---------------------------------------

I thought about the upload-to-staging-for-future-attempts solution, but 
passing the secret in the job credentials seemed a bit cleaner and avoids 
the extra HDFS operations.

As for splitting the job token into shuffle and task, I didn't want to change 
the current task authentication behavior.  Allowing an old task attempt to 
authenticate with a new app attempt seemed like it would be a problem waiting 
to happen.  But we need the shuffle secret to persist across app attempts, 
hence the push to split them as part of this change.
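
To illustrate why the shuffle secret has to survive an AM relaunch, here is a 
minimal sketch, not code from the patch. It assumes the fetch URL is signed 
with an HMAC (HmacSHA1 is an assumption here) and that the verifying side 
still holds the secret from the first app attempt. The class name, helper 
names, and URL are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: none of these names come from the Hadoop code base.
final class ShuffleSecretSketch {
    // Assumed algorithm for illustration only.
    private static final String ALGORITHM = "HmacSHA1";

    // Generate a random secret, as a freshly started app attempt might.
    static byte[] newSecret() {
        byte[] secret = new byte[20];
        new SecureRandom().nextBytes(secret);
        return secret;
    }

    // Sign the fetch URL with the given secret.
    static byte[] sign(byte[] secret, String url) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            return mac.doFinal(url.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recompute the hash with the verifier's secret and compare.
    static boolean verify(byte[] secret, String url, byte[] hash) {
        return MessageDigest.isEqual(sign(secret, url), hash);
    }

    public static void main(String[] args) {
        String url = "/mapOutput?job=job_1&reduce=0&map=attempt_1_m_000016_0";

        // Secret held by the verifier, created during the first app attempt.
        byte[] firstAttemptSecret = newSecret();
        // Secret regenerated by the relaunched app attempt.
        byte[] regeneratedSecret = newSecret();

        // Reducer signs with the regenerated secret: verification fails.
        System.out.println("regenerated secret verifies: "
            + verify(firstAttemptSecret, url, sign(regeneratedSecret, url)));
        // Reducer signs with the persisted secret: verification succeeds.
        System.out.println("persisted secret verifies: "
            + verify(firstAttemptSecret, url, sign(firstAttemptSecret, url)));
    }
}
```

In this sketch only the signing key differs between the two cases, which is 
why a secret dedicated to the shuffle can persist across app attempts without 
letting old task attempts authenticate with a new app attempt.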
                
> Reducer unable to fetch for a map task that was recovered
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-5042
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, security
>    Affects Versions: 0.23.7, 2.0.5-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: MAPREDUCE-5042.patch, MAPREDUCE-5042.patch
>
>
> If an application attempt fails and is relaunched, the AM will try to recover 
> previously completed tasks.  If a reducer then needs to fetch the output of a 
> recovered map task attempt, the fetch fails with a 401 error like this:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 401 for URL: http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_000016_0
>       at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615)
>       at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231)
>       at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156)
> {noformat}
> Looking at the corresponding NM's logs, we see the shuffle failed due to 
> "Verification of the hashReply failed".
