[ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596332#comment-13596332 ]
Jason Lowe commented on MAPREDUCE-5042:
---------------------------------------

I thought about the upload-to-staging-for-future-attempts solution but it seemed passing the secret in the job credentials was a bit cleaner and avoided the extra HDFS operations.

As for splitting the job token into shuffle and task, I didn't want to change the current task authentication behavior. Allowing an old task attempt to authenticate with a new app attempt seemed like it would be a problem waiting to happen. But we need the shuffle secret to persist across app attempts, hence the push to split them as part of this change.

> Reducer unable to fetch for a map task that was recovered
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-5042
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, security
>    Affects Versions: 0.23.7, 2.0.5-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: MAPREDUCE-5042.patch, MAPREDUCE-5042.patch
>
>
> If an application attempt fails and is relaunched the AM will try to recover previously completed tasks. If a reducer needs to fetch the output of a map task attempt that was recovered then it will fail with a 401 error like this:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 401 for URL: http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_000016_0
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156)
> {noformat}
> Looking at the corresponding NM's logs, we see the shuffle failed due to "Verification of the hashReply failed".
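
For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of why a shuffle secret that is regenerated on each app attempt would produce the 401 described above, and why one that persists across attempts would not. It is an illustration only, not the code in the attached patch: the class `ShuffleSecretSketch`, its helper methods, the choice of HMAC-SHA1 with Base64 encoding, and the shortened URL are all assumptions made for the example, and it uses only the JDK rather than any Hadoop API.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical illustration; none of these names come from the Hadoop code base.
public class ShuffleSecretSketch {

    // Compute a keyed hash of the fetch URL (assumed here: HMAC-SHA1, Base64-encoded).
    public static String hashUrl(String url, byte[] secret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret, "HmacSHA1"));
        byte[] digest = mac.doFinal(url.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    // Server side: recompute the hash with the secret the server knows and compare
    // in constant time. A mismatch corresponds to rejecting the fetch (the 401).
    public static boolean verify(String url, String presentedHash, byte[] serverSecret)
            throws Exception {
        byte[] expected = hashUrl(url, serverSecret).getBytes(StandardCharsets.UTF_8);
        byte[] presented = presentedHash.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, presented);
    }

    // Generate a fresh random secret, standing in for a per-attempt token secret.
    public static byte[] newSecret() {
        byte[] secret = new byte[20];
        new SecureRandom().nextBytes(secret);
        return secret;
    }

    public static void main(String[] args) throws Exception {
        // Shortened, made-up URL in the same shape as the one in the report.
        String url = "/mapOutput?job=job_1&reduce=0&map=attempt_1_m_000016_0";

        // Secret known to the shuffle server from the first app attempt,
        // which is when the recovered map task ran.
        byte[] firstAttemptSecret = newSecret();

        // Case 1: the second app attempt generates a new secret, so the reducer
        // signs the request with a key the server has never seen.
        byte[] secondAttemptSecret = newSecret();
        String hashWithNewSecret = hashUrl(url, secondAttemptSecret);
        System.out.println("regenerated secret verifies: "
                + verify(url, hashWithNewSecret, firstAttemptSecret));

        // Case 2: the secret is carried over (for example in the job credentials),
        // so the reducer in the second attempt signs with the original key.
        String hashWithPersistedSecret = hashUrl(url, firstAttemptSecret);
        System.out.println("persisted secret verifies: "
                + verify(url, hashWithPersistedSecret, firstAttemptSecret));
    }
}
```

Under these assumptions the first check fails and the second succeeds, which mirrors the argument in the comment: the shuffle secret has to outlive a single app attempt, while the task authentication secret can still change between attempts.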