[ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133893#comment-13133893 ]
Vinod Kumar Vavilapalli commented on MAPREDUCE-2708:
----------------------------------------------------

Okay. I've finally cornered this. That was a wild-goose chase, and tiresome work.

It is actually not related to trunk or 0.23. By default {{dfs.block.access.token.enable}} is set to false, so clients weren't able to contact datanodes when they needed something like an incomplete block's length. The error goes away when I set this explicitly to true. But I don't think this should be needed if we have already enabled {{hadoop.security.authentication}}. I will file a DFS ticket to see whether there is a reason for this beyond having a quick flag to disable the feature.

Apart from that, there are a couple of other bugs related to recovery: clients keep reconnecting to the new AM again and again, jobs with reduces fail their reduces after a restart, and so on. Will fix them separately.

> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt,
> MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch,
> mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should
> recover from the state it left off.
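
For readers who want to try the workaround described in the comment, here is a minimal sketch of the property entry it implies, assuming the flag is set in hdfs-site.xml. The helper name `hadoop_site_xml` is hypothetical and only renders the standard Hadoop `<configuration>`/`<property>` layout; it is not part of Hadoop, and the comment itself does not say which file or which nodes were changed.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a dict of settings in Hadoop's *-site.xml property layout.

    Hypothetical helper for illustration only; not a Hadoop API.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Assumption: the flag named in the comment is overridden in hdfs-site.xml.
# The property defaults to false, so it has to be set to true explicitly.
HDFS_SITE = hadoop_site_xml({"dfs.block.access.token.enable": "true"})

if __name__ == "__main__":
    print(HDFS_SITE)
```

The printed fragment can be merged into an existing hdfs-site.xml by hand. Whether this override should still be required once {{hadoop.security.authentication}} is enabled is the open question the comment raises.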