[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Vinod Kumar Vavilapalli (Commented) (JIRA) Mon, 24 Oct 2011 01:38:57 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133893#comment-13133893 ]


Vinod Kumar Vavilapalli commented on MAPREDUCE-2708:
----------------------------------------------------

Okay, I've finally cornered this. It was a wild goose chase, and tiresome 
work.

It is actually not related to trunk or 0.23. By default, 
{{dfs.block.access.token.enable}} is set to false, so clients weren't able 
to contact datanodes when they needed something like an incomplete block's 
length. The error goes away when I set this property explicitly. But I don't 
think this should be needed if {{hadoop.security.authentication}} is already 
enabled. I will file a DFS ticket and see whether there is a reason for the 
separate setting other than having a quick flag to disable the feature.
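
For anyone hitting the same error, here is a minimal sketch of the override 
described above. It assumes the property is set to "true" and that it belongs 
in hdfs-site.xml; the comment itself names neither the value nor the file. The 
helper name render_site_xml is hypothetical and only generates the XML 
fragment, it does not touch a running cluster.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a Hadoop-style *-site.xml document from a dict of name -> value.

    Hypothetical helper for illustration only; it is not part of Hadoop.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Assumption: "set this explicitly" means enabling the flag, i.e. "true".
hdfs_overrides = {"dfs.block.access.token.enable": "true"}

xml_text = render_site_xml(hdfs_overrides)
print(xml_text)
```

The printed fragment would be merged into the cluster's existing HDFS 
configuration by hand; the namenode and datanodes most likely need a restart 
to pick it up.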

Apart from that, there are a couple of other bugs related to recovery: clients 
keep reconnecting to the new AM again and again, jobs with reduces fail their 
reduces after restart, etc. I will fix them separately.
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, 
> MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, 
> mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should 
> recover from the state it left off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
