[
https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105267#comment-15105267
]
Junping Du commented on MAPREDUCE-6608:
---------------------------------------
Thanks [~srikanth.sampath] and [~raju.bairishetti] for proposing this JIRA and
uploading a design document. This work could significantly improve the
reliability of our MapReduce framework.
After going through the current design doc, I think storing the new attempt
address of the MR AM in ZooKeeper could have scalability issues when an MR job
has a massive number of running tasks (tens of thousands or more). I think it
would be better to store and retrieve the new MR AM location via HDFS, which
scales better.
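To illustrate the HDFS-based idea, here is a minimal sketch (not taken from the
design doc). It assumes the AM publishes its address to a well-known file in a
per-job directory and tasks read that file when they lose contact with the AM.
A local directory stands in for HDFS, and the file name, record layout, and
function names are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical well-known file name inside the per-job directory.
AM_ADDRESS_FILE = "am_address.json"


def publish_am_address(job_dir, attempt_id, host, port):
    """Called by a new AM attempt: record its address for tasks to find.

    The record is written to a temporary file and then renamed into place,
    so a reader sees either the old record or the new one, never a partial
    write.
    """
    record = {"attempt_id": attempt_id, "host": host, "port": port}
    fd, tmp_path = tempfile.mkstemp(dir=job_dir, prefix=".am_address.")
    with os.fdopen(fd, "w") as tmp:
        json.dump(record, tmp)
    final_path = os.path.join(job_dir, AM_ADDRESS_FILE)
    os.replace(tmp_path, final_path)
    return final_path


def lookup_am_address(job_dir, last_seen_attempt):
    """Called by a task: return (attempt_id, host, port) of a newer AM attempt.

    Returns None if nothing has been published yet, or if the published
    record is not newer than the attempt the task already knows about.
    """
    path = os.path.join(job_dir, AM_ADDRESS_FILE)
    try:
        with open(path) as f:
            record = json.load(f)
    except FileNotFoundError:
        return None
    if record["attempt_id"] <= last_seen_attempt:
        return None
    return record["attempt_id"], record["host"], record["port"]
```

In this sketch each task reads one small file only when it needs to reconnect,
so no per-task watch or session has to be kept open on a coordination
service. A real implementation would use the Hadoop FileSystem API rather than
local file calls.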
Also, in my understanding, the YARN Service Registry may not be the best fit
for this case. CC [[email protected]], who is the author of YSR.
I could propose another version of the design with more details in the next few
days, in case the development work hasn't started yet.
> Work Preserving AM Restart for MapReduce
> ----------------------------------------
>
> Key: MAPREDUCE-6608
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Srikanth Sampath
> Assignee: Raju Bairishetti
> Attachments: WorkPreservingMRAppMaster.pdf
>
>
> A framework for work-preserving AM restart was provided in
> [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489]. We would like
> to take advantage of this for MapReduce (MR) applications. There are some
> challenges, which are described in the attached document along with a few
> options. We solicit feedback from the community.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)