[
https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136279#comment-15136279
]
Srikanth Sampath commented on MAPREDUCE-6608:
---------------------------------------------
I have attached a design patch,
[Patch1|https://issues.apache.org/jira/secure/attachment/12786705/Patch1.patch],
that outlines the high-level implementation approach. The
[Design|https://issues.apache.org/jira/secure/attachment/12786706/WorkPreservingMRAppMaster-2.pdf]
document describes the high-level design.
*Notes:*
1. This is a patch against Apache Hadoop 2.6.1.
2. It works for the example Hadoop sleep job: I killed the AM at random
points and the in-flight tasks continued to run.
3. SS_DEBUG in the patch marks a debug statement that I use during
development. Some of these will be removed eventually.
4. SS_FIXME in the patch is a tag for me to fix some known issues that I have
commented on. I will clean these up before the next submission.
I solicit comments on the high level design and the approach I have taken in
the patch.
*Next Steps:*
1. I will iron out the known issues (all SS_FIXME), clean up the interfaces,
make the code compliant with Apache coding standards, rebase the code against
trunk, and test it thoroughly. I will factor in the comments and suggestions
made on the design doc and design patch.
2. I will identify the components and issues involved and raise sub-tasks.
> Work Preserving AM Restart for MapReduce
> ----------------------------------------
>
> Key: MAPREDUCE-6608
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Srikanth Sampath
> Assignee: Srikanth Sampath
> Attachments: Patch1.patch, WorkPreservingMRAppMaster-1.pdf,
> WorkPreservingMRAppMaster-2.pdf, WorkPreservingMRAppMaster.pdf
>
>
> A framework for work-preserving AM restart is provided by
> [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489]. We would like
> to take advantage of this for MapReduce (MR) applications. The attached
> document describes some of the challenges and discusses a few options.
> We solicit feedback from the community.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)