[
https://issues.apache.org/jira/browse/MAPREDUCE-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072045#comment-13072045
]
Jonathan Eagles commented on MAPREDUCE-1060:
--------------------------------------------
Setup 1 - Run a modified word count program with failure induction built into
the JobTracker. Restart the cluster every time, since the induced failure only
works for the first mapreduce job run.
// See the failure
1. Apply the MAPREDUCE-1060-branch-0.20-security.manualtest.patch
(build/install/start cluster)
2. Run
   hadoop jar hadoop-examples-0.20.206.0-SNAPSHOT.jar wordcount -Dmapred.reduce.tasks=1 /pride.txt /wc4 180000
   where pride.txt is the text of Pride and Prejudice from Project Gutenberg at
   http://www.gutenberg.org/ebooks/1342.txt.utf8. Note that the last parameter
   instructs the restarted map task to sleep for 3 minutes, so that it keeps
   running for a long time after the reduces finish.
Verify the restarted map task runs a long time after the reduces have finished
and that the original map task is marked as killed.
Setup 2 - Rerun the failure setup above with the supplied fix applied, to show
that the patch fixes the issue.
1. Apply the fix for the issue, MAPREDUCE-1060-branch-0.20-security.patch
(build/install/restart cluster)
2. Run
   hadoop jar hadoop-examples-0.20.206.0-SNAPSHOT.jar wordcount -Dmapred.reduce.tasks=1 /pride.txt /wc4 180000
   where pride.txt is the text of Pride and Prejudice from Project Gutenberg at
   http://www.gutenberg.org/ebooks/1342.txt.utf8. Note that the last parameter
   instructs the restarted map task to sleep for 3 minutes, so that it keeps
   running for a long time after the reduces finish.
Verify the restarted map task is killed shortly after the reduces have finished
and that the original map task is marked as succeeded.
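
The behavior the two setups check can be illustrated with a small toy model in
Java. This is only a sketch under stated assumptions, not the code in the
patch: KillMapsSketch, Task, State and killRunningMapsIfReducesDone are
hypothetical names that do not exist in the JobTracker, and the guard for jobs
without reduces is my own assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names come from Hadoop or from the patch.
class KillMapsSketch {
    enum State { RUNNING, SUCCEEDED, KILLED }

    static final class Task {
        final String id;
        final boolean isMap;
        State state;

        Task(String id, boolean isMap, State state) {
            this.id = id;
            this.isMap = isMap;
            this.state = state;
        }
    }

    // If every reduce has succeeded, mark the still-running maps as KILLED
    // and return their ids. Otherwise change nothing and return an empty list.
    static List<String> killRunningMapsIfReducesDone(List<Task> tasks) {
        List<String> killed = new ArrayList<>();
        boolean sawReduce = false;
        for (Task t : tasks) {
            if (!t.isMap) {
                sawReduce = true;
                if (t.state != State.SUCCEEDED) {
                    return killed; // a reduce may still need map output
                }
            }
        }
        if (!sawReduce) {
            return killed; // assumption: with no reduces, maps must finish
        }
        for (Task t : tasks) {
            if (t.isMap && t.state == State.RUNNING) {
                t.state = State.KILLED;
                killed.add(t.id);
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        // Mirrors Setup 2: the original map succeeded, the restarted map is
        // still sleeping, and the single reduce has finished.
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task("map-original", true, State.SUCCEEDED));
        tasks.add(new Task("map-restarted", true, State.RUNNING));
        tasks.add(new Task("reduce", false, State.SUCCEEDED));
        System.out.println("killed: " + killRunningMapsIfReducesDone(tasks));
    }
}
```

In this model only the restarted map is killed and the original map stays
succeeded, which is the outcome Setup 2 asks you to verify.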
> JT should kill running maps when all the reducers have completed
> ----------------------------------------------------------------
>
> Key: MAPREDUCE-1060
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1060
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Jothi Padmanabhan
> Assignee: Jonathan Eagles
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-1060-branch-0.20-security.manualtest.patch,
> MAPREDUCE-1060-branch-0.20-security.patch
>
>
> We have seen some situations where maps are still running when all the
> reducers have completed. This could happen because of lost TTs, the interplay
> of speculative tasks with bad TTs, etc. If the maps take a long time to run,
> they unnecessarily delay job completion, as this map output is not required
> anyway. The JT should possibly kill running maps when all the reducers have
> completed.