[jira] Updated: (MAPREDUCE-1220) Implement an in-cluster LocalJobRunner

Greg Roelofs (JIRA) Tue, 08 Mar 2011 14:59:27 -0800

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Greg Roelofs updated MAPREDUCE-1220:
------------------------------------

    Attachment: MR-1220.v2b.sshot-01-jobtracker.jsp.png

Screenshot of the top-level (multi-job) JobTracker page.

The main addition is the UberTask details under the "Job Scheduling Information" 
column at the far right. The uber details are appended after any existing 
content in that column (such as the capacity scheduler's information).

Row colors: pale yellow for running jobs, pale pink for failed or killed jobs, 
and pale green for successful jobs. (Not uber-specific, but trivial and in the 
same place as some of the other changes.)

> Implement an in-cluster LocalJobRunner
> --------------------------------------
>
>                 Key: MAPREDUCE-1220
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1220
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: client, jobtracker
>            Reporter: Arun C Murthy
>            Assignee: Greg Roelofs
>         Attachments: MAPREDUCE-1220_yhadoop20.patch, 
> MR-1220.v1.trunk-hadoop-common.Progress-dumper.patch.txt, 
> MR-1220.v10e-v11c-v12b.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v13.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v14b.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v15.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v2.trunk-hadoop-mapreduce.patch.txt, 
> MR-1220.v2.trunk-hadoop-mapreduce.patch.txt, 
> MR-1220.v2b.sshot-01-jobtracker.jsp.png, 
> MR-1220.v6.ytrunk-hadoop-mapreduce.patch.txt, 
> MR-1220.v7.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v8b.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v9c.ytrunk-hadoop-mapreduce.delta.patch.txt
>
>
> Currently, very small map-reduce jobs suffer from latency issues due to 
> overheads in Hadoop Map-Reduce such as scheduling, JVM startup, etc. We've 
> periodically tried to optimize all parts of the framework to achieve lower 
> latencies.
> I'd like to turn the problem around a little bit. I propose we allow very 
> small jobs to run as a single-task job with multiple maps and reduces, i.e., 
> similar to our current implementation of the LocalJobRunner. Thus, under 
> certain conditions (perhaps a user-set configuration, or if the input data is 
> small, i.e., less than a DFS block size) we could launch a special task that 
> runs all maps serially, followed by the reduces. This would help small jobs 
> achieve significantly lower latencies, thanks to less scheduling overhead, 
> fewer JVM startups, no shuffle over the network, etc.
> This would be a huge benefit to small Hive/Pig queries, especially on large 
> clusters.
> Thoughts?
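The proposal above can be sketched as a minimal, hypothetical illustration. It is not the implementation in the attached patches. The class and method names, the `userEnabled` flag, and the 64 MB block size used in `main` are assumptions, and the string labels stand in for actually running map and reduce tasks in-process.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "single task runs all maps, then all reduces"
 * idea. All names here are illustrative assumptions, not Hadoop APIs.
 */
public class UberTaskSketch {

    /** Eligibility: user opt-in, or total input smaller than one DFS block. */
    static boolean isSmallJob(boolean userEnabled, long totalInputBytes, long dfsBlockSize) {
        return userEnabled || totalInputBytes < dfsBlockSize;
    }

    /** Records the order in which one task would run every map, then every reduce. */
    static List<String> runAsSingleTask(int numMaps, int numReduces) {
        List<String> executionOrder = new ArrayList<>();
        for (int m = 0; m < numMaps; m++) {
            executionOrder.add("map-" + m); // placeholder for running map m in this JVM
        }
        // No network shuffle: map outputs stay local to this single task.
        for (int r = 0; r < numReduces; r++) {
            executionOrder.add("reduce-" + r); // placeholder for running reduce r
        }
        return executionOrder;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // example block size (assumption)
        long inputSize = 10L * 1024 * 1024; // a 10 MB job input
        if (isSmallJob(false, inputSize, blockSize)) {
            System.out.println(runAsSingleTask(3, 1));
        }
    }
}
```

With the values in `main`, the job qualifies and the sketch prints `[map-0, map-1, map-2, reduce-0]`. Running everything inside one task is what removes the per-task scheduling and JVM startup costs, and keeping map output local is what removes the network shuffle.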

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
