[ https://issues.apache.org/jira/browse/MAPREDUCE-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Roelofs updated MAPREDUCE-1220:
------------------------------------

    Attachment: MR-1220.v1b.sshot-03-jobdetailshistory.jsp.png

MR-1220.v1b.sshot-03-jobdetailshistory.jsp.png

screenshot of the jobdetailshistory page (uber-job complete)

This was taken before the setup and cleanup tasks were moved inside UberTask. 
The UI didn't get updated for that, IIRC, so I think it shows lots of zeros on 
the top and bottom lines now. (Another TODO item...)

Note also the wrappable column heading, courtesy of "<wbr>" pseudo-tags. 
(Browsers optionally break there, but cut-and-paste doesn't pick up a spurious 
space. Highly recommended for other fat table cells such as counter names and 
types, job names, hostnames, etc. I should file a separate JIRA...)
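
For illustration, a minimal sketch of the trick as a JSP fragment (the
scriptlet and column name below are made up, not copied from the actual
jobdetailshistory.jsp):

    <%-- hypothetical header cell; the real page's column names differ --%>
    <%
      // "<wbr>" marks optional break points: a browser may wrap the heading
      // there inside a narrow cell, but copying the cell text still yields
      // "NumUberSubtasks" with no spurious space.
      out.print("<th>Num<wbr>Uber<wbr>Subtasks</th>");
    %>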

> Implement an in-cluster LocalJobRunner
> --------------------------------------
>
>                 Key: MAPREDUCE-1220
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1220
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: client, jobtracker
>            Reporter: Arun C Murthy
>            Assignee: Greg Roelofs
>         Attachments: MAPREDUCE-1220_yhadoop20.patch, 
> MR-1220.v1.trunk-hadoop-common.Progress-dumper.patch.txt, 
> MR-1220.v10e-v11c-v12b.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v13.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v14b.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v15.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v1b.sshot-02-jobdetails.jsp.png, 
> MR-1220.v1b.sshot-03-jobdetailshistory.jsp.png, 
> MR-1220.v2.trunk-hadoop-mapreduce.patch.txt, 
> MR-1220.v2b.sshot-01-jobtracker.jsp.png, 
> MR-1220.v6.ytrunk-hadoop-mapreduce.patch.txt, 
> MR-1220.v7.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v8b.ytrunk-hadoop-mapreduce.delta.patch.txt, 
> MR-1220.v9c.ytrunk-hadoop-mapreduce.delta.patch.txt
>
>
> Currently, very small map-reduce jobs suffer from latency issues due to 
> overheads in Hadoop Map-Reduce such as scheduling, JVM startup, etc. We've 
> periodically tried to optimize all parts of the framework to achieve lower 
> latencies.
> I'd like to turn the problem around a little bit. I propose we allow very 
> small jobs to run as a single-task job with multiple maps and reduces, i.e. 
> similar to our current implementation of the LocalJobRunner. Thus, under 
> certain conditions (maybe user-set configuration, or if the input data is 
> small, i.e. less than a DFS blocksize), we could launch a special task which 
> runs all the maps serially, followed by the reduces. This would really help 
> small jobs achieve significantly smaller latencies, thanks to less 
> scheduling overhead, fewer JVM startups, no shuffle over the network, etc.
> This would be a huge benefit, especially on large clusters, for small 
> Hive/Pig queries.
> Thoughts?
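
As a rough illustration of the proposal quoted above, here is a minimal
stand-alone sketch in plain Java. All names, thresholds, and config knobs are
hypothetical and do not come from the attached patches; it only shows the idea
of deciding when a job is small enough to "uberize" and then running its maps
serially and feeding their output straight into the reduce, with no network
shuffle.

    // Hypothetical sketch only; not the MAPREDUCE-1220 implementation.
    import java.util.ArrayList;
    import java.util.List;

    public class UberJobSketch {

        /** Job-level knobs; in a real patch these would come from the job conf. */
        static class JobConfSketch {
            boolean uberEnabled = true;        // user-set switch
            long dfsBlockSize = 64L << 20;     // 64 MB, a typical default
            int maxUberMaps = 4;               // "very small job" threshold
        }

        /** Decide whether the job should run as a single "uber" task. */
        static boolean shouldUberize(JobConfSketch conf, long inputBytes, int numMaps) {
            return conf.uberEnabled
                    && numMaps <= conf.maxUberMaps
                    && inputBytes <= conf.dfsBlockSize;
        }

        /** Run every map serially, then the reduce, all inside one task. */
        static List<String> runUberTask(List<String> splits) {
            List<String> mapOutput = new ArrayList<>();
            for (String split : splits) {
                // "map" stage: here it just uppercases each word of the split
                for (String word : split.split("\\s+")) {
                    mapOutput.add(word.toUpperCase());
                }
            }
            // "reduce" stage: here it just collects the map output; everything
            // already sits in this one task's memory, so there is nothing to shuffle
            return mapOutput;
        }

        public static void main(String[] args) {
            JobConfSketch conf = new JobConfSketch();
            List<String> splits = List.of("small input one", "small input two");
            long bytes = splits.stream().mapToLong(String::length).sum();
            if (shouldUberize(conf, bytes, splits.size())) {
                System.out.println(runUberTask(splits));
            }
        }
    }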

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
