[jira] Commented: (HADOOP-307) Many small jobs benchmark for MapReduce

Doug Cutting (JIRA) Wed, 12 Jul 2006 02:07:28 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-307?page=comments#action_12420562 ]


Doug Cutting commented on HADOOP-307:
-------------------------------------

We already have some benchmarks in the examples source tree.  Any reason not to 
put this there too?  That way it would be compiled by the "test" task, which 
means it would be more likely to be maintained, and it wouldn't need its own 
build.xml, etc.  Also, we might name this something more informative, like 
MRJobBenchmark.

> Many small jobs benchmark for MapReduce
> ---------------------------------------
>
>          Key: HADOOP-307
>          URL: http://issues.apache.org/jira/browse/HADOOP-307
>      Project: Hadoop
>         Type: Task
>   Components: mapred
>     Reporter: Sanjay Dahiya
>     Priority: Minor
>  Attachments: patch.txt
>
> A benchmark that runs many small MapReduce tasks in sequence. A single 
> MapReduce implementation is used; it is invoked multiple times, with the 
> output of the previous run as its input. The input to the first Map is a 
> TextInputFormat (a text file of a few hundred KB). Input records are passed 
> to the output without much processing. The idea is to benchmark the time 
> taken by initialization of the Mapper and Reducer. An initial prototype on a 
> single machine with 20 MR tasks in sequence took ~47 seconds per task. 
> Looking for suggestions on what else can be included in the benchmark. 
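
The chaining described in the quoted issue can be sketched in plain Java. This
is a minimal sketch, not the attached patch and not the Hadoop API:
ChainedJobTimer, identityJob, and runChain are hypothetical names, and
identityJob is a local file copy standing in for a real MapReduce job. It only
shows the harness shape: each run's output becomes the next run's input, and
each run is timed separately.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ChainedJobTimer {

    /**
     * Hypothetical stand-in for one small job: reads the input records and
     * writes them to the output unchanged (an assumption, not Hadoop code).
     */
    static void identityJob(Path input, Path output) throws IOException {
        List<String> records = Files.readAllLines(input);
        Files.write(output, records);
    }

    /**
     * Runs `runs` jobs in sequence, feeding each job's output to the next,
     * and returns the elapsed wall-clock time of each job in milliseconds.
     */
    static List<Long> runChain(Path firstInput, Path workDir, int runs)
            throws IOException {
        List<Long> elapsedMillis = new ArrayList<>();
        Path input = firstInput;
        for (int i = 0; i < runs; i++) {
            Path output = workDir.resolve("out-" + i + ".txt");
            long start = System.nanoTime();
            identityJob(input, output);
            elapsedMillis.add((System.nanoTime() - start) / 1_000_000);
            input = output; // the next run consumes this run's output
        }
        return elapsedMillis;
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("chain-bench");
        Path input = workDir.resolve("input.txt");
        Files.write(input, List.of("record one", "record two", "record three"));

        List<Long> times = runChain(input, workDir, 20);
        long total = 0;
        for (long t : times) {
            total += t;
        }
        System.out.println("jobs=" + times.size() + " totalMillis=" + total);
    }
}
```

Swapping identityJob for a real job submission would let the per-run times
reflect Mapper and Reducer initialization rather than local file I/O.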

