[ http://issues.apache.org/jira/browse/HADOOP-307?page=all ]
Sanjay Dahiya updated HADOOP-307:
---------------------------------
Attachment: patch.txt
Updated to reside in src/contrib, with build.xml changes for contrib.
> Many small jobs benchmark for MapReduce
> ---------------------------------------
>
> Key: HADOOP-307
> URL: http://issues.apache.org/jira/browse/HADOOP-307
> Project: Hadoop
> Issue Type: Task
> Components: mapred
> Reporter: Sanjay Dahiya
> Priority: Minor
> Attachments: patch.txt, patch.txt
>
>
> A benchmark that runs many small MapReduce tasks in sequence. A single map/reduce
> implementation is used; it is invoked multiple times, each run taking as input
> the output of the previous run. The input to the first map is read with
> TextInputFormat (a text file of a few hundred KB). Input records are passed to
> the output without much processing. The idea is to benchmark the time taken to
> initialize the Mapper and Reducer. An initial prototype on a single machine,
> running 20 MR tasks in sequence, took ~47 seconds per task. Looking for
> suggestions on what else could be included in the benchmark.
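The chaining described above (one pass-through job run repeatedly, each run reading the previous run's output, with per-run timing) can be sketched as follows. This is a minimal, Hadoop-free sketch under stated assumptions: `identity_job` and `run_chain` are hypothetical names, and `identity_job` only mimics a pass-through map (keyed by byte offset, similar to TextInputFormat) and an identity reduce on local files. It does not use the Hadoop API. In the real benchmark, each measured interval would also include job setup and Mapper/Reducer initialization, which is the cost being benchmarked.

```python
import os
import tempfile
import time
from collections import defaultdict


def identity_job(input_path: str, output_path: str) -> None:
    """Hypothetical stand-in for one small MapReduce job (not the Hadoop API).

    Map: key each line by its approximate byte offset (TextInputFormat-like)
    and pass the line through unchanged.
    Reduce: emit every value for each key, in key order (identity reduce).
    """
    groups = defaultdict(list)
    offset = 0
    with open(input_path, "r", encoding="utf-8") as f:
        for line in f:
            groups[offset].append(line.rstrip("\n"))  # map: pass record through
            offset += len(line.encode("utf-8"))
    with open(output_path, "w", encoding="utf-8") as out:
        for key in sorted(groups):
            for value in groups[key]:
                out.write(value + "\n")  # reduce: identity


def run_chain(initial_input: str, work_dir: str, num_jobs: int):
    """Run num_jobs jobs in sequence; job i reads the output of job i-1.

    Returns (path of the final output, list of per-job wall-clock seconds).
    """
    timings = []
    current_input = initial_input
    for i in range(num_jobs):
        output = os.path.join(work_dir, f"out-{i}.txt")
        start = time.perf_counter()
        identity_job(current_input, output)
        timings.append(time.perf_counter() - start)
        current_input = output  # this run's output feeds the next run
    return current_input, timings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "input.txt")
        with open(src, "w", encoding="utf-8") as f:
            for i in range(10000):
                f.write(f"record {i}\n")
        final, times = run_chain(src, d, 20)
        print(f"average per job: {sum(times) / len(times):.4f} s")
```

Because every run is a pass-through, the final output matches the original input, so any growth in per-job time comes from per-job overhead rather than from the data.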