[ http://issues.apache.org/jira/browse/HADOOP-307?page=all ]
Sanjay Dahiya updated HADOOP-307:
---------------------------------
Attachment: patch.txt
Patch for the classpath issues. The benchmark can now be run with the hadoop
script, without setting any extra classpath:
$HADOOP_HOME/bin/hadoop jar <path to MRBenchmark.jar> smallJobsBenchmark <options .. >
See Readme.txt for an example of the options.
The bin/run.sh script can be used as an optional helper if the benchmark needs
to be run multiple times with different input configurations.
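A rough sketch of such a repeated-run helper, for illustration only; this is not the bin/run.sh from the patch. It assumes a hypothetical config file with one set of benchmark options per line, and the function name run_configs and the variables HADOOP_CMD and BENCH_JAR are made up here. The actual option names are the ones documented in Readme.txt.

```shell
#!/bin/sh
# Hypothetical helper: run the benchmark once per option line in a file.
# HADOOP_CMD and BENCH_JAR are assumed names; adjust them to your setup.
HADOOP_CMD="${HADOOP_CMD:-$HADOOP_HOME/bin/hadoop}"
BENCH_JAR="${BENCH_JAR:-MRBenchmark.jar}"

run_configs() {
    # $1: file with one set of benchmark options per line;
    #     blank lines and lines starting with '#' are skipped.
    while IFS= read -r opts; do
        case "$opts" in
            ''|'#'*) continue ;;
        esac
        # $opts is left unquoted on purpose so it splits into separate arguments.
        # stdin is redirected so the command cannot consume the config file.
        "$HADOOP_CMD" jar "$BENCH_JAR" smallJobsBenchmark $opts < /dev/null || return 1
    done < "$1"
}
```

Calling run_configs with the path to such a file runs the configurations one after another and stops at the first failing run.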
Thanks, Uros, for pointing this out.
> Many small jobs benchmark for MapReduce
> ---------------------------------------
>
> Key: HADOOP-307
> URL: http://issues.apache.org/jira/browse/HADOOP-307
> Project: Hadoop
> Issue Type: Task
> Components: mapred
> Reporter: Sanjay Dahiya
> Assigned To: Sanjay Dahiya
> Priority: Minor
> Fix For: 0.5.0
>
> Attachments: patch.txt, patch.txt, patch.txt
>
>
> A benchmark that runs many small MapReduce tasks in sequence. A single
> MapReduce implementation is used; it is invoked multiple times, with each run
> taking the output of the previous run as its input. The input to the first
> map is read with TextInputFormat (a text file of a few hundred KB). Input
> records are passed to the output without much processing. The idea is to
> benchmark the time taken to initialize the Mapper and Reducer. An initial
> prototype on a single machine, with 20 MR tasks run in sequence, took ~47
> seconds per task. Suggestions on what else to include in the benchmark are
> welcome.
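The chaining described above can be sketched as a small timing harness. This is only an illustration of the structure (each step reads the previous step's output, and the whole chain is timed); it is not the benchmark code from the patch. The function name chain_jobs and the out_N file naming are assumptions, and cp stands in for a pass-through job unless another command is supplied. It relies on date +%s, which is widely available but not strictly POSIX.

```shell
#!/bin/sh
# Sketch: run N steps in sequence, feeding each step the previous output.
# Usage: chain_jobs N INPUT WORKDIR [COMMAND ...]
# COMMAND is called as: COMMAND... <input> <output>; it defaults to cp,
# a stand-in for a job that passes records through unchanged.
chain_jobs() {
    n=$1; input=$2; workdir=$3
    shift 3
    [ "$#" -gt 0 ] || set -- cp

    i=1
    start=$(date +%s)
    while [ "$i" -le "$n" ]; do
        out="$workdir/out_$i"
        "$@" "$input" "$out" || return 1
        input=$out              # the next step reads this step's output
        i=$((i + 1))
    done
    end=$(date +%s)

    # Report the job count, the elapsed whole seconds and the final output path.
    echo "jobs=$n seconds=$((end - start)) last_output=$input"
}
```

Dividing the reported seconds by the job count gives a per-step figure comparable to the per-task number quoted above, once a real job command replaces the cp stand-in.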
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira