[ http://issues.apache.org/jira/browse/HADOOP-307?page=all ]
Sanjay Dahiya updated HADOOP-307:
---------------------------------
Attachment: patch.txt
Small jobs benchmark -
Generates input data and runs a number of small jobs repeatedly to estimate
the launch overhead of jobs. I am putting it in a directory separate from
examples because we don't want it to be part of the Hadoop distribution on all
nodes. Part of what we are benchmarking is transferring the Mapper and Reducer
jar files through HDFS.
Quick usage instructions follow -
Building the benchmark
benchmarks/build.xml depends on the relative path of the Hadoop libs. If you
move this directory, point build.xml at the right HADOOP_HOME/lib.
To build -
$ cd benchmarks
$ ant
Running the benchmark
$ cd benchmarks
$ bin/run.sh
After the benchmark has run successfully, see logs/report.txt for the
consolidated output of all the runs.
Edit bin/run.sh to configure the options.
Configurable options are -
-inputLines noOfLines
Number of lines of input to generate.
-inputType (ascending, descending, random)
Type of input to generate.
-jar jarFilePath
Jar file containing the Mapper and Reducer implementations. By default the
ant build creates MRBenchmark.jar, which contains the default Mapper and
Reducer.
-times numJobs
Number of times to run each MapReduce task. The reported time is the average
over all runs.
-workDir dfsPath
DFS path in which to put the output of MR tasks.
-maps numMaps
Number of maps for each task.
-reduces numReduces
Number of reduces for each task.
-ignoreOutput
Don't copy the output back to local disk. Without this option the output is
copied back to a temp location on local disk.
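As a rough illustration of the option set above, here is a minimal Python
sketch that models the same flags with argparse. It is not the benchmark's
actual parser (that lives in the patch). The function name build_parser and
the program name are hypothetical, and no defaults are assumed beyond
-ignoreOutput being off unless given.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the documented options. Hypothetical sketch, not the patch's parser."""
    parser = argparse.ArgumentParser(prog="mrbench-sketch")
    parser.add_argument("-inputLines", type=int,
                        help="number of lines of input to generate")
    parser.add_argument("-inputType",
                        choices=["ascending", "descending", "random"],
                        help="type of input to generate")
    parser.add_argument("-jar",
                        help="jar file with the Mapper and Reducer implementations")
    parser.add_argument("-times", type=int,
                        help="number of times to run each MapReduce task")
    parser.add_argument("-workDir",
                        help="DFS path for the output of MR tasks")
    parser.add_argument("-maps", type=int,
                        help="number of maps for each task")
    parser.add_argument("-reduces", type=int,
                        help="number of reduces for each task")
    parser.add_argument("-ignoreOutput", action="store_true",
                        help="don't copy the output back to local disk")
    return parser


if __name__ == "__main__":
    # Example values only; they are not taken from bin/run.sh.
    args = build_parser().parse_args(
        ["-inputLines", "100", "-inputType", "random", "-times", "5"]
    )
    print(args)
```

Options that are not passed come back as None here, so a caller can tell
"not set" apart from an explicit value.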
> Many small jobs benchmark for MapReduce
> ---------------------------------------
>
> Key: HADOOP-307
> URL: http://issues.apache.org/jira/browse/HADOOP-307
> Project: Hadoop
> Type: Task
> Components: mapred
> Reporter: Sanjay Dahiya
> Priority: Minor
> Attachments: patch.txt
>
> A benchmark that runs many small MapReduce tasks in sequence. A single
> MapReduce implementation is used; it is invoked multiple times, with each run
> taking the output of the previous run as its input. The input to the first
> map is a text file of a few hundred KB, read with TextInputFormat. Input
> records are passed to the output without much processing. The idea is to
> benchmark the time taken to initialize the Mapper and Reducer. An initial
> prototype run on a single machine, with 20 MR tasks in sequence, took ~47
> seconds per task. Suggestions on what else could be included in the benchmark
> are welcome.
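The chained-run idea in the description can be sketched in a few lines of
Python. This is only an illustration of the measurement loop, assuming a job
that can be called as a plain function; run_chain and identity_job are
hypothetical names and no Hadoop API is involved.

```python
import time
from typing import Callable, List, Tuple


def run_chain(job: Callable[[List[str]], List[str]],
              first_input: List[str],
              times: int,
              clock: Callable[[], float] = time.perf_counter
              ) -> Tuple[List[str], List[float]]:
    """Run `job` `times` times, feeding each run's output into the next run.

    Returns the final output and the duration of every run in seconds.
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    data = first_input
    durations = []
    for _ in range(times):
        start = clock()
        data = job(data)  # output of this run becomes input of the next
        durations.append(clock() - start)
    return data, durations


def identity_job(lines: List[str]) -> List[str]:
    """Stand-in for a map/reduce pass that copies records through unchanged."""
    return list(lines)


if __name__ == "__main__":
    _, secs = run_chain(identity_job, ["a", "b", "c"], times=20)
    print(f"average per run: {sum(secs) / len(secs):.6f} s")
```

Because the stand-in job does almost no work, whatever time is measured per
run is mostly fixed overhead, which is the quantity the benchmark is after.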