[ http://issues.apache.org/jira/browse/HADOOP-307?page=all ]

Sanjay Dahiya updated HADOOP-307:
---------------------------------

    Attachment: patch.txt

Small jobs benchmark - 

Generates input data and runs a number of small jobs repeatedly to estimate 
the launch overhead for jobs. I am putting it in a directory separate from 
examples because we don't want it to be part of the Hadoop distribution on all 
nodes: part of what we are benchmarking is transferring the Mapper and Reducer 
jar files through HDFS. 

Quick usage instructions follow - 

Building the benchmark. 
benchmarks/build.xml depends on the relative path of the Hadoop libs; if you 
move this directory, point build.xml at the right HADOOP_HOME/lib. 
To build - 
$ cd benchmarks 
$ ant

Running the benchmark
$ cd benchmarks
$ bin/run.sh

After the benchmark has run successfully, see logs/report.txt for the 
consolidated output of all the runs. 

Edit bin/run.sh to configure options. 

Configurable options are - 

-inputLines noOfLines 
  Number of lines of input to generate. 

-inputType (ascending, descending, random)
  Type of input to generate. 

-jar jarFilePath 
  Jar file containing the Mapper and Reducer implementations. By default the 
  ant build creates MRBenchmark.jar, which contains the default Mapper and 
  Reducer. 
  
-times numJobs 
  Number of times to run each MapReduce task; the reported time is the average 
  over all runs. 

-workDir dfsPath 
  DFS path in which to put the output of MR tasks. 

-maps numMaps 
  Number of maps for each task. 

-reduces numReduces 
  Number of reduces for each task. 

-ignoreOutput
  Don't copy the output back to local disk. Without this option the output is 
  copied back to a temporary location on local disk. 
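
To illustrate what -inputLines, -inputType and -times control, here is a 
minimal standalone sketch. It is not the code in patch.txt: the class name 
SmallJobsSketch, the zero-padded integer line format, and the fixed random 
seed are all assumptions made for illustration, and a plain Runnable stands in 
for submitting a real MapReduce job. 

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical sketch; names and line format are NOT taken from patch.txt.
public class SmallJobsSketch {

    /** Mirrors the three values accepted by -inputType. */
    enum InputType { ASCENDING, DESCENDING, RANDOM }

    /** Generates numLines lines (assumed format: zero-padded integers). */
    static List<String> generateInput(int numLines, InputType type, long seed) {
        List<String> lines = new ArrayList<>(numLines);
        for (int i = 0; i < numLines; i++) {
            lines.add(String.format(Locale.ROOT, "%010d", i));
        }
        if (type == InputType.DESCENDING) {
            Collections.reverse(lines);
        } else if (type == InputType.RANDOM) {
            // A fixed seed keeps the generated input reproducible across runs.
            Collections.shuffle(lines, new Random(seed));
        }
        return lines;
    }

    /** Runs the job `times` times and returns the mean wall-clock ms per run. */
    static double averageMillis(Runnable job, int times) {
        if (times <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < times; i++) {
            long start = System.nanoTime();
            job.run(); // stand-in for submitting one small job and waiting for it
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / times;
    }

    public static void main(String[] args) {
        System.out.println(generateInput(5, InputType.DESCENDING, 42L));
        double avg = averageMillis(
                () -> generateInput(1000, InputType.RANDOM, 42L), 20);
        System.out.printf(Locale.ROOT, "average per run: %.3f ms%n", avg);
    }
}
```

Averaging over several runs, as -times does, smooths out one-off costs such as 
a cold JVM or an empty cache, so the reported figure is closer to the 
steady-state launch overhead. 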

> Many small jobs benchmark for MapReduce
> ---------------------------------------
>
>          Key: HADOOP-307
>          URL: http://issues.apache.org/jira/browse/HADOOP-307
>      Project: Hadoop
>         Type: Task

>   Components: mapred
>     Reporter: Sanjay Dahiya
>     Priority: Minor
>  Attachments: patch.txt
>
> A benchmark that runs many small MapReduce tasks in sequence. A single 
> MapReduce implementation is used; it is invoked multiple times, with the 
> output of the previous run as input. The input to the first Map is a 
> TextInputFormat (a text file of a few hundred KB). Input records are passed 
> to the output without much processing. The idea is to benchmark the time 
> taken to initialize the Mapper and Reducer. Initial prototyping on a single 
> machine with 20 MR tasks in sequence took ~47 seconds per task. Looking for 
> suggestions on what else can be included in the benchmark. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
