[ http://issues.apache.org/jira/browse/HADOOP-307?page=comments#action_12418458 ]
Sanjay Dahiya commented on HADOOP-307:
--------------------------------------
The configuration options are:
input (local file)
output (DFS path)
times (number of times to execute the job)
jarFile (jar containing the Mapper and Reducer)
wordDir (temporary output from intermediate tasks)
maps (number of map tasks)
reduces (number of reduce tasks)
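As a rough illustration of how the options listed above could be collected, here
is a minimal sketch. The class name BenchmarkOptions, the "-name value" flag
syntax, and the defaults of 1 are assumptions made for this example only; they
are not taken from the patch.

```java
/** Hypothetical holder for the seven options; not the benchmark's actual code. */
final class BenchmarkOptions {
    String input;     // local input file
    String output;    // DFS output path
    int times = 1;    // number of times to execute the job (default is an assumption)
    String jarFile;   // jar with the Mapper and Reducer
    String wordDir;   // temp output from intermediate tasks (name as given in the list)
    int maps = 1;     // number of map tasks (default is an assumption)
    int reduces = 1;  // number of reduce tasks (default is an assumption)

    /** Parses "-name value" pairs; the flag syntax is assumed for illustration. */
    static BenchmarkOptions parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("expected -name value pairs");
        }
        BenchmarkOptions opts = new BenchmarkOptions();
        for (int i = 0; i < args.length; i += 2) {
            String value = args[i + 1];
            switch (args[i]) {
                case "-input":   opts.input = value; break;
                case "-output":  opts.output = value; break;
                case "-times":   opts.times = Integer.parseInt(value); break;
                case "-jarFile": opts.jarFile = value; break;
                case "-wordDir": opts.wordDir = value; break;
                case "-maps":    opts.maps = Integer.parseInt(value); break;
                case "-reduces": opts.reduces = Integer.parseInt(value); break;
                default:
                    throw new IllegalArgumentException("unknown option: " + args[i]);
            }
        }
        return opts;
    }
}
```

Options that are not supplied keep their defaults, and an unknown flag fails
fast rather than being silently ignored.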
I am not validating the bytes yet, but I will add that. The number of map and
reduce tasks can also be configured; it is passed to JobConf. The benchmark
sets up multiple MapReduce jobs in sequence, and the output of each job is
passed as input to the next execution of the same job. It uses TextInputFormat
by default, and that is not configurable yet.
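The chaining described above can be sketched without Hadoop at all. In this
minimal example the job is a plain function standing in for a real MapReduce
job submission; the class name ChainedRuns and the method runChain are
hypothetical, and the timing covers only the stand-in function, not real job
startup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical driver loop: each run's output becomes the next run's input. */
final class ChainedRuns {
    /**
     * Applies `job` `times` times in sequence, appending the elapsed
     * nanoseconds of each run to `timingsNanos`, and returns the final output.
     */
    static List<String> runChain(List<String> input, int times,
                                 UnaryOperator<List<String>> job,
                                 List<Long> timingsNanos) {
        List<String> current = input;
        for (int i = 0; i < times; i++) {
            long start = System.nanoTime();
            current = job.apply(current);          // stand-in for submitting one job
            timingsNanos.add(System.nanoTime() - start);
        }
        return current;
    }

    public static void main(String[] args) {
        // Pass-through job: records are copied to the output unchanged.
        UnaryOperator<List<String>> passThrough = records -> new ArrayList<>(records);
        List<Long> timings = new ArrayList<>();
        List<String> result =
            runChain(Arrays.asList("a", "b", "c"), 20, passThrough, timings);
        System.out.println(result.size() + " records after " + timings.size() + " runs");
    }
}
```

Because the job passes records through unchanged, the final output should
match the original input, which is also what a byte-validation step could
check.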
I was out sick, hence the delay in responding. I have yet to run this on a
cluster; I should post the results by tomorrow.
> Many small jobs benchmark for MapReduce
> ---------------------------------------
>
> Key: HADOOP-307
> URL: http://issues.apache.org/jira/browse/HADOOP-307
> Project: Hadoop
> Type: Task
> Components: mapred
> Reporter: Sanjay Dahiya
> Priority: Minor
>
> A benchmark that runs many small MapReduce tasks in sequence. A single
> MapReduce implementation is used; it is invoked multiple times, with the
> output of the previous run as input. The input to the first map is read with
> TextInputFormat (a text file of a few hundred KB). Input records are passed
> to the output without much processing. The idea is to benchmark the time
> taken to initialize the Mapper and Reducer. An initial prototype on a single
> machine with 20 MR tasks in sequence took ~47 seconds per task. Suggestions
> on what else could be included in the benchmark are welcome.