[jira] Commented: (HADOOP-307) Many small jobs benchmark for MapReduce

Sanjay Dahiya (JIRA) Thu, 29 Jun 2006 05:47:08 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-307?page=comments#action_12418458 ]


Sanjay Dahiya commented on HADOOP-307:
--------------------------------------

The configuration options are:
    input   (local file)
    output  (DFS path)
    times   (number of times to execute the job)
    jarFile (Mapper and Reducer)
    wordDir (temporary output from intermediate tasks)
    maps    (number of map tasks)
    reduces (number of reduce tasks)
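
For illustration only, the options listed above could be modelled as a small command-line parser. This is a Python sketch, not the benchmark's actual code: the flag spellings, the defaults, and the helper names `build_parser` and `parse_options` are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the option names listed in the comment."""
    p = argparse.ArgumentParser(description="Many small jobs benchmark (sketch)")
    p.add_argument("--input", required=True, help="local input file")
    p.add_argument("--output", required=True, help="DFS output path")
    # Defaults below are assumed; the comment does not state any.
    p.add_argument("--times", type=int, default=1,
                   help="number of times to execute the job")
    p.add_argument("--jarFile", help="jar containing the Mapper and Reducer")
    p.add_argument("--wordDir", help="temporary output from intermediate tasks")
    p.add_argument("--maps", type=int, default=1, help="number of map tasks")
    p.add_argument("--reduces", type=int, default=1, help="number of reduce tasks")
    return p


def parse_options(argv):
    """Parse a list of command-line arguments into a namespace."""
    return build_parser().parse_args(argv)
```

A call such as `parse_options(["--input", "in.txt", "--output", "/bench/out", "--times", "20"])` would yield a namespace with `times` set to 20 and the assumed defaults for the rest.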
    
I am not yet validating the bytes, but I will add that. The number of map and 
reduce tasks is also configurable; it is passed to JobConf. The benchmark sets 
up multiple MapReduce jobs in sequence, and the output of each job is passed as 
input to the next execution of the same job. It uses TextInputFormat by 
default, and that is not configurable yet.
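
The chaining described above can be sketched without Hadoop as a plain loop in which each run's output becomes the next run's input. This is a Python stand-in under stated assumptions, not the benchmark itself: `identity_job`, `total_bytes`, and `run_chain` are hypothetical names, the "job" is an in-memory pass-through, and the byte check is only one guess at what the planned validation might look like.

```python
import time


def identity_job(lines):
    """Stand-in for one MapReduce job that passes records through unchanged."""
    # "Map": key each line by its position, loosely like TextInputFormat offsets.
    mapped = [(offset, line) for offset, line in enumerate(lines)]
    # "Reduce": emit the values in key order, without further processing.
    return [line for _, line in sorted(mapped)]


def total_bytes(lines):
    """Total UTF-8 size of all records, used as a simple consistency check."""
    return sum(len(line.encode("utf-8")) for line in lines)


def run_chain(lines, times):
    """Run the job `times` times in sequence and record each run's wall time."""
    expected = total_bytes(lines)
    timings = []
    current = list(lines)
    for i in range(times):
        start = time.perf_counter()
        current = identity_job(current)  # output of run i is input of run i+1
        timings.append(time.perf_counter() - start)
        # Assumed validation: an identity job should preserve the byte count.
        if total_bytes(current) != expected:
            raise ValueError(f"run {i}: byte count changed")
    return current, timings
```

In a real run the per-job time would be dominated by job setup and task initialization rather than by the pass-through itself, which is the cost the benchmark is meant to expose.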

I was out sick, hence the delay in responding. I have not yet run this on a 
cluster; I should be able to post the results by tomorrow. 

> Many small jobs benchmark for MapReduce
> ---------------------------------------
>
>          Key: HADOOP-307
>          URL: http://issues.apache.org/jira/browse/HADOOP-307
>      Project: Hadoop
>         Type: Task

>   Components: mapred
>     Reporter: Sanjay Dahiya
>     Priority: Minor

>
> A benchmark that runs many small MapReduce tasks in sequence. A single 
> MapReduce implementation is used; it is invoked multiple times, with the 
> output of the previous run as its input. The input to the first map is read 
> with TextInputFormat (a text file of a few hundred KB). Input records are 
> passed to the output without much processing. The idea is to benchmark the 
> time taken to initialize the Mapper and Reducer. An initial prototype on a 
> single machine, with 20 MR tasks in sequence, took ~47 seconds per task. 
> Suggestions on what else to include in the benchmark are welcome. 

