Re: [jira] Updated: (HADOOP-307) Many small jobs benchmark for MapReduce

Eric Baldeschwieler Fri, 14 Jul 2006 15:36:10 -0700

Down the road, maybe we should change the build so that examples are not distributed by default, but are instead built as standalone jars. Does this make sense?


On Jul 12, 2006, at 5:38 AM, Sanjay Dahiya (JIRA) wrote:

     [ http://issues.apache.org/jira/browse/HADOOP-307?page=all ]

Sanjay Dahiya updated HADOOP-307:
---------------------------------

    Attachment: patch.txt
The only reason to keep it separate is that we don't want these jar files already in the classpath on all nodes. Part of the benchmark's goal is to estimate the overhead of transferring the jar file through HDFS. There is also a bin dir in the patch with scripts to run the benchmark. If this doesn't conflict with the existing examples, then we can put it there as well.
Updated the patch; it now generates Excel-friendly CSV output for plotting graphs, etc.
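A minimal sketch of Excel-friendly CSV timing output, assuming one row per job run. The function name `timings_to_csv` and the columns `run` and `elapsed_ms` are hypothetical and not taken from the patch. It uses Python's standard `csv` module, whose default `excel` dialect writes comma-separated rows terminated by `\r\n`.

```python
import csv
import io


def timings_to_csv(timings_ms):
    """Render per-run timings as CSV text: a header row, then one row per run.

    Hypothetical helper: `timings_ms` is a list of elapsed times in
    milliseconds, one entry per job run, in the order the runs executed.
    """
    buf = io.StringIO()
    # The default "excel" dialect uses commas and "\r\n" line endings.
    writer = csv.writer(buf)
    writer.writerow(["run", "elapsed_ms"])  # hypothetical column names
    for run, ms in enumerate(timings_ms, start=1):
        writer.writerow([run, ms])
    return buf.getvalue()
```

For example, `timings_to_csv([100, 250])` returns a header line followed by the rows `1,100` and `2,250`; those values are placeholders, not measurements. Writing the file with the `excel` dialect keeps it directly openable in a spreadsheet for graphing.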
Many small jobs benchmark for MapReduce
---------------------------------------

         Key: HADOOP-307
         URL: http://issues.apache.org/jira/browse/HADOOP-307
     Project: Hadoop
        Type: Task
  Components: mapred
    Reporter: Sanjay Dahiya
    Priority: Minor
 Attachments: patch.txt
A benchmark that runs many small MapReduce tasks in sequence. A single MapReduce implementation is used; it is invoked multiple times, with each run taking the output of the previous run as its input. The input to the first map is a text file of a few hundred KB, read with TextInputFormat. Input records are passed to the output with little processing. The idea is to benchmark the time taken to initialize the Mapper and Reducer. An initial prototype on a single machine with 20 MR tasks in sequence took ~47 seconds per task. Looking for suggestions on what else can be included in the benchmark.
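The chaining described above can be sketched as an in-process Python stand-in, assuming a toy model rather than real Hadoop jobs. The names `map_phase`, `reduce_phase`, `run_job`, and `run_chain` are hypothetical, and an in-memory sort stands in for the shuffle. The sketch shows only the control flow (the output of run i becomes the input of run i+1, and each run is timed separately); it does not capture task startup, scheduling, or jar transfer through HDFS, which are what the real benchmark is meant to measure.

```python
import time
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Hypothetical pass-through map: emit (position, line) pairs.

    Loosely mirrors TextInputFormat's (offset, line) records, but the key
    here is a list index, not a byte offset.
    """
    for pos, line in enumerate(lines):
        yield pos, line


def reduce_phase(pairs):
    """Hypothetical pass-through reduce.

    Sorting by key stands in for the shuffle; values are emitted unchanged.
    """
    for _key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        for _k, value in group:
            yield value


def run_job(lines):
    """One in-process 'job': map, sort/group, reduce. Returns output lines."""
    return list(reduce_phase(map_phase(lines)))


def run_chain(initial_input, num_jobs):
    """Run num_jobs jobs in sequence, feeding each job's output to the next.

    Returns the final output and a list of per-job wall-clock times (seconds).
    """
    data = list(initial_input)
    timings = []
    for _ in range(num_jobs):
        start = time.perf_counter()
        data = run_job(data)
        timings.append(time.perf_counter() - start)
    return data, timings
```

Calling `run_chain(lines, 20)` mirrors the 20-job prototype's structure. Because both phases pass records through unchanged, the final output equals the input, so any time measured per run is pure per-job overhead in this model.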
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly, contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
