[ http://issues.apache.org/jira/browse/HADOOP-307?page=comments#action_12426751 ]

Sanjay Dahiya commented on HADOOP-307:
--------------------------------------

OK, I will create a new issue for this.

> Many small jobs benchmark for MapReduce
> ---------------------------------------
>
>          Key: HADOOP-307
>          URL: http://issues.apache.org/jira/browse/HADOOP-307
>      Project: Hadoop
>   Issue Type: Task
>   Components: mapred
>     Reporter: Sanjay Dahiya
>  Assigned To: Sanjay Dahiya
>     Priority: Minor
>      Fix For: 0.5.0
>
>  Attachments: patch.txt, patch.txt, patch.txt
>
>
> A benchmark that runs many small MapReduce tasks in sequence. A single
> MapReduce implementation is used; it is invoked multiple times, and each run
> takes the output of the previous run as its input. The input to the first map
> is a TextInputFormat (a text file of a few hundred KB). Input records are
> passed to the output without much processing. The idea is to benchmark the
> time taken to initialize the Mapper and Reducer. An initial prototype on a
> single machine, with 20 MR tasks in sequence, took ~47 seconds per task.
> Looking for suggestions on what else can be included in the benchmark.
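
The chaining described in the issue can be sketched in a few lines. This is only an illustrative, in-memory Python sketch, not the attached patch and not Hadoop API code: the names `identity_map`, `identity_reduce`, `run_job`, and `run_chain` are hypothetical, and a real run would submit each job to the cluster, where the measured time would be dominated by job setup rather than by record handling.

```python
import time


def identity_map(record):
    # Hypothetical mapper: passes each record through unchanged.
    return record


def identity_reduce(records):
    # Hypothetical reducer: emits the records it receives, unchanged.
    return list(records)


def run_job(records):
    # One simulated small job: map every record, then reduce.
    mapped = [identity_map(r) for r in records]
    return identity_reduce(mapped)


def run_chain(initial_records, num_jobs):
    # Run num_jobs jobs in sequence; each job's output becomes the
    # next job's input. Returns the final output and per-job timings.
    records = list(initial_records)
    timings = []
    for _ in range(num_jobs):
        start = time.perf_counter()
        records = run_job(records)
        timings.append(time.perf_counter() - start)
    return records, timings


if __name__ == "__main__":
    # Stand-in for the small text input (assumed size, for illustration).
    lines = [f"line {i}" for i in range(1000)]
    output, timings = run_chain(lines, num_jobs=20)
    print(f"jobs run: {len(timings)}")
    print(f"average seconds per job: {sum(timings) / len(timings):.6f}")
```

Because the records pass through untouched, the final output should equal the initial input, which gives the benchmark a cheap correctness check alongside the timings.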
