down the road, maybe we should change the build so that examples are
not distributed by default, but are instead built as standalone
jars. Does this make sense?
On Jul 12, 2006, at 5:38 AM, Sanjay Dahiya (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-307?page=all ]
Sanjay Dahiya updated HADOOP-307:
---------------------------------
Attachment: patch.txt
The only reason to keep it separate is that we don't want these jar
files already on the classpath on all nodes. Part of the benchmark's
goal is to estimate the overhead of transferring the jar file through
HDFS. It also has a bin dir with scripts to run the benchmark. If
this doesn't conflict with the existing examples, then we can put it
there as well.
Updating the patch; it now generates Excel-friendly CSV output for
plotting graphs, etc.
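For illustration only, here is a minimal sketch of what spreadsheet-friendly CSV output for per-run timings could look like. The class name, method names, and column names (run, label, elapsed_ms) are assumptions made for this example and are not taken from patch.txt. Fields containing commas, quotes, or line breaks are quoted so that a spreadsheet can import them cleanly.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; names and columns are assumptions, not from the patch.
class TimingCsv {

    // Quote a field if it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Build one header row plus one row per run, each ending in CRLF.
    static String toCsv(List<String> labels, List<Long> elapsedMillis) {
        if (labels.size() != elapsedMillis.size()) {
            throw new IllegalArgumentException("labels and timings differ in length");
        }
        StringBuilder sb = new StringBuilder("run,label,elapsed_ms\r\n");
        for (int i = 0; i < labels.size(); i++) {
            sb.append(i + 1).append(',')
              .append(escape(labels.get(i))).append(',')
              .append(elapsedMillis.get(i)).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up values, only to show the output shape.
        System.out.print(toCsv(Arrays.asList("job-1", "job-2"),
                               Arrays.asList(10L, 20L)));
    }
}
```

The output can be redirected to a .csv file and opened directly in a spreadsheet to plot elapsed time against run number.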
Many small jobs benchmark for MapReduce
---------------------------------------
Key: HADOOP-307
URL: http://issues.apache.org/jira/browse/HADOOP-307
Project: Hadoop
Type: Task
Components: mapred
Reporter: Sanjay Dahiya
Priority: Minor
Attachments: patch.txt
A benchmark that runs many small MapReduce tasks in sequence. A
single MapReduce implementation is used; it is invoked multiple
times, each time with the output of the previous run as its input.
The input to the first map is a TextInputFormat (a text file of a
few hundred KB). Input records are passed to the output without much
processing. The idea is to benchmark the time taken to initialize
the Mapper and Reducer. An initial prototype on a single machine
with 20 MR tasks in sequence took ~47 seconds per task. Looking
for suggestions on what else can be included in the benchmark.
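The chaining described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration of the control flow (run the same job repeatedly, feed each output into the next run, time each run). The class ChainedJobTimer, its Result type, and the pass-through job are hypothetical stand-ins, not the code in the patch, and a real run would submit a MapReduce job where this sketch calls job.apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical harness; it stands in for submitting real MapReduce jobs.
class ChainedJobTimer {

    // Final records plus the wall-clock time of each run in milliseconds.
    static final class Result {
        final List<String> finalOutput;
        final List<Long> elapsedMillis;

        Result(List<String> finalOutput, List<Long> elapsedMillis) {
            this.finalOutput = finalOutput;
            this.elapsedMillis = elapsedMillis;
        }
    }

    // Run the same job `iterations` times in sequence; the output of
    // run i becomes the input of run i + 1.
    static Result runChain(UnaryOperator<List<String>> job,
                           List<String> input, int iterations) {
        List<Long> timings = new ArrayList<>();
        List<String> current = input;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            current = job.apply(current);
            timings.add((System.nanoTime() - start) / 1_000_000);
        }
        return new Result(current, timings);
    }

    public static void main(String[] args) {
        // Pass-through job: records are copied to the output unchanged,
        // so almost all measured time is per-run overhead.
        UnaryOperator<List<String>> passThrough = records -> new ArrayList<>(records);
        Result r = runChain(passThrough, Arrays.asList("a", "b", "c"), 20);
        System.out.println("runs=" + r.elapsedMillis.size()
                + " records=" + r.finalOutput.size());
    }
}
```

Because the job does almost no work per record, the per-run time in such a harness is dominated by setup cost, which is the quantity the benchmark is meant to expose.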