Author: cdouglas
Date: Mon Dec 1 23:03:09 2008
New Revision: 722391
URL: http://svn.apache.org/viewvc?rev=722391&view=rev
Log:
HADOOP-3770. Add gridmix2, an iteration on the gridmix benchmark. Contributed
by Runping Qi.
Added:
hadoop/core/trunk/src/benchmarks/gridmix2/
hadoop/core/trunk/src/benchmarks/gridmix2/README.gridmix2
hadoop/core/trunk/src/benchmarks/gridmix2/build.xml
hadoop/core/trunk/src/benchmarks/gridmix2/generateGridmix2data.sh
hadoop/core/trunk/src/benchmarks/gridmix2/gridmix-env-2
hadoop/core/trunk/src/benchmarks/gridmix2/gridmix_config.xml
hadoop/core/trunk/src/benchmarks/gridmix2/rungridmix_2
hadoop/core/trunk/src/benchmarks/gridmix2/src/
hadoop/core/trunk/src/benchmarks/gridmix2/src/java/
hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/
hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/
hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/
hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/
hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/CombinerJobCreator.java
hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GenericMRLoadJobCreator.java
hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixConfig.java
hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixRunner.java
hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/SortJobCreator.java
Modified:
hadoop/core/trunk/CHANGES.txt
hadoop/core/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java
Modified: hadoop/core/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/CHANGES.txt?rev=722391&r1=722390&r2=722391&view=diff
==============================================================================
--- hadoop/core/trunk/CHANGES.txt (original)
+++ hadoop/core/trunk/CHANGES.txt Mon Dec 1 23:03:09 2008
@@ -149,6 +149,9 @@
HADOOP-4649. Improve abstraction for spill indices. (cdouglas)
+ HADOOP-3770. Add gridmix2, an iteration on the gridmix benchmark. (Runping
+ Qi via cdouglas)
+
OPTIMIZATIONS
HADOOP-3293. Fixes FileInputFormat to do provide locations for splits
Added: hadoop/core/trunk/src/benchmarks/gridmix2/README.gridmix2
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/README.gridmix2?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/README.gridmix2 (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/README.gridmix2 Mon Dec 1 23:03:09 2008
@@ -0,0 +1,136 @@
+### "Gridmix" Benchmark ###
+
+Contents:
+
+0 Overview
+1 Getting Started
+ 1.0 Build
+ 1.1 Configure environment variables
+ 1.2 Configure the job mixture
+ 1.3 Generate test data
+2 Running
+ 2.0 General
+ 2.1 Non-Hod cluster
+ 2.2 Hod
+ 2.2.0 Static cluster
+ 2.2.1 Hod cluster
+
+
+* 0 Overview
+
+The scripts in this package model a cluster workload. The workload is
+simulated by generating random data and submitting map/reduce jobs that
+mimic observed data-access patterns in user jobs. The full benchmark
+generates approximately 2.5TB of (often compressed) input data operated on
+by the following simulated jobs:
+
+1) Three stage map/reduce job
+ Input: 500GB compressed (2TB uncompressed) SequenceFile
+ (k,v) = (5 words, 100 words)
+ hadoop-env: FIXCOMPSEQ
+ Compute1: keep 10% map, 40% reduce
+ Compute2: keep 100% map, 77% reduce
+ Input from Compute1
+ Compute3: keep 116% map, 91% reduce
+ Input from Compute2
+ Motivation: Many user workloads are implemented as pipelined map/reduce
+ jobs, including Pig workloads
+
+2) Large sort of variable key/value size
+ Input: 500GB compressed (2TB uncompressed) SequenceFile
+ (k,v) = (5-10 words, 100-10000 words)
+ hadoop-env: VARCOMPSEQ
+ Compute: keep 100% map, 100% reduce
+ Motivation: Processing large, compressed datasets is common.
+
+3) Reference select
+ Input: 500GB compressed (2TB uncompressed) SequenceFile
+ (k,v) = (5-10 words, 100-10000 words)
+ hadoop-env: VARCOMPSEQ
+ Compute: keep 0.2% map, 5% reduce
+ 1 Reducer
+ Motivation: Sampling from a large, reference dataset is common.
+
+4) API text sort (java, streaming)
+ Input: 500GB uncompressed Text
+ (k,v) = (1-10 words, 0-200 words)
+ hadoop-env: VARINFLTEXT
+ Compute: keep 100% map, 100% reduce
+ Motivation: This benchmark should exercise each of the APIs to
+ map/reduce
+
+5) Jobs with combiner (word count jobs)
+
+A benchmark load is a mix of different numbers of small, medium, and large
+jobs of the above types. The exact mix is specified in an XML file
+(gridmix_config.xml). A Java program constructs those jobs based on the
+XML file and puts them under the control of a JobControl object. The
+JobControl object then submits the jobs to the cluster and monitors their
+progress until all jobs complete.
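+
+The sketch below illustrates that flow (the driver class and the paths in
+it are illustrative only; GridMixRunner is the real driver):
+
+  import org.apache.hadoop.mapred.CombinerJobCreator;
+  import org.apache.hadoop.mapred.JobConf;
+  import org.apache.hadoop.mapred.jobcontrol.Job;
+  import org.apache.hadoop.mapred.jobcontrol.JobControl;
+
+  public class SubmitSketch {
+    public static void main(String[] args) throws Exception {
+      // Build one word-count style job via a creator class in this package.
+      JobConf conf = new CombinerJobCreator().createJob(new String[] {
+          "-r", "15", "-indir", "/gridmix/data/SortUncompressed",
+          "-outdir", "/tmp/gridmix-sketch-out" });
+      JobControl control = new JobControl("gridmix-sketch");
+      control.addJob(new Job(conf));
+      new Thread(control).start();  // JobControl implements Runnable
+      while (!control.allFinished()) {
+        Thread.sleep(5000);         // poll until every job completes
+      }
+      control.stop();
+    }
+  }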
+
+
+Note (jobs 1-3): Since the input data are compressed, each mapper outputs
+many more bytes than it reads in, typically causing map output spills.
+
+
+
+* 1 Getting Started
+
+1.0 Build
+
+In the src/benchmarks/gridmix2 dir, type "ant".
+gridmix.jar will be created in the build subdir.
+Copy gridmix.jar to the gridmix2 dir.
+
+1.1 Configure environment variables
+
+One must modify gridmix-env-2 to set the following variables:
+
+HADOOP_HOME       The hadoop install location
+HADOOP_VERSION    The exact hadoop version to be used, e.g. hadoop-0.18.2-dev
+HADOOP_CONF_DIR   The dir containing the hadoop-site.xml for the cluster to
+                  be used
+USE_REAL_DATASET  If set to true, a large data-set will be created and used
+                  by the benchmark
+
+
+1.2 Configure the job mixture
+
+A default gridmix_config.xml file is provided.
+One may change the number of jobs of the various types and sizes as
+necessary. One can also change the number of reducers of each job, and
+specify whether to compress the output data of a map/reduce job.
+Note that one can specify multiple values in the numOfJobs and
+numOfReduces fields, like:
+<property>
+ <name>javaSort.smallJobs.numOfJobs</name>
+ <value>8,2</value>
+ <description></description>
+</property>
+
+
+<property>
+ <name>javaSort.smallJobs.numOfReduces</name>
+ <value>15,70</value>
+ <description></description>
+</property>
+
+The above spec means that we will have 8 small java sort jobs with 15
+reducers and 2 small java sort jobs with 70 reducers.
+
+1.3 Generate test data
+
+Test data is generated using the generateGridmix2data.sh script.
+ ./generateGridmix2data.sh
+One may modify the structure and size of the data generated here.
+
+It is sufficient to run the script without modification, though it may
+require up to 4TB of free space in the default filesystem. Changing the size
+of the input data (COMPRESSED_DATA_BYTES, UNCOMPRESSED_DATA_BYTES,
+INDIRECT_DATA_BYTES) is safe. A 4x compression ratio for generated, block
+compressed data is typical.
+
+* 2 Running
+
+You need to set HADOOP_CONF_DIR to the directory containing the
+hadoop-site.xml for the target cluster. Then you just need to type
+ ./rungridmix_2
+It will create start.out to record the start time and, at the end,
+end.out to record the end time.
+
Added: hadoop/core/trunk/src/benchmarks/gridmix2/build.xml
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/build.xml?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/build.xml (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/build.xml Mon Dec 1 23:03:09 2008
@@ -0,0 +1,67 @@
+<?xml version="1.0" ?>
+<project default="main" basedir=".">
+  <property name="name" value="gridmix"/>
+ <property name="version" value="0.1"/>
+ <property name="final.name" value="${name}-${version}"/>
+ <property name="year" value="2008"/>
+ <property name="hadoop.dir" value="${basedir}/../../../"/>
+ <property name="lib.dir" value="${hadoop.dir}/lib"/>
+ <property name="src.dir" value="${basedir}/src"/>
+ <property name="conf.dir" value="${basedir}/conf"/>
+ <property name="docs.dir" value="${basedir}/docs"/>
+ <property name="build.dir" value="${basedir}/build"/>
+ <property name="dist.dir" value="${basedir}/dist"/>
+ <property name="build.classes" value="${build.dir}/classes"/>
+
+ <target name="init">
+ <mkdir dir="${build.dir}"/>
+ <mkdir dir="${dist.dir}"/>
+ </target>
+
+  <target name="main" depends="init, compile, compress" description="Main target">
+ <echo>
+ Building the .jar files.
+ </echo>
+ </target>
+
+ <target name="compile" depends="init" description="Compilation target">
+ <javac srcdir="src/java/" destdir="${build.dir}">
+ <classpath refid="classpath" />
+ </javac>
+ </target>
+
+
+  <target name="compress" depends="compile" description="Compression target">
+    <jar jarfile="${build.dir}/gridmix.jar" basedir="${build.dir}" includes="**/*.class" />
+
+
+ <copy todir="." includeEmptyDirs="false">
+ <fileset dir="${build.dir}">
+ <exclude name="**" />
+ <include name="**/*.jar" />
+ </fileset>
+ </copy>
+ </target>
+
+
+ <!-- ================================================================== -->
+ <!-- Clean. Delete the build files, and their directories -->
+ <!-- ================================================================== -->
+  <target name="clean" description="Clean. Delete the build files, and their directories">
+ <delete dir="${build.dir}"/>
+ <delete dir="${dist.dir}"/>
+ </target>
+
+ <!-- the normal classpath -->
+ <path id="classpath">
+ <pathelement location="${build.classes}"/>
+ <fileset dir="${lib.dir}">
+ <include name="*.jar" />
+ <exclude name="**/excluded/" />
+ </fileset>
+ <fileset dir="${hadoop.dir}/build">
+ <include name="**.jar" />
+ <include name="contrib/streaming/*.jar" />
+ </fileset>
+ </path>
+</project>
Added: hadoop/core/trunk/src/benchmarks/gridmix2/generateGridmix2data.sh
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/generateGridmix2data.sh?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/generateGridmix2data.sh (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/generateGridmix2data.sh Mon Dec 1 23:03:09 2008
@@ -0,0 +1,94 @@
+#!/usr/bin/env bash
+
+##############################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#####################################################################
+
+GRID_DIR=`dirname "$0"`
+GRID_DIR=`cd "$GRID_DIR"; pwd`
+source $GRID_DIR/gridmix-env-2
+
+# Smaller data set is used by default.
+COMPRESSED_DATA_BYTES=2147483648
+UNCOMPRESSED_DATA_BYTES=536870912
+
+# Number of partitions for output data
+NUM_MAPS=100
+
+# If the env var USE_REAL_DATASET is set, then use the params to generate the bigger (real) dataset.
+if [ ! -z ${USE_REAL_DATASET} ] ; then
+ echo "Using real dataset"
+ NUM_MAPS=492
+ # 2TB data compressing to approx 500GB
+ COMPRESSED_DATA_BYTES=2147483648000
+ # 500GB
+ UNCOMPRESSED_DATA_BYTES=536870912000
+fi
+
+## Data sources
+export GRID_MIX_DATA=/gridmix/data
+# Variable length key, value compressed SequenceFile
+export VARCOMPSEQ=${GRID_MIX_DATA}/WebSimulationBlockCompressed
+# Fixed length key, value compressed SequenceFile
+export FIXCOMPSEQ=${GRID_MIX_DATA}/MonsterQueryBlockCompressed
+# Variable length key, value uncompressed Text File
+export VARINFLTEXT=${GRID_MIX_DATA}/SortUncompressed
+# Fixed length key, value compressed Text File
+export FIXCOMPTEXT=${GRID_MIX_DATA}/EntropySimulationCompressed
+
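+# Launch three randomtextwriter jobs in parallel (note the trailing &):
+# variable-length block-compressed SequenceFile data (VARCOMPSEQ),
+# fixed-length block-compressed SequenceFile data (FIXCOMPSEQ), and
+# variable-length uncompressed Text data (VARINFLTEXT).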
+${HADOOP_HOME}/bin/hadoop jar \
+ ${EXAMPLE_JAR} randomtextwriter \
+ -D test.randomtextwrite.total_bytes=${COMPRESSED_DATA_BYTES} \
+ -D test.randomtextwrite.bytes_per_map=$((${COMPRESSED_DATA_BYTES} / ${NUM_MAPS})) \
+ -D test.randomtextwrite.min_words_key=5 \
+ -D test.randomtextwrite.max_words_key=10 \
+ -D test.randomtextwrite.min_words_value=100 \
+ -D test.randomtextwrite.max_words_value=10000 \
+ -D mapred.output.compress=true \
+ -D mapred.map.output.compression.type=BLOCK \
+ -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
+ ${VARCOMPSEQ} &
+
+
+${HADOOP_HOME}/bin/hadoop jar \
+ ${EXAMPLE_JAR} randomtextwriter \
+ -D test.randomtextwrite.total_bytes=${COMPRESSED_DATA_BYTES} \
+ -D test.randomtextwrite.bytes_per_map=$((${COMPRESSED_DATA_BYTES} / ${NUM_MAPS})) \
+ -D test.randomtextwrite.min_words_key=5 \
+ -D test.randomtextwrite.max_words_key=5 \
+ -D test.randomtextwrite.min_words_value=100 \
+ -D test.randomtextwrite.max_words_value=100 \
+ -D mapred.output.compress=true \
+ -D mapred.map.output.compression.type=BLOCK \
+ -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
+ ${FIXCOMPSEQ} &
+
+
+${HADOOP_HOME}/bin/hadoop jar \
+ ${EXAMPLE_JAR} randomtextwriter \
+ -D test.randomtextwrite.total_bytes=${UNCOMPRESSED_DATA_BYTES} \
+ -D test.randomtextwrite.bytes_per_map=$((${UNCOMPRESSED_DATA_BYTES} / ${NUM_MAPS})) \
+ -D test.randomtextwrite.min_words_key=1 \
+ -D test.randomtextwrite.max_words_key=10 \
+ -D test.randomtextwrite.min_words_value=0 \
+ -D test.randomtextwrite.max_words_value=200 \
+ -D mapred.output.compress=false \
+ -outFormat org.apache.hadoop.mapred.TextOutputFormat \
+ ${VARINFLTEXT} &
+
+
Added: hadoop/core/trunk/src/benchmarks/gridmix2/gridmix-env-2
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/gridmix-env-2?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/gridmix-env-2 (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/gridmix-env-2 Mon Dec 1 23:03:09 2008
@@ -0,0 +1,35 @@
+#!/usr/bin/env bash
+
+##############################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#####################################################################
+
+
+## Environment configuration
+# Hadoop installation
+export HADOOP_VERSION=hadoop-0.18.2-dev
+export HADOOP_HOME=${HADOOP_INSTALL_HOME}/${HADOOP_VERSION}
+export HADOOP_CONF_DIR=
+export USE_REAL_DATASET=TRUE
+
+export APP_JAR=${HADOOP_HOME}/${HADOOP_VERSION}-test.jar
+export EXAMPLE_JAR=${HADOOP_HOME}/${HADOOP_VERSION}-examples.jar
+export STREAMING_JAR=${HADOOP_HOME}/contrib/streaming/${HADOOP_VERSION}-streaming.jar
+
+
+
Added: hadoop/core/trunk/src/benchmarks/gridmix2/gridmix_config.xml
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/gridmix_config.xml?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/gridmix_config.xml (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/gridmix_config.xml Mon Dec 1 23:03:09 2008
@@ -0,0 +1,550 @@
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>
+
+<!-- Put site-specific property overrides in this file. -->
+
+<configuration>
+
+
+<property>
+ <name>GRID_MIX_DATA</name>
+ <value>/gridmix/data</value>
+ <description></description>
+</property>
+
+<property>
+ <name>FIXCOMPTEXT</name>
+ <value>${GRID_MIX_DATA}/EntropySimulationCompressed</value>
+ <description></description>
+</property>
+
+<property>
+ <name>VARINFLTEXT</name>
+ <value>${GRID_MIX_DATA}/SortUncompressed</value>
+ <description></description>
+</property>
+
+<property>
+ <name>FIXCOMPSEQ</name>
+ <value>${GRID_MIX_DATA}/MonsterQueryBlockCompressed</value>
+ <description></description>
+</property>
+
+<property>
+ <name>VARCOMPSEQ</name>
+ <value>${GRID_MIX_DATA}/WebSimulationBlockCompressed</value>
+ <description></description>
+</property>
+
+
+<property>
+ <name>streamSort.smallJobs.inputFiles</name>
+ <value>${VARINFLTEXT}/{part-00000,part-00001,part-00002}</value>
+ <description></description>
+</property>
+
+<property>
+ <name>streamSort.smallJobs.numOfJobs</name>
+ <value>40</value>
+ <description></description>
+</property>
+
+<property>
+ <name>streamSort.smallJobs.numOfReduces</name>
+ <value>15</value>
+ <description></description>
+</property>
+
+<property>
+ <name>streamSort.smallJobs.numOfMapoutputCompressed</name>
+ <value>40</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>streamSort.smallJobs.numOfOutputCompressed</name>
+ <value>20</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>streamSort.mediumJobs.numOfJobs</name>
+ <value>16</value>
+ <description></description>
+</property>
+<property>
+ <name>streamSort.mediumJobs.inputFiles</name>
+ <value>${VARINFLTEXT}/{part-000*0,part-000*1,part-000*2}</value>
+ <description></description>
+</property>
+<property>
+ <name>streamSort.mediumJobs.numOfReduces</name>
+ <value>170</value>
+ <description></description>
+</property>
+
+<property>
+ <name>streamSort.mediumJobs.numOfMapoutputCompressed</name>
+ <value>16</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>streamSort.mediumJobs.numOfOutputCompressed</name>
+ <value>12</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>streamSort.largeJobs.numOfJobs</name>
+ <value>5</value>
+ <description></description>
+</property>
+<property>
+ <name>streamSort.largeJobs.inputFiles</name>
+ <value>${VARINFLTEXT}</value>
+ <description></description>
+</property>
+<property>
+ <name>streamSort.largeJobs.numOfReduces</name>
+ <value>370</value>
+ <description></description>
+</property>
+
+<property>
+ <name>streamSort.largeJobs.numOfMapoutputCompressed</name>
+ <value>5</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>streamSort.largeJobs.numOfOutputCompressed</name>
+ <value>3</value>
+ <description> </description>
+</property>
+
+
+<property>
+ <name>javaSort.smallJobs.numOfJobs</name>
+ <value>8,2</value>
+ <description></description>
+</property>
+<property>
+ <name>javaSort.smallJobs.inputFiles</name>
+ <value>${VARINFLTEXT}/{part-00000,part-00001,part-00002}</value>
+ <description></description>
+</property>
+<property>
+ <name>javaSort.smallJobs.numOfReduces</name>
+ <value>15,70</value>
+ <description></description>
+</property>
+
+<property>
+ <name>javaSort.smallJobs.numOfMapoutputCompressed</name>
+ <value>10</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>javaSort.smallJobs.numOfOutputCompressed</name>
+ <value>3</value>
+ <description> </description>
+</property>
+
+
+<property>
+ <name>javaSort.mediumJobs.numOfJobs</name>
+ <value>4,2</value>
+ <description></description>
+</property>
+<property>
+ <name>javaSort.mediumJobs.inputFiles</name>
+ <value>${VARINFLTEXT}/{part-000*0,part-000*1,part-000*2}</value>
+ <description></description>
+</property>
+<property>
+ <name>javaSort.mediumJobs.numOfReduces</name>
+ <value>170,70</value>
+ <description></description>
+</property>
+
+<property>
+ <name>javaSort.mediumJobs.numOfMapoutputCompressed</name>
+ <value>6</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>javaSort.mediumJobs.numOfOutputCompressed</name>
+ <value>4</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>javaSort.largeJobs.numOfJobs</name>
+ <value>3</value>
+ <description></description>
+</property>
+<property>
+ <name>javaSort.largeJobs.inputFiles</name>
+ <value>${VARINFLTEXT}</value>
+ <description></description>
+</property>
+<property>
+ <name>javaSort.largeJobs.numOfReduces</name>
+ <value>370</value>
+ <description></description>
+</property>
+
+<property>
+ <name>javaSort.largeJobs.numOfMapoutputCompressed</name>
+ <value>3</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>javaSort.largeJobs.numOfOutputCompressed</name>
+ <value>2</value>
+ <description> </description>
+</property>
+
+
+<property>
+ <name>combiner.smallJobs.numOfJobs</name>
+ <value>11,4</value>
+ <description></description>
+</property>
+<property>
+ <name>combiner.smallJobs.inputFiles</name>
+ <value>${VARINFLTEXT}/{part-00000,part-00001,part-00002}</value>
+ <description></description>
+</property>
+<property>
+ <name>combiner.smallJobs.numOfReduces</name>
+ <value>10,1</value>
+ <description></description>
+</property>
+
+<property>
+ <name>combiner.smallJobs.numOfMapoutputCompressed</name>
+ <value>15</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>combiner.smallJobs.numOfOutputCompressed</name>
+ <value>0</value>
+ <description> </description>
+</property>
+
+
+<property>
+ <name>combiner.mediumJobs.numOfJobs</name>
+ <value>8</value>
+ <description></description>
+</property>
+<property>
+ <name>combiner.mediumJobs.inputFiles</name>
+ <value>${VARINFLTEXT}/{part-000*0,part-000*1,part-000*2}</value>
+ <description></description>
+</property>
+<property>
+ <name>combiner.mediumJobs.numOfReduces</name>
+ <value>100</value>
+ <description></description>
+</property>
+
+<property>
+ <name>combiner.mediumJobs.numOfMapoutputCompressed</name>
+ <value>8</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>combiner.mediumJobs.numOfOutputCompressed</name>
+ <value>0</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>combiner.largeJobs.numOfJobs</name>
+ <value>4</value>
+ <description></description>
+</property>
+<property>
+ <name>combiner.largeJobs.inputFiles</name>
+ <value>${VARINFLTEXT}</value>
+ <description></description>
+</property>
+<property>
+ <name>combiner.largeJobs.numOfReduces</name>
+ <value>360</value>
+ <description></description>
+</property>
+
+<property>
+ <name>combiner.largeJobs.numOfMapoutputCompressed</name>
+ <value>4</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>combiner.largeJobs.numOfOutputCompressed</name>
+ <value>0</value>
+ <description> </description>
+</property>
+
+
+<property>
+ <name>monsterQuery.smallJobs.numOfJobs</name>
+ <value>7</value>
+ <description></description>
+</property>
+<property>
+ <name>monsterQuery.smallJobs.inputFiles</name>
+ <value>${FIXCOMPSEQ}/{part-00000,part-00001,part-00002}</value>
+ <description></description>
+</property>
+<property>
+ <name>monsterQuery.smallJobs.numOfReduces</name>
+ <value>5</value>
+ <description></description>
+</property>
+
+<property>
+ <name>monsterQuery.smallJobs.numOfMapoutputCompressed</name>
+ <value>7</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>monsterQuery.smallJobs.numOfOutputCompressed</name>
+ <value>0</value>
+ <description> </description>
+</property>
+
+
+<property>
+ <name>monsterQuery.mediumJobs.numOfJobs</name>
+ <value>5</value>
+ <description></description>
+</property>
+<property>
+ <name>monsterQuery.mediumJobs.inputFiles</name>
+ <value>${FIXCOMPSEQ}/{part-000*0,part-000*1,part-000*2}</value>
+ <description></description>
+</property>
+<property>
+ <name>monsterQuery.mediumJobs.numOfReduces</name>
+ <value>100</value>
+ <description></description>
+</property>
+
+<property>
+ <name>monsterQuery.mediumJobs.numOfMapoutputCompressed</name>
+ <value>5</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>monsterQuery.mediumJobs.numOfOutputCompressed</name>
+ <value>0</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>monsterQuery.largeJobs.numOfJobs</name>
+ <value>3</value>
+ <description></description>
+</property>
+<property>
+ <name>monsterQuery.largeJobs.inputFiles</name>
+ <value>${FIXCOMPSEQ}</value>
+ <description></description>
+</property>
+<property>
+ <name>monsterQuery.largeJobs.numOfReduces</name>
+ <value>370</value>
+ <description></description>
+</property>
+
+<property>
+ <name>monsterQuery.largeJobs.numOfMapoutputCompressed</name>
+ <value>3</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>monsterQuery.largeJobs.numOfOutputCompressed</name>
+ <value>0</value>
+ <description> </description>
+</property>
+
+
+<property>
+ <name>webdataScan.smallJobs.numOfJobs</name>
+ <value>24</value>
+ <description></description>
+</property>
+
+<property>
+ <name>webdataScan.smallJobs.inputFiles</name>
+ <value>${VARCOMPSEQ}/{part-00000,part-00001,part-00002}</value>
+ <description></description>
+</property>
+<property>
+ <name>webdataScan.smallJobs.numOfMapoutputCompressed</name>
+ <value>24</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>webdataScan.smallJobs.numOfOutputCompressed</name>
+ <value>0</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>webdataScan.mediumJobs.numOfJobs</name>
+ <value>12</value>
+ <description></description>
+</property>
+
+<property>
+ <name>webdataScan.mediumJobs.inputFiles</name>
+ <value>${VARCOMPSEQ}/{part-000*0,part-000*1,part-000*2}</value>
+ <description></description>
+</property>
+<property>
+ <name>webdataScan.mediumJobs.numOfMapoutputCompressed</name>
+ <value>12</value>
+ <description> </description>
+</property>
+<property>
+ <name>webdataScan.mediumJobs.numOfReduces</name>
+ <value>7</value>
+ <description></description>
+</property>
+
+<property>
+ <name>webdataScan.mediumJobs.numOfOutputCompressed</name>
+ <value>0</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>webdataScan.largeJobs.numOfJobs</name>
+ <value>2</value>
+ <description></description>
+</property>
+
+<property>
+ <name>webdataScan.largeJobs.inputFiles</name>
+ <value>${VARCOMPSEQ}</value>
+ <description></description>
+</property>
+<property>
+ <name>webdataScan.largeJobs.numOfMapoutputCompressed</name>
+ <value>3</value>
+ <description> </description>
+</property>
+<property>
+ <name>webdataScan.largeJobs.numOfReduces</name>
+ <value>70</value>
+ <description></description>
+</property>
+
+<property>
+ <name>webdataScan.largeJobs.numOfOutputCompressed</name>
+ <value>3</value>
+ <description> </description>
+</property>
+
+
+<property>
+ <name>webdataSort.smallJobs.numOfJobs</name>
+ <value>7</value>
+ <description></description>
+</property>
+<property>
+ <name>webdataSort.smallJobs.inputFiles</name>
+ <value>${VARCOMPSEQ}/{part-00000,part-00001,part-00002}</value>
+ <description></description>
+</property>
+<property>
+ <name>webdataSort.smallJobs.numOfReduces</name>
+ <value>15</value>
+ <description></description>
+</property>
+
+<property>
+ <name>webdataSort.smallJobs.numOfMapoutputCompressed</name>
+ <value>7</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>webdataSort.smallJobs.numOfOutputCompressed</name>
+ <value>7</value>
+ <description> </description>
+</property>
+
+
+<property>
+ <name>webdataSort.mediumJobs.numOfJobs</name>
+ <value>4</value>
+ <description></description>
+</property>
+<property>
+ <name>webdataSort.mediumJobs.inputFiles</name>
+ <value>${VARCOMPSEQ}/{part-000*0,part-000*1,part-000*2}</value>
+ <description></description>
+</property>
+<property>
+ <name>webdataSort.mediumJobs.numOfReduces</name>
+ <value>170</value>
+ <description></description>
+</property>
+
+<property>
+ <name>webdataSort.mediumJobs.numOfMapoutputCompressed</name>
+ <value>4</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>webdataSort.mediumJobs.numOfOutputCompressed</name>
+ <value>4</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>webdataSort.largeJobs.numOfJobs</name>
+ <value>1</value>
+ <description></description>
+</property>
+<property>
+ <name>webdataSort.largeJobs.inputFiles</name>
+ <value>${VARCOMPSEQ}</value>
+ <description></description>
+</property>
+<property>
+ <name>webdataSort.largeJobs.numOfReduces</name>
+ <value>800</value>
+ <description></description>
+</property>
+
+<property>
+ <name>webdataSort.largeJobs.numOfMapoutputCompressed</name>
+ <value>1</value>
+ <description> </description>
+</property>
+
+<property>
+ <name>webdataSort.largeJobs.numOfOutputCompressed</name>
+ <value>1</value>
+ <description> </description>
+</property>
+
+</configuration>
Added: hadoop/core/trunk/src/benchmarks/gridmix2/rungridmix_2
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/rungridmix_2?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/rungridmix_2 (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/rungridmix_2 Mon Dec 1 23:03:09 2008
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+
+##############################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#####################################################################
+
+## Environment configuration
+
+GRID_DIR=`dirname "$0"`
+GRID_DIR=`cd "$GRID_DIR"; pwd`
+source $GRID_DIR/gridmix-env-2
+
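+# Record wall-clock start/end times. The optional first argument is used
+# as a prefix for the marker files, e.g. "./rungridmix_2 run1" writes
+# run1_start.out and run1_end.out.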
+Date=`date +%F-%H-%M-%S-%N`
+echo $Date > $1_start.out
+
+export HADOOP_CLASSPATH=${APP_JAR}:${EXAMPLE_JAR}:${STREAMING_JAR}
+export LIBJARS=${APP_JAR},${EXAMPLE_JAR},${STREAMING_JAR}
+${HADOOP_HOME}/bin/hadoop jar -libjars ${LIBJARS} ./gridmix.jar org.apache.hadoop.mapred.GridMixRunner
+
+Date=`date +%F-%H-%M-%S-%N`
+echo $Date > $1_end.out
+
Added: hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/CombinerJobCreator.java
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/CombinerJobCreator.java?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/CombinerJobCreator.java (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/CombinerJobCreator.java Mon Dec 1 23:03:09 2008
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.mapred;
+
+import org.apache.hadoop.examples.WordCount;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.Text;
+
+public class CombinerJobCreator extends WordCount {
+
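+  /**
+   * Builds a word-count JobConf from command-line style flags
+   * (-r, -indir, -outdir, -mapoutputCompressed, -outputCompressed).
+   * Usage sketch with illustrative paths:
+   *   createJob(new String[] {"-r", "10",
+   *       "-indir", "/gridmix/data/SortUncompressed",
+   *       "-outdir", "/tmp/wc-out", "-mapoutputCompressed", "true"});
+   * Returns null if an argument cannot be parsed.
+   */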
+ public JobConf createJob(String[] args) throws Exception {
+ JobConf conf = new JobConf(WordCount.class);
+ conf.setJobName("GridmixCombinerJob");
+
+ // the keys are words (strings)
+ conf.setOutputKeyClass(Text.class);
+ // the values are counts (ints)
+ conf.setOutputValueClass(IntWritable.class);
+
+ conf.setMapperClass(MapClass.class);
+ conf.setCombinerClass(Reduce.class);
+ conf.setReducerClass(Reduce.class);
+ boolean mapoutputCompressed = false;
+ boolean outputCompressed = false;
+ // List<String> other_args = new ArrayList<String>();
+ for (int i = 0; i < args.length; ++i) {
+ try {
+ if ("-r".equals(args[i])) {
+ conf.setNumReduceTasks(Integer.parseInt(args[++i]));
+ } else if ("-indir".equals(args[i])) {
+ FileInputFormat.setInputPaths(conf, args[++i]);
+ } else if ("-outdir".equals(args[i])) {
+ FileOutputFormat.setOutputPath(conf, new Path(args[++i]));
+
+ } else if ("-mapoutputCompressed".equals(args[i])) {
+ mapoutputCompressed = Boolean.valueOf(args[++i]).booleanValue();
+ } else if ("-outputCompressed".equals(args[i])) {
+ outputCompressed = Boolean.valueOf(args[++i]).booleanValue();
+ }
+ } catch (NumberFormatException except) {
+ System.out.println("ERROR: Integer expected instead of " + args[i]);
+ return null;
+ } catch (ArrayIndexOutOfBoundsException except) {
+ System.out.println("ERROR: Required parameter missing from "
+ + args[i - 1]);
+ return null;
+ }
+ }
+ conf.setCompressMapOutput(mapoutputCompressed);
+ conf.setBoolean("mapred.output.compress", outputCompressed);
+ return conf;
+ }
+}
Added: hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GenericMRLoadJobCreator.java
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GenericMRLoadJobCreator.java?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GenericMRLoadJobCreator.java (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GenericMRLoadJobCreator.java Mon Dec 1 23:03:09 2008
@@ -0,0 +1,98 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.mapred;
+
+import java.util.Random;
+import java.util.Stack;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.SequenceFile;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapred.GenericMRLoadGenerator;
+import org.apache.hadoop.mapred.lib.NullOutputFormat;
+import org.apache.hadoop.mapred.JobConf;
+
+public class GenericMRLoadJobCreator extends GenericMRLoadGenerator {
+
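+  /**
+   * Builds a generic load-generator JobConf from GenericMRLoadGenerator
+   * arguments. With no input paths the job synthesizes random data; when
+   * mapred.indirect.input.format is set, the input paths are expanded into
+   * a SequenceFile listing every file under them, so splits can be
+   * assigned indirectly. Usage sketch: createJob(argv, true, false)
+   * compresses map output but leaves the final output uncompressed.
+   */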
+ public JobConf createJob(String[] argv, boolean mapoutputCompressed,
+ boolean outputCompressed) throws Exception {
+
+ JobConf job = new JobConf();
+ job.setJarByClass(GenericMRLoadGenerator.class);
+ job.setMapperClass(SampleMapper.class);
+ job.setReducerClass(SampleReducer.class);
+ if (!parseArgs(argv, job)) {
+ return null;
+ }
+
+ if (null == FileOutputFormat.getOutputPath(job)) {
+ // No output dir? No writes
+ job.setOutputFormat(NullOutputFormat.class);
+ }
+
+ if (0 == FileInputFormat.getInputPaths(job).length) {
+ // No input dir? Generate random data
+ System.err.println("No input path; ignoring InputFormat");
+ confRandom(job);
+ } else if (null != job.getClass("mapred.indirect.input.format", null)) {
+ // specified IndirectInputFormat? Build src list
+ JobClient jClient = new JobClient(job);
+ Path sysdir = jClient.getSystemDir();
+ Random r = new Random();
+ Path indirInputFile = new Path(sysdir, Integer.toString(r
+ .nextInt(Integer.MAX_VALUE), 36)
+ + "_files");
+ job.set("mapred.indirect.input.file", indirInputFile.toString());
+ SequenceFile.Writer writer = SequenceFile.createWriter(sysdir
+ .getFileSystem(job), job, indirInputFile, LongWritable.class,
+ Text.class, SequenceFile.CompressionType.NONE);
+ try {
+ for (Path p : FileInputFormat.getInputPaths(job)) {
+ FileSystem fs = p.getFileSystem(job);
+ Stack<Path> pathstack = new Stack<Path>();
+ pathstack.push(p);
+ while (!pathstack.empty()) {
+ for (FileStatus stat : fs.listStatus(pathstack.pop())) {
+ if (stat.isDir()) {
+ if (!stat.getPath().getName().startsWith("_")) {
+ pathstack.push(stat.getPath());
+ }
+ } else {
+ writer.sync();
+ writer.append(new LongWritable(stat.getLen()), new Text(stat
+ .getPath().toUri().toString()));
+ }
+ }
+ }
+ }
+ } finally {
+ writer.close();
+ }
+ }
+
+ job.setCompressMapOutput(mapoutputCompressed);
+ job.setBoolean("mapred.output.compress", outputCompressed);
+ return job;
+
+ }
+
+}
Added: hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixConfig.java
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixConfig.java?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixConfig.java (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixConfig.java Mon Dec 1 23:03:09 2008
@@ -0,0 +1,34 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.mapred;
+
+import org.apache.hadoop.conf.Configuration;
+
+public class GridMixConfig extends Configuration {
+
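+  /**
+   * Parses a comma-separated property into an int array so one property
+   * can size several job groups at once. For example, with
+   * javaSort.smallJobs.numOfJobs set to "8,2",
+   * getInts("javaSort.smallJobs.numOfJobs", 1) returns {8, 2}; a missing
+   * property falls back to {defaultValue}.
+   */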
+  public int[] getInts(String name, int defaultValue) {
+    String[] valuesInString = getStrings(name, String.valueOf(defaultValue));
+ int[] results = new int[valuesInString.length];
+ for (int i = 0; i < valuesInString.length; i++) {
+ results[i] = Integer.parseInt(valuesInString[i]);
+ }
+ return results;
+
+ }
+}