Author: cdouglas
Date: Tue Jan 8 17:00:02 2008
New Revision: 610248

URL: http://svn.apache.org/viewvc?rev=610248&view=rev
Log:
HADOOP-2369. Adds a set of scripts for simulating a mix of user map/reduce
workloads. (Runping Qi via cdouglas)
Added:
    lucene/hadoop/trunk/src/test/gridmix/
    lucene/hadoop/trunk/src/test/gridmix/README
    lucene/hadoop/trunk/src/test/gridmix/generateData.sh
    lucene/hadoop/trunk/src/test/gridmix/gridmix-env
    lucene/hadoop/trunk/src/test/gridmix/javasort/
    lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.large
    lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.medium
    lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.small
    lucene/hadoop/trunk/src/test/gridmix/maxent/
    lucene/hadoop/trunk/src/test/gridmix/maxent/maxent.large
    lucene/hadoop/trunk/src/test/gridmix/monsterQuery/
    lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.large
    lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.medium
    lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.small
    lucene/hadoop/trunk/src/test/gridmix/pipesort/
    lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.large
    lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.medium
    lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.small
    lucene/hadoop/trunk/src/test/gridmix/streamsort/
    lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.large
    lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.medium
    lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.small
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/allThroughHod
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/allToSameCluster
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/maxentHod
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/maxentToSameCluster
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/monsterQueriesHod
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/monsterQueriesToSameCluster
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/sleep_if_too_busy
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/textSortHod
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/textSortToSameCluster
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataScanHod
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataScanToSameCluster
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataSortHod
    lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataSortToSameCluster
    lucene/hadoop/trunk/src/test/gridmix/webdatascan/
    lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.large
    lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.medium
    lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.small
    lucene/hadoop/trunk/src/test/gridmix/webdatasort/
    lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.large
    lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.medium
    lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.small
Modified:
    lucene/hadoop/trunk/CHANGES.txt

Modified: lucene/hadoop/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/CHANGES.txt?rev=610248&r1=610247&r2=610248&view=diff
==============================================================================
--- lucene/hadoop/trunk/CHANGES.txt (original)
+++ lucene/hadoop/trunk/CHANGES.txt Tue Jan 8 17:00:02 2008
@@ -172,6 +172,9 @@
     HADOOP-2233. Adds a generic load generator for modeling MR jobs. (cdouglas)
 
+    HADOOP-2369. Adds a set of scripts for simulating a mix of user map/reduce
+    workloads. (Runping Qi via cdouglas)
+
   OPTIMIZATIONS
 
     HADOOP-1898.
    Release the lock protecting the last time of the last stack

Added: lucene/hadoop/trunk/src/test/gridmix/README
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/README?rev=610248&view=auto
==============================================================================
--- lucene/hadoop/trunk/src/test/gridmix/README (added)
+++ lucene/hadoop/trunk/src/test/gridmix/README Tue Jan 8 17:00:02 2008
@@ -0,0 +1,168 @@
+### "Gridmix" Benchmark ###
+
+Contents:
+
+0 Overview
+1 Getting Started
+  1.0 Build
+  1.1 Configure
+  1.2 Generate test data
+2 Running
+  2.0 General
+  2.1 Non-Hod cluster
+  2.2 Hod
+    2.2.0 Static cluster
+    2.2.1 Hod cluster
+
+
+* 0 Overview
+
+The scripts in this package model a cluster workload. The workload is
+simulated by generating random data and submitting map/reduce jobs that
+mimic observed data-access patterns in user jobs. The full benchmark
+generates approximately 2.5TB of (often compressed) input data operated on
+by the following simulated jobs:
+
+1) Three stage map/reduce job
+   Input:       500GB compressed (2TB uncompressed) SequenceFile
+                (k,v) = (5 words, 100 words)
+   gridmix-env: FIXCOMPSEQ
+   Compute1:    keep 10% map, 40% reduce
+   Compute2:    keep 100% map, 77% reduce
+                Input from Compute1
+   Compute3:    keep 116% map, 91% reduce
+                Input from Compute2
+   Motivation:  Many user workloads are implemented as pipelined map/reduce
+                jobs, including Pig workloads
+
+2) Large sort of variable key/value size
+   Input:       500GB compressed (2TB uncompressed) SequenceFile
+                (k,v) = (5-10 words, 100-10000 words)
+   gridmix-env: VARCOMPSEQ
+   Compute:     keep 100% map, 100% reduce
+   Motivation:  Processing large, compressed datasets is common.
+
+3) Reference select
+   Input:       500GB compressed (2TB uncompressed) SequenceFile
+                (k,v) = (5-10 words, 100-10000 words)
+   gridmix-env: VARCOMPSEQ
+   Compute:     keep 0.2% map, 5% reduce
+                1 Reducer
+   Motivation:  Sampling from a large, reference dataset is common.
+
+4) Indirect Read
+   Input:       500GB compressed (2TB uncompressed) Text
+                (k,v) = (5 words, 20 words)
+   gridmix-env: FIXCOMPTEXT
+   Compute:     keep 50% map, 100% reduce
+                Each map reads 1 input file, adding additional input files
+                from the output of the previous iteration for 10 iterations
+   Motivation:  User jobs in the wild will often take input data without
+                consulting the framework. This simulates an iterative job
+                whose input data is all "indirect," i.e. given to the
+                framework sans locality metadata.
+
+5) API text sort (java, pipes, streaming)
+   Input:       500GB uncompressed Text
+                (k,v) = (1-10 words, 0-200 words)
+   gridmix-env: VARINFLTEXT
+   Compute:     keep 100% map, 100% reduce
+   Motivation:  This benchmark should exercise each of the APIs to
+                map/reduce
+
+Each of these jobs may be run individually or, using the scripts provided,
+as a simulation of user activity sized to run in approximately 4 hours on a
+480-500 node cluster using Hadoop 0.15.0. The benchmark runs a mix of small,
+medium, and large jobs simultaneously, submitting each at fixed intervals.
+
+Notes(1-4): Since the input data are compressed, each mapper outputs far
+more bytes than it reads, typically causing map output spills.
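
[Editorial illustration, not part of the committed files.] The keep percentages
above are passed directly to the generic load generator (loadgen) added in
HADOOP-2233, which most of the job scripts in this commit wrap. A minimal
sketch of that mapping for job 3, the reference select; the webdata_scan.*
scripts added below are the authoritative versions, and the paths and jar
locations come from gridmix-env:

    #!/bin/bash
    # Sketch only: reference select expressed as a single loadgen run.
    # Keeps 0.2% of map input and 5% of reduce input through one reducer.
    source ./gridmix-env

    OUTDIR=perf-out/reference-select-sketch_`date +%F-%H-%M-%S`
    ${HADOOP_HOME}/bin/hadoop jar ${APP_JAR} loadgen \
        -keepmap 0.2 -keepred 5 \
        -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat \
        -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
        -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text \
        -indir ${VARCOMPSEQ} -outdir ${OUTDIR} -r 1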
+
+
+* 1 Getting Started
+
+1.0 Build
+
+1) Compile the examples, including the C++ sources:
+   > ant -Dcompile.c++=yes examples
+2) Copy the pipe sort example to a location in the default filesystem
+   (usually HDFS, default /gridmix/programs)
+   > $HADOOP_HOME/bin/hadoop dfs -mkdir $GRID_MIX_PROG
+   > $HADOOP_HOME/bin/hadoop dfs -put build/c++-examples/$PLATFORM_STR/bin/pipes-sort $GRID_MIX_PROG
+
+1.1 Configure
+
+One must modify gridmix-env to supply the following information:
+
+HADOOP_HOME     The hadoop install location
+GRID_MIX_HOME   The location of these scripts
+APP_JAR         The location of the hadoop test jar
+GRID_MIX_DATA   The location of the datasets for these benchmarks
+GRID_MIX_PROG   The location of the pipe-sort example
+
+Reasonable defaults are provided for all but HADOOP_HOME. The datasets used
+by each of the respective benchmarks are recorded in the Input::gridmix-env
+comment in section 0 and their location may be changed in gridmix-env. Note
+that each job expects particular input data and the parameters given to it
+must be changed in each script if a different InputFormat, keytype, or
+valuetype is desired.
+
+Note that the NUM_OF_REDUCERS_FOR_*_JOB properties should be sized to the
+cluster on which the benchmarks will be run. The default assumes a large
+(450-500 node) cluster.
+
+1.2 Generate test data
+
+Test data is generated using the generateData.sh script. While one may
+modify the structure and size of the data generated here, note that many of
+the scripts, particularly for medium and small sized jobs, rely not only on
+specific InputFormats and key/value types, but also on a particular
+structure to the input data. Changing these values will likely be necessary
+to run on small and medium-sized clusters, but any modifications must be
+informed by an explicit familiarity with the underlying scripts.
+
+It is sufficient to run the script without modification, though it may
+require up to 4TB of free space in the default filesystem. Changing the size
+of the input data (COMPRESSED_DATA_BYTES, UNCOMPRESSED_DATA_BYTES,
+INDIRECT_DATA_BYTES) is safe. A 4x compression ratio for generated, block
+compressed data is typical.
+
+* 2 Running
+
+2.0 General
+
+The submissionScripts directory contains the high-level scripts submitting
+sized jobs for the gridmix benchmark. Each submits $NUM_OF_*_JOBS_PER_CLASS
+instances as specified in the gridmix-env script, where an instance is an
+invocation of a script as in $JOBTYPE/$JOBTYPE.$CLASS (e.g.
+javasort/text-sort.large). Each instance may submit one or more map/reduce
+jobs.
+
+There is a backoff script, submissionScripts/sleep_if_too_busy, that can be
+modified to define throttling criteria. By default, it simply counts running
+java processes.
+
+2.1 Non-Hod cluster
+
+The submissionScripts/allToSameCluster script will invoke each of the other
+submission scripts for the gridmix benchmark. Depending on how your cluster
+manages job submission, these scripts may require modification. The details
+are very context-dependent.
+
+2.2 Hod
+
+Note that there are options in gridmix-env that control jobs submitted
+through Hod. One may specify the location of a config (HOD_CONFIG), the
+number of nodes to allocate for classes of jobs, and any additional options
+one wants to apply. The default includes an example for supplying a Hadoop
+tarball for testing platform changes (see Hod documentation).
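
[Editorial illustration, not part of the committed files.] For reference, these
knobs are a hedged example echoing the commented-out settings shipped in
gridmix-env; the config path, tarball path, and node counts below are
placeholders to adapt, not defaults:

    # Hod-related settings in gridmix-env; values here are illustrative only.
    export HOD_CONFIG=/path/to/hodrc
    export HOD_OPTIONS="--ringmaster.hadoop-tar-ball=/path/to/hadoop-0.15.0-dev.tar.gz"
    export ALL_HOD_OPTIONS="$HOD_OPTIONS -c ${HOD_CONFIG}"
    export SMALL_JOB_HOD_OPTIONS="$ALL_HOD_OPTIONS -m 5"    # nodes per small job
    export MEDIUM_JOB_HOD_OPTIONS="$ALL_HOD_OPTIONS -m 50"  # nodes per medium job
    export LARGE_JOB_HOD_OPTIONS="$ALL_HOD_OPTIONS -m 100"  # nodes per large job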
+
+2.2.0 Static Cluster
+
+> hod --hod.script=submissionScripts/allToSameCluster -m 500
+
+2.2.1 Hod-allocated cluster
+
+> ./submissionScripts/allThroughHod

Added: lucene/hadoop/trunk/src/test/gridmix/generateData.sh
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/generateData.sh?rev=610248&view=auto
==============================================================================
--- lucene/hadoop/trunk/src/test/gridmix/generateData.sh (added)
+++ lucene/hadoop/trunk/src/test/gridmix/generateData.sh Tue Jan 8 17:00:02 2008
@@ -0,0 +1,69 @@
+#!/bin/bash
+
+GRID_DIR=`dirname "$0"`
+GRID_DIR=`cd "$GRID_DIR"; pwd`
+source $GRID_DIR/gridmix-env
+
+# 2TB data compressing to approx 500GB
+#COMPRESSED_DATA_BYTES=2147483648000
+COMPRESSED_DATA_BYTES=2147483648
+# 500GB
+#UNCOMPRESSED_DATA_BYTES=536870912000
+UNCOMPRESSED_DATA_BYTES=536870912
+# Number of partitions for output data
+NUM_MAPS=100
+# Default approx 70MB per data file, compressed
+#INDIRECT_DATA_BYTES=58720256000
+INDIRECT_DATA_BYTES=58720256
+INDIRECT_DATA_FILES=200
+
+${HADOOP_HOME}/bin/hadoop jar \
+    ${EXAMPLE_JAR} randomtextwriter \
+    -D test.randomtextwrite.total_bytes=${COMPRESSED_DATA_BYTES} \
+    -D test.randomtextwrite.bytes_per_map=$((${COMPRESSED_DATA_BYTES} / ${NUM_MAPS})) \
+    -D test.randomtextwrite.min_words_key=5 \
+    -D test.randomtextwrite.max_words_key=10 \
+    -D test.randomtextwrite.min_words_value=100 \
+    -D test.randomtextwrite.max_words_value=10000 \
+    -D mapred.output.compress=true \
+    -D mapred.map.output.compression.type=BLOCK \
+    -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
+    ${VARCOMPSEQ}
+
+${HADOOP_HOME}/bin/hadoop jar \
+    ${EXAMPLE_JAR} randomtextwriter \
+    -D test.randomtextwrite.total_bytes=${COMPRESSED_DATA_BYTES} \
+    -D test.randomtextwrite.bytes_per_map=$((${COMPRESSED_DATA_BYTES} / ${NUM_MAPS})) \
+    -D test.randomtextwrite.min_words_key=5 \
+    -D test.randomtextwrite.max_words_key=5 \
+    -D test.randomtextwrite.min_words_value=100 \
+    -D test.randomtextwrite.max_words_value=100 \
+    -D mapred.output.compress=true \
+    -D mapred.map.output.compression.type=BLOCK \
+    -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
+    ${FIXCOMPSEQ}
+
+${HADOOP_HOME}/bin/hadoop jar \
+    ${EXAMPLE_JAR} randomtextwriter \
+    -D test.randomtextwrite.total_bytes=${UNCOMPRESSED_DATA_BYTES} \
+    -D test.randomtextwrite.bytes_per_map=$((${UNCOMPRESSED_DATA_BYTES} / ${NUM_MAPS})) \
+    -D test.randomtextwrite.min_words_key=1 \
+    -D test.randomtextwrite.max_words_key=10 \
+    -D test.randomtextwrite.min_words_value=0 \
+    -D test.randomtextwrite.max_words_value=200 \
+    -D mapred.output.compress=false \
+    -outFormat org.apache.hadoop.mapred.TextOutputFormat \
+    ${VARINFLTEXT}
+
+${HADOOP_HOME}/bin/hadoop jar \
+    ${EXAMPLE_JAR} randomtextwriter \
+    -D test.randomtextwrite.total_bytes=${INDIRECT_DATA_BYTES} \
+    -D test.randomtextwrite.bytes_per_map=$((${INDIRECT_DATA_BYTES} / ${INDIRECT_DATA_FILES})) \
+    -D test.randomtextwrite.min_words_key=5 \
+    -D test.randomtextwrite.max_words_key=5 \
+    -D test.randomtextwrite.min_words_value=20 \
+    -D test.randomtextwrite.max_words_value=20 \
+    -D mapred.output.compress=true \
+    -D mapred.map.output.compression.type=BLOCK \
+    -outFormat org.apache.hadoop.mapred.TextOutputFormat \
+    ${FIXCOMPTEXT}

Added: lucene/hadoop/trunk/src/test/gridmix/gridmix-env
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/gridmix-env?rev=610248&view=auto
==============================================================================
--- lucene/hadoop/trunk/src/test/gridmix/gridmix-env (added)
+++ lucene/hadoop/trunk/src/test/gridmix/gridmix-env Tue Jan 8 17:00:02 2008
@@ -0,0 +1,50 @@
+#!/bin/bash
+
+
+## Environment configuration
+# Hadoop installation
+export HADOOP_HOME=
+# Base directory for gridmix install
+export GRID_MIX_HOME=${GRID_DIR}
+# Hadoop example jar
+export EXAMPLE_JAR=${HADOOP_HOME}/hadoop-0.15.2-dev-examples.jar
+# Hadoop test jar
+export APP_JAR=${HADOOP_HOME}/hadoop-0.15.2-dev-test.jar
+# Hadoop streaming jar
+export STREAM_JAR=${HADOOP_HOME}/contrib/hadoop-0.15.2-streaming.jar
+# Location on default filesystem for writing gridmix data (usually HDFS)
+# Default: /gridmix/data
+export GRID_MIX_DATA=/gridmix/data
+# Location of executables in default filesystem (usually HDFS)
+# Default: /gridmix/programs
+export GRID_MIX_PROG=/gridmix/programs
+
+## Data sources
+# Variable length key, value compressed SequenceFile
+export VARCOMPSEQ=${GRID_MIX_DATA}/WebSimulationBlockCompressed
+# Fixed length key, value compressed SequenceFile
+export FIXCOMPSEQ=${GRID_MIX_DATA}/MonsterQueryBlockCompressed
+# Variable length key, value uncompressed Text File
+export VARINFLTEXT=${GRID_MIX_DATA}/SortUncompressed
+# Fixed length key, value compressed Text File
+export FIXCOMPTEXT=${GRID_MIX_DATA}/EntropySimulationCompressed
+
+## Job sizing
+export NUM_OF_LARGE_JOBS_PER_CLASS=3
+export NUM_OF_MEDIUM_JOBS_PER_CLASS=20
+export NUM_OF_SMALL_JOBS_PER_CLASS=40
+
+export NUM_OF_REDUCERS_FOR_LARGE_JOB=370
+export NUM_OF_REDUCERS_FOR_MEDIUM_JOB=170
+export NUM_OF_REDUCERS_FOR_SMALL_JOB=15
+
+## Throttling
+export INTERVAL_BETWEEN_SUBMITION=20
+
+## Hod
+#export HOD_OPTIONS="--ringmaster.hadoop-tar-ball=/path/to/hadoop-0.15.0-dev.tar.gz"
+#export HOD_CONFIG=
+#export ALL_HOD_OPTIONS="$HOD_OPTIONS -c ${HOD_CONFIG}"
+#export SMALL_JOB_HOD_OPTIONS="$ALL_HOD_OPTIONS -m 5"
+#export MEDIUM_JOB_HOD_OPTIONS="$ALL_HOD_OPTIONS -m 50"
+#export LARGE_JOB_HOD_OPTIONS="$ALL_HOD_OPTIONS -m 100"

Added: lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.large
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.large?rev=610248&view=auto
==============================================================================
--- lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.large (added)
+++ lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.large Tue Jan 8 17:00:02 2008
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+GRID_DIR=`dirname "$0"`
+GRID_DIR=`cd "$GRID_DIR"; pwd`
+source $GRID_DIR/../gridmix-env
+
+INDIR=${VARINFLTEXT}
+
+Date=`date +%F-%H-%M-%S`
+OUTDIR=perf-out/sort-out-dir-large_$Date
+${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR
+
+${HADOOP_HOME}/bin/hadoop jar ${APP_JAR} sort -m 1 -r $NUM_OF_REDUCERS_FOR_LARGE_JOB -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat -outFormat org.apache.hadoop.mapred.TextOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text $INDIR $OUTDIR
+

Added: lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.medium
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.medium?rev=610248&view=auto
==============================================================================
--- lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.medium (added)
+++ lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.medium Tue Jan 8 17:00:02 2008
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+GRID_DIR=`dirname "$0"`
+GRID_DIR=`cd "$GRID_DIR"; pwd`
+source $GRID_DIR/../gridmix-env
+
+INDIR=${VARINFLTEXT}/part-000*0,${VARINFLTEXT}/part-000*1,${VARINFLTEXT}/part-000*2 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/sort-out-dir-medium_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar ${APP_JAR} sort -m 1 -r $NUM_OF_REDUCERS_FOR_MEDIUM_JOB -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat -outFormat org.apache.hadoop.mapred.TextOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text $INDIR $OUTDIR + Added: lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.small URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.small?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.small (added) +++ lucene/hadoop/trunk/src/test/gridmix/javasort/text-sort.small Tue Jan 8 17:00:02 2008 @@ -0,0 +1,14 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +INDIR=${VARINFLTEXT}/part-00000,${VARINFLTEXT}/part-00001,${VARINFLTEXT}/part-00002 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/sort-out-dir-small_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar ${APP_JAR} sort -m 1 -r $NUM_OF_REDUCERS_FOR_SMALL_JOB -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat -outFormat org.apache.hadoop.mapred.TextOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text $INDIR $OUTDIR + Added: lucene/hadoop/trunk/src/test/gridmix/maxent/maxent.large URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/maxent/maxent.large?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/maxent/maxent.large (added) +++ lucene/hadoop/trunk/src/test/gridmix/maxent/maxent.large Tue Jan 8 17:00:02 2008 @@ -0,0 +1,26 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=100 +INDIR=${FIXCOMPTEXT} +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/maxent-out-dir-large_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 50 -keepred 100 -inFormatIndirect org.apache.hadoop.mapred.TextInputFormat -outFormat org.apache.hadoop.mapred.TextOutputFormat -outKey org.apache.hadoop.io.LongWritable -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR.1 -r $NUM_OF_REDUCERS + +ITER=11 +for ((i=1; i<$ITER; ++i)) +do + ${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 50 -keepred 100 -inFormatIndirect org.apache.hadoop.mapred.TextInputFormat -outFormat org.apache.hadoop.mapred.TextOutputFormat -outKey org.apache.hadoop.io.LongWritable -outValue org.apache.hadoop.io.Text -indir $INDIR -indir $OUTDIR.$i -outdir $OUTDIR.$(($i+1)) -r $NUM_OF_REDUCERS + if [ $? -ne "0" ] + then exit $? 
+ fi + ${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR.$i +done + +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR.$ITER Added: lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.large URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.large?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.large (added) +++ lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.large Tue Jan 8 17:00:02 2008 @@ -0,0 +1,27 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_LARGE_JOB +INDIR=${FIXCOMPSEQ} +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/mq-out-dir-large_$Date.1 +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 10 -keepred 40 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + +INDIR=$OUTDIR +OUTDIR=perf-out/mq-out-dir-large_$Date.2 +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 100 -keepred 77 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + +INDIR=$OUTDIR +OUTDIR=perf-out/mq-out-dir-large_$Date.3 +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 116 -keepred 91 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + Added: lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.medium URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.medium?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.medium (added) +++ lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.medium Tue Jan 8 17:00:02 2008 @@ -0,0 +1,27 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_MEDIUM_JOB +INDIR=${FIXCOMPSEQ}/part-000*0,${FIXCOMPSEQ}/part-000*1,${FIXCOMPSEQ}/part-000*2 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/mq-out-dir-medium_$Date.1 +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 10 -keepred 40 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + +INDIR=$OUTDIR +OUTDIR=perf-out/mq-out-dir-medium_$Date.2 +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 100 -keepred 77 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text 
-indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + +INDIR=$OUTDIR +OUTDIR=perf-out/mq-out-dir-medium_$Date.3 +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 116 -keepred 91 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + Added: lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.small URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.small?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.small (added) +++ lucene/hadoop/trunk/src/test/gridmix/monsterQuery/monster_query.small Tue Jan 8 17:00:02 2008 @@ -0,0 +1,27 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_SMALL_JOB +INDIR=${FIXCOMPSEQ}/part-00000,${FIXCOMPSEQ}/part-00001,${FIXCOMPSEQ}/part-00002 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/mq-out-dir-small_$Date.1 +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 10 -keepred 40 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + +INDIR=$OUTDIR +OUTDIR=perf-out/mq-out-dir-small_$Date.2 +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 100 -keepred 77 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + +INDIR=$OUTDIR +OUTDIR=perf-out/mq-out-dir-small_$Date.3 +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 116 -keepred 91 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + Added: lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.large URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.large?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.large (added) +++ lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.large Tue Jan 8 17:00:02 2008 @@ -0,0 +1,16 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_LARGE_JOB +INDIR=${VARINFLTEXT} +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/pipe-out-dir-large_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + + +${HADOOP_HOME}/bin/hadoop pipes -input $INDIR -output $OUTDIR -inputformat org.apache.hadoop.mapred.KeyValueTextInputFormat -program ${GRID_MIX_PROG}/pipes-sort -reduces $NUM_OF_REDUCERS -jobconf mapred.output.key.class=org.apache.hadoop.io.Text,mapred.output.value.class=org.apache.hadoop.io.Text -writer org.apache.hadoop.mapred.TextOutputFormat + Added: 
lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.medium URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.medium?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.medium (added) +++ lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.medium Tue Jan 8 17:00:02 2008 @@ -0,0 +1,16 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_MEDIUM_JOB +INDIR=${VARINFLTEXT}/part-000*0,${VARINFLTEXT}/part-000*1,${VARINFLTEXT}/part-000*2 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/pipe-out-dir-medium_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + + +${HADOOP_HOME}/bin/hadoop pipes -input $INDIR -output $OUTDIR -inputformat org.apache.hadoop.mapred.KeyValueTextInputFormat -program ${GRID_MIX_PROG}/pipes-sort -reduces $NUM_OF_REDUCERS -jobconf mapred.output.key.class=org.apache.hadoop.io.Text,mapred.output.value.class=org.apache.hadoop.io.Text -writer org.apache.hadoop.mapred.TextOutputFormat + Added: lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.small URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.small?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.small (added) +++ lucene/hadoop/trunk/src/test/gridmix/pipesort/text-sort.small Tue Jan 8 17:00:02 2008 @@ -0,0 +1,16 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_SMALL_JOB +INDIR=${VARINFLTEXT}/part-00000,${VARINFLTEXT}/part-00001,${VARINFLTEXT}/part-00002 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/pipe-out-dir-small_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + + +${HADOOP_HOME}/bin/hadoop pipes -input $INDIR -output $OUTDIR -inputformat org.apache.hadoop.mapred.KeyValueTextInputFormat -program ${GRID_MIX_PROG}/pipes-sort -reduces $NUM_OF_REDUCERS -jobconf mapred.output.key.class=org.apache.hadoop.io.Text,mapred.output.value.class=org.apache.hadoop.io.Text -writer org.apache.hadoop.mapred.TextOutputFormat + Added: lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.large URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.large?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.large (added) +++ lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.large Tue Jan 8 17:00:02 2008 @@ -0,0 +1,16 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +export NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_LARGE_JOB +export INDIR=${VARINFLTEXT} +Date=`date +%F-%H-%M-%S` + +export OUTDIR=perf-out/stream-out-dir-large_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + + +${HADOOP_HOME}/bin/hadoop jar ${STREAM_JAR} -input $INDIR -output $OUTDIR -mapper cat -reducer cat -numReduceTasks $NUM_OF_REDUCERS + Added: lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.medium URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.medium?rev=610248&view=auto ============================================================================== --- 
lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.medium (added) +++ lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.medium Tue Jan 8 17:00:02 2008 @@ -0,0 +1,16 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_MEDIUM_JOB +INDIR=${VARINFLTEXT}/part-000*0,${VARINFLTEXT}/part-000*1,${VARINFLTEXT}/part-000*2 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/stream-out-dir-medium_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + + +${HADOOP_HOME}/bin/hadoop jar ${STREAM_JAR} -input $INDIR -output $OUTDIR -mapper cat -reducer cat -numReduceTasks $NUM_OF_REDUCERS + Added: lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.small URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.small?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.small (added) +++ lucene/hadoop/trunk/src/test/gridmix/streamsort/text-sort.small Tue Jan 8 17:00:02 2008 @@ -0,0 +1,16 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_SMALL_JOB +INDIR=${VARINFLTEXT}/part-00000,${VARINFLTEXT}/part-00001,${VARINFLTEXT}/part-00002 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/stream-out-dir-small_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + + +${HADOOP_HOME}/bin/hadoop jar ${STREAM_JAR} -input $INDIR -output $OUTDIR -mapper cat -reducer cat -numReduceTasks $NUM_OF_REDUCERS + Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/allThroughHod URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/allThroughHod?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/allThroughHod (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/allThroughHod Tue Jan 8 17:00:02 2008 @@ -0,0 +1,13 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +$GRID_MIX_HOME/submissionScripts/textSortHod 2>&1 > textSortHod.out & +$GRID_MIX_HOME/submissionScripts/monsterQueriesHod 2>&1 > monsterQueriesHod.out & +$GRID_MIX_HOME/submissionScripts/webdataScanHod 2>&1 > webdataScanHod.out & +$GRID_MIX_HOME/submissionScripts/webdataSortHod 2>&1 > webdataSortHod.out & +$GRID_MIX_HOME/submissionScripts/maxentHod 2>&1 > maxentHod.out & + + Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/allToSameCluster URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/allToSameCluster?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/allToSameCluster (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/allToSameCluster Tue Jan 8 17:00:02 2008 @@ -0,0 +1,16 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +$GRID_MIX_HOME/submissionScripts/textSortToSameCluster 2>&1 > textSortToSameCluster.out & +sleep 20 +$GRID_MIX_HOME/submissionScripts/monsterQueriesToSameCluster 2>&1 > monsterQueriesToSameCluster.out & +sleep 20 +$GRID_MIX_HOME/submissionScripts/webdataScanToSameCluster 2>&1 > webdataScanToSameCluster.out & +sleep 20 
+$GRID_MIX_HOME/submissionScripts/webdataSortToSameCluster 2>&1 > webdataSortToSameCluster.out & +sleep 20 +$GRID_MIX_HOME/submissionScripts/maxentToSameCluster 2>&1 > maxentToSameCluster.out & + Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/maxentHod URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/maxentHod?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/maxentHod (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/maxentHod Tue Jan 8 17:00:02 2008 @@ -0,0 +1,12 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +for ((i=0; i < $NUM_OF_LARGE_JOBS_PER_CLASS; i++)) +do + echo $i + hod $LARGE_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/maxent/maxent.large 2>&1 > maxent.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/maxentToSameCluster URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/maxentToSameCluster?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/maxentToSameCluster (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/maxentToSameCluster Tue Jan 8 17:00:02 2008 @@ -0,0 +1,12 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +for ((i=0; i < $NUM_OF_LARGE_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/maxent/maxent.large 2>&1 > maxent.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/monsterQueriesHod URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/monsterQueriesHod?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/monsterQueriesHod (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/monsterQueriesHod Tue Jan 8 17:00:02 2008 @@ -0,0 +1,26 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +for ((i=0; i < $NUM_OF_SMALL_JOBS_PER_CLASS; i++)) +do + echo $i + hod $SMALL_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/monsterQuery/monster_query.small 2>&1 > monster_query.small.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_MEDIUM_JOBS_PER_CLASS; i++)) +do + echo $i + hod $MEDIUM_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/monsterQuery/monster_query.medium 2>&1 > monster_query.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_LARGE_JOBS_PER_CLASS; i++)) +do + echo $i + hod $LARGE_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/monsterQuery/monster_query.large 2>&1 > monster_query.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/monsterQueriesToSameCluster URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/monsterQueriesToSameCluster?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/monsterQueriesToSameCluster (added) +++ 
lucene/hadoop/trunk/src/test/gridmix/submissionScripts/monsterQueriesToSameCluster Tue Jan 8 17:00:02 2008 @@ -0,0 +1,27 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +for ((i=0; i < $NUM_OF_SMALL_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/monsterQuery/monster_query.small 2>&1 > monster_query.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_MEDIUM_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/monsterQuery/monster_query.medium 2>&1 > monster_query.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_LARGE_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/monsterQuery/monster_query.large 2>&1 > monster_query.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/sleep_if_too_busy URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/sleep_if_too_busy?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/sleep_if_too_busy (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/sleep_if_too_busy Tue Jan 8 17:00:02 2008 @@ -0,0 +1,11 @@ +#!/bin/bash + +sleep 1 +for ((java_process=$((`ps -ef|grep java|wc|awk '{print $1}'`-1)); \ + java_process > 60; \ + java_process=$((`ps -ef|grep java|wc|awk '{print $1}'`-1)))) +do + sleep 10 + echo $java_process +done + Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/textSortHod URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/textSortHod?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/textSortHod (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/textSortHod Tue Jan 8 17:00:02 2008 @@ -0,0 +1,39 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +for ((i=0; i < $NUM_OF_SMALL_JOBS_PER_CLASS; i++)) +do + echo $i + hod $SMALL_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/pipesort/text-sort.small 2>&1 > pipesort.small.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + hod $SMALL_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/streamsort/text-sort.small 2>&1 > streamsort.small.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + hod $SMALL_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/javasort/text-sort.small 2>&1 > javasort.small.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_MEDIUM_JOBS_PER_CLASS; i++)) +do + echo $i + hod $MEDIUM_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/pipesort/text-sort.medium 2>&1 > pipesort.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + hod $MEDIUM_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/streamsort/text-sort.medium 2>&1 > streamsort.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + hod $MEDIUM_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/javasort/text-sort.medium 2>&1 > javasort.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_LARGE_JOBS_PER_CLASS; i++)) +do + echo $i + hod $LARGE_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/pipesort/text-sort.large 2>&1 > pipesort.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy 
+ hod $LARGE_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/streamsort/text-sort.large 2>&1 > streamsort.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + hod $LARGE_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/javasort/text-sort.large 2>&1 > javasort.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/textSortToSameCluster URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/textSortToSameCluster?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/textSortToSameCluster (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/textSortToSameCluster Tue Jan 8 17:00:02 2008 @@ -0,0 +1,39 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +for ((i=0; i < $NUM_OF_SMALL_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/pipesort/text-sort.small 2>&1 > pipesort.small.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + $GRID_MIX_HOME/streamsort/text-sort.small 2>&1 > streamsort.small.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + $GRID_MIX_HOME/javasort/text-sort.small 2>&1 > javasort.small.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_MEDIUM_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/pipesort/text-sort.medium 2>&1 > pipesort.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + $GRID_MIX_HOME/streamsort/text-sort.medium 2>&1 > streamsort.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + $GRID_MIX_HOME/javasort/text-sort.medium 2>&1 > javasort.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_LARGE_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/pipesort/text-sort.large 2>&1 > pipesort.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + $GRID_MIX_HOME/streamsort/text-sort.large 2>&1 > pipesort.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy + $GRID_MIX_HOME/javasort/text-sort.large 2>&1 > pipesort.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataScanHod URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataScanHod?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataScanHod (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataScanHod Tue Jan 8 17:00:02 2008 @@ -0,0 +1,28 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +for ((i=0; i < $NUM_OF_SMALL_JOBS_PER_CLASS; i++)) +do + echo $i + hod $SMALL_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/webdatascan/webdata_scan.small 2>&1 > webdata_scan.small.$i.out& + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + + +for ((i=0; i < $NUM_OF_MEDIUM_JOBS_PER_CLASS; i++)) +do + echo $i + hod $MEDIUM_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/webdatascan/webdata_scan.medium 2>&1 > webdata_scan.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_LARGE_JOBS_PER_CLASS; i++)) +do + echo $i + hod $LARGE_JOB_HOD_OPTIONS 
--hod.script=$GRID_MIX_HOME/webdatascan/webdata_scan.large 2>&1 > webdata_scan.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataScanToSameCluster URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataScanToSameCluster?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataScanToSameCluster (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataScanToSameCluster Tue Jan 8 17:00:02 2008 @@ -0,0 +1,28 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +for ((i=0; i < $NUM_OF_MEDIUM_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/webdatascan/webdata_scan.medium 2>&1 > webdata_scan.medium.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_SMALL_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/webdatascan/webdata_scan.small 2>&1 > webdata_scan.small.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + +for ((i=0; i < $NUM_OF_LARGE_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/webdatascan/webdata_scan.large 2>&1 > webdata_scan.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + + Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataSortHod URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataSortHod?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataSortHod (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataSortHod Tue Jan 8 17:00:02 2008 @@ -0,0 +1,14 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + + +for ((i=0; i < $NUM_OF_LARGE_JOBS_PER_CLASS; i++)) +do + echo $i + hod $LARGE_JOB_HOD_OPTIONS --hod.script=$GRID_MIX_HOME/webdatasort/webdata_sort.large 2>&1 > webdata_sort.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + Added: lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataSortToSameCluster URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataSortToSameCluster?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataSortToSameCluster (added) +++ lucene/hadoop/trunk/src/test/gridmix/submissionScripts/webdataSortToSameCluster Tue Jan 8 17:00:02 2008 @@ -0,0 +1,13 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +for ((i=0; i < $NUM_OF_LARGE_JOBS_PER_CLASS; i++)) +do + echo $i + $GRID_MIX_HOME/webdatasort/webdata_sort.large 2>&1 > webdata_sort.large.$i.out & + $GRID_MIX_HOME/submissionScripts/sleep_if_too_busy +done + Added: lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.large URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.large?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.large (added) +++ lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.large Tue Jan 8 17:00:02 2008 @@ -0,0 +1,14 
@@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=1 +INDIR=${VARCOMPSEQ} +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/webdata-scan-out-dir-large_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 0.2 -keepred 5 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS Added: lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.medium URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.medium?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.medium (added) +++ lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.medium Tue Jan 8 17:00:02 2008 @@ -0,0 +1,14 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=1 +INDIR=${VARCOMPSEQ}/part-000*0,${VARCOMPSEQ}/part-000*1,${VARCOMPSEQ}/part-000*2 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/webdata-scan-out-dir-medium_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar ${APP_JAR} loadgen -keepmap 1 -keepred 5 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS Added: lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.small URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.small?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.small (added) +++ lucene/hadoop/trunk/src/test/gridmix/webdatascan/webdata_scan.small Tue Jan 8 17:00:02 2008 @@ -0,0 +1,14 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=1 +INDIR=${VARCOMPSEQ}/part-00000,${VARCOMPSEQ}/part-00001,${VARCOMPSEQ}/part-00002 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/webdata-scan-out-dir-small_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 1 -keepred 5 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS Added: lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.large URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.large?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.large (added) +++ lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.large Tue Jan 8 17:00:02 2008 @@ -0,0 +1,16 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_LARGE_JOB +INDIR=${VARCOMPSEQ}/part-000*0,${VARCOMPSEQ}/part-000*1 +Date=`date +%F-%H-%M-%S` + 
+OUTDIR=perf-out/webdata-sort-out-dir-large_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 100 -keepred 100 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + + Added: lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.medium URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.medium?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.medium (added) +++ lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.medium Tue Jan 8 17:00:02 2008 @@ -0,0 +1,16 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_MEDIUM_JOB +INDIR=${VARCOMPSEQ}/part-0000,${VARCOMPSEQ}/part-0001 +Date=`date +%F-%H-%M-%S` + +OUTDIR=perf-out/webdata-sort-out-dir-medium_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 100 -keepred 100 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + + Added: lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.small URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.small?rev=610248&view=auto ============================================================================== --- lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.small (added) +++ lucene/hadoop/trunk/src/test/gridmix/webdatasort/webdata_sort.small Tue Jan 8 17:00:02 2008 @@ -0,0 +1,16 @@ +#!/bin/bash + +GRID_DIR=`dirname "$0"` +GRID_DIR=`cd "$GRID_DIR"; pwd` +source $GRID_DIR/../gridmix-env + +NUM_OF_REDUCERS=$NUM_OF_REDUCERS_FOR_SMALL_JOB +INDIR=${VARCOMPSEQ}/part-00000 +Date=`date +%F-%H-%M-%S` + +export OUTDIR=perf-out/webdata-sort-out-dir-small_$Date +${HADOOP_HOME}/bin/hadoop dfs -rmr $OUTDIR + +${HADOOP_HOME}/bin/hadoop jar $APP_JAR loadgen -keepmap 100 -keepred 100 -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat -outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text -indir $INDIR -outdir $OUTDIR -r $NUM_OF_REDUCERS + +
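
[Editorial illustration, not part of the committed files.] Taken together, the
files above suggest the following end-to-end run on a non-Hod cluster. This is
a sketch that stitches the README steps together under stated assumptions:
HADOOP_SRC_ROOT and PLATFORM_STR are placeholders for the local checkout and
build platform, and gridmix-env must already point at a working HADOOP_HOME.

    #!/bin/bash
    # Sketch: build, stage the pipes binary, generate data, submit the mix.
    cd ${HADOOP_SRC_ROOT}                      # placeholder: the hadoop checkout
    ant -Dcompile.c++=yes examples             # builds the examples and pipes-sort
    cd src/test/gridmix
    source ./gridmix-env
    ${HADOOP_HOME}/bin/hadoop dfs -mkdir ${GRID_MIX_PROG}
    ${HADOOP_HOME}/bin/hadoop dfs -put \
        ${HADOOP_SRC_ROOT}/build/c++-examples/${PLATFORM_STR}/bin/pipes-sort ${GRID_MIX_PROG}
    ./generateData.sh                          # may need up to ~4TB free in the default FS
    ./submissionScripts/allToSameCluster       # submit the full small/medium/large job mix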