Hello, I just joined the list and got a newbie question. Operating on a 10-node Linux cluster running Hadoop 0.20.1, I've been trying out the WordCount program.
I have three files: WordCount.java, WordCountMapper.java, and WordCountReducer.java. The contents of all three files are listed in full at the bottom. Compilation, jarring, and invocation appear to work fine, when done as follows:

  javac WordCountMapper.java
  javac WordCountReducer.java
  javac WordCount.java
  jar cf jarredWordCount.jar WordCountMapper.class WordCountReducer.class WordCount.class

Invocation:

  hadoop jar jarredWordCount.jar WordCount "/user/rtaylor/WordCountInputDirectory" "/user/rtaylor/OutputDirectory"

%%%

However, the results are not what I expect. Here is a partial listing from one of the output files:

  artillery 1
  barged 1
  call 1
  coalition 1
  coalition 1
  demonstrated 1
  get 1
  has 1
  has 1

I was expecting, for example, to get one line for "coalition", like so:

  coalition 2

Instead I get the two (non-summed) lines that you see above. I've tried several changes, with no effect. I still get the same (wrong) output with no word summation. This is driving me nuts, especially since I presume I am making a simple mistake that somebody should be able to spot easily. So - please help!

 - Ron Taylor

___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA 99352 USA
Office: 509-372-6568
Email: [email protected]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

contents of WordCount.java:

import java.io.*;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;

public class WordCount {

    public static void main(String[] args)
            throws java.io.IOException, java.lang.InterruptedException,
                   java.lang.ClassNotFoundException {

        org.apache.hadoop.conf.Configuration conf =
            new org.apache.hadoop.conf.Configuration();
        String[] otherArgs =
            new org.apache.hadoop.util.GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Error in parameter inputs - Usage: WordCount <in> <out>");
            System.exit(2);
        }
        String inputDirectory = otherArgs[0];
        String outputDirectory = otherArgs[1];

        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputDirectory));
        FileOutputFormat.setOutputPath(job, new Path(outputDirectory));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

contents of WordCountMapper.java:

import java.io.*;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;

public class WordCountMapper
        extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException, java.lang.InterruptedException {

        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

contents of WordCountReducer.java:

import java.io.*;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;

public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException, java.lang.InterruptedException {

        int sum = 0;
        for (IntWritable val : values) {
            int value = val.get();
            sum += value;
        }
        result.set(sum);
        context.write(key, result);
    }
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
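P.S. In case it helps anyone compare signatures: one Java subtlety I have been double-checking is that a subclass method whose parameter types differ at all from the superclass method's does not override it - it merely overloads the name, and the superclass's default body still runs when the method is invoked through a base-class reference. Here is a minimal, self-contained sketch of that rule (plain Java, no Hadoop; the OverrideDemo/Base/Sub names are made up for illustration):

```java
public class OverrideDemo {

    static class Base<K> {
        // Default behavior, analogous to a framework base class: this body
        // runs unless a subclass supplies a true override.
        public String handle(K key, Number ctx) {
            return "base";
        }
    }

    static class Sub extends Base<String> {
        // Second parameter is Integer, not Number, so this method OVERLOADS
        // handle rather than overriding it. Adding @Override here would make
        // the compiler reject it, exposing the mismatch.
        public String handle(String key, Integer ctx) {
            return "sub";
        }
    }

    public static void main(String[] args) {
        Base<String> b = new Sub();
        // Dispatch through the base-class reference runs Base.handle,
        // because Sub never actually overrode it.
        System.out.println(b.handle("word", 1)); // prints "base"
    }
}
```

Annotating intended overrides with @Override turns this silent behavior into a compile-time error, which is why I am now in the habit of adding it.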
