Hello, I just joined the list and got a newbie question. Operating on a 10-node Linux cluster running Hadoop 0.20.1, I've been trying out the WordCount program.
I have three files: WordCount.java, WordCountMapper.java, and WordCountReducer.java. The contents of all three files are listed in full at the bottom. Compilation, jarring, and invocation appear to work fine, when done as follows:

  javac WordCountMapper.java
  javac WordCountReducer.java
  javac WordCount.java
  jar cf jarredWordCount.jar WordCountMapper.class WordCountReducer.class WordCount.class

Invocation:

  hadoop jar jarredWordCount.jar WordCount "/user/rtaylor/WordCountInputDirectory" "/user/rtaylor/OutputDirectory"

%%%

However, the results are not what I expect. Here is a partial listing from one of the output files:

  artillery 1
  barged 1
  call 1
  coalition 1
  coalition 1
  demonstrated 1
  get 1
  has 1
  has 1

I was expecting, for example, to get one line for "coalition", like so:

  coalition 2

Instead I get the two (non-summed) lines that you see above. I've tried several changes, with no effect. I still get the same (wrong) output with no word summation. This is driving me nuts, especially since I presume I am making a simple mistake that somebody should be able to spot easily. So - please help!

 - Ron Taylor

___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA 99352 USA
Office: 509-372-6568
Email: [email protected]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

contents of WordCount.java:

import java.io.*;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;

public class WordCount {

    public static void main(String[] args)
            throws java.io.IOException, java.lang.InterruptedException,
                   java.lang.ClassNotFoundException {

        org.apache.hadoop.conf.Configuration conf =
            new org.apache.hadoop.conf.Configuration();
        String[] otherArgs =
            new org.apache.hadoop.util.GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Error in parameter inputs - Usage: WordCount <in> <out>");
            System.exit(2);
        }
        String inputDirectory = otherArgs[0];
        String outputDirectory = otherArgs[1];

        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputDirectory));
        FileOutputFormat.setOutputPath(job, new Path(outputDirectory));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

contents of WordCountMapper.java:

import java.io.*;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;

public class WordCountMapper
        extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException, java.lang.InterruptedException {

        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

contents of WordCountReducer.java:

import java.io.*;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;

public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException, java.lang.InterruptedException {

        int sum = 0;
        for (IntWritable val : values) {
            int value = val.get();
            sum += value;
        }
        result.set(sum);
        context.write(key, result);
    }
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
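P.S. In case it helps anyone compare signatures: one Java subtlety I have been double-checking is that a subclass method whose parameter types differ at all from the superclass method's does not override it - it merely overloads the name, and the superclass's default body still runs when the method is invoked through a base-class reference. Here is a minimal, self-contained sketch of that rule (plain Java, no Hadoop; the OverrideDemo/Base/Sub names are made up for illustration):

```java
public class OverrideDemo {

    static class Base<K> {
        // Default behavior, analogous to a framework base class: this body
        // runs unless a subclass supplies a true override.
        public String handle(K key, Number ctx) {
            return "base";
        }
    }

    static class Sub extends Base<String> {
        // Second parameter is Integer, not Number, so this method OVERLOADS
        // handle rather than overriding it. Adding @Override here would make
        // the compiler reject it, exposing the mismatch.
        public String handle(String key, Integer ctx) {
            return "sub";
        }
    }

    public static void main(String[] args) {
        Base<String> b = new Sub();
        // Dispatch through the base-class reference runs Base.handle,
        // because Sub never actually overrode it.
        System.out.println(b.handle("word", 1)); // prints "base"
    }
}
```

Annotating intended overrides with @Override turns this silent behavior into a compile-time error, which is why I am now in the habit of adding it.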
