OK, I solved my problem myself. The difference between the two examples is the use of a Combiner: the stock WordCount driver sets one (job.setCombinerClass(IntSumReducer.class)), while SecondarySort does not. If I simply disable the combiner in the WordCount code, the null value works perfectly fine. The stack trace below actually says as much: the NullPointerException comes from IFile$Writer.append inside NewCombinerRunner.combine, i.e. the reducer is running as a combiner and its output is being serialized into an intermediate file, where a null value cannot be written. The final reducer output does not go through that path, which is why SecondarySort gets away with context.write(SEPARATOR, null).
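For reference, here is the driver change I made. This is only a sketch assuming the standard WordCount driver from the 0.20.203.0 examples, with the class renamed to WordCount2:

    // Driver setup, as in the stock WordCount example.
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count 2");
    job.setJarByClass(WordCount2.class);
    job.setMapperClass(TokenizerMapper.class);
    // job.setCombinerClass(IntSumReducer.class);  // removed: when IntSumReducer
    //                                             // runs as a combiner, its output
    //                                             // is serialized via IFile$Writer.append,
    //                                             // which throws NPE on a null value
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

If you still want the combining, I suppose a separate combiner class that only sums and never writes the SEPARATOR should also work. This is an untested sketch (the class name IntSumCombiner is my own); it just mirrors the stock IntSumReducer without the separator line:

    public static class IntSumCombiner
        extends Reducer<Text,IntWritable,Text,IntWritable> {
      private IntWritable result = new IntWritable();

      @Override
      public void reduce(Text key, Iterable<IntWritable> values,
                         Context context
                         ) throws IOException, InterruptedException {
        // Sum the partial counts, exactly like IntSumReducer,
        // but never emit a null value.
        int sum = 0;
        for (IntWritable val : values) {
          sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
      }
    }

    // and in the driver:
    job.setCombinerClass(IntSumCombiner.class);

That way the null write only happens in the real reduce phase, where TextOutputFormat just prints the key when the value is null.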
Cheers,
Shi

On Fri, Nov 4, 2011 at 11:32 PM, Shi Jin <[email protected]> wrote:
> Hi there,
>
> I am learning Hadoop and looking at the two example Java codes
> SecondarySort.java and WordCount.java, using the latest stable
> version 0.20.203.0.
>
> One interesting feature I found in the SecondarySort.java code is the use
> of null for the value sent by the reducer. The code is copied below:
>
>     public static class Reduce
>         extends Reducer<IntPair, IntWritable, Text, IntWritable> {
>       private static final Text SEPARATOR =
>           new Text("------------------------------------------------");
>       private final Text first = new Text();
>
>       @Override
>       public void reduce(IntPair key, Iterable<IntWritable> values,
>                          Context context
>                          ) throws IOException, InterruptedException {
>         context.write(SEPARATOR, null);
>         first.set(Integer.toString(key.getFirst()));
>         for (IntWritable value : values) {
>           context.write(first, value);
>         }
>       }
>     }
>
> What I am interested in is
>
>     private static final Text SEPARATOR =
>         new Text("------------------------------------------------");
>
> and
>
>     context.write(SEPARATOR, null);
>
> I think this is a nice way to control the format of the output file (like
> adding comments, separators, etc.).
>
> So I added the same code to the WordCount.java example code (I made a
> copy of it and called it WordCount2). I have the almost identical code:
>
>     public static class IntSumReducer
>         extends Reducer<Text,IntWritable,Text,IntWritable> {
>       private IntWritable result = new IntWritable();
>       private static final Text SEPARATOR =
>           new Text("------------------------------------------------");
>
>       @Override
>       public void reduce(Text key, Iterable<IntWritable> values,
>                          Context context
>                          ) throws IOException, InterruptedException {
>         context.write(SEPARATOR, null);
>         ...
>
> I had no problem building the code, but when I ran it, I got the following
> error:
>
>     ubuntu@ubuntu-gui:~/hadoop$ hadoop jar WordCount2.jar WordCount2 /test.txt /wc2result7
>     11/11/05 04:54:09 INFO input.FileInputFormat: Total input paths to process : 1
>     11/11/05 04:54:09 INFO mapred.JobClient: Running job: job_201111021955_0052
>     11/11/05 04:54:10 INFO mapred.JobClient:  map 0% reduce 0%
>     11/11/05 04:54:24 INFO mapred.JobClient: Task Id : attempt_201111021955_0052_m_000000_0, Status : FAILED
>     java.lang.NullPointerException
>         at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:166)
>         at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1078)
>         at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1399)
>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>         at WordCount2$IntSumReducer.reduce(WordCount2.java:46)
>         at WordCount2$IntSumReducer.reduce(WordCount2.java:35)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>         at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1420)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
> This error message points me right back to context.write(SEPARATOR, null);
>
> So now I am very confused. Why does the same code work for one example but
> not the other? Could anyone please help me here?
> Thanks.
>
> Shi
>
> --
> Shi Jin, Ph.D.
