I am using Hadoop 0.15.1 to index a catalog that has a tree-like structure, where the leaf nodes are data files. My main task is a loop that performs a breadth-first walk of the tree: at each level, a mapper parses out the URLs of the catalogs and data files at the next level. To decide when the loop should terminate, I use a reduce task that counts the number of new catalogs found, and I stop the loop when that count is 0.
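To make the structure concrete, here is a stripped-down sketch of what my driver loop looks like. The class names, paths, and the count-reading helper are simplified placeholders rather than my actual code, and the mapper/reducer classes are omitted:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;

    public class CatalogWalkDriver {

      public static void main(String[] args) throws IOException {
        int level = 0;
        long newCatalogs;
        do {
          JobConf conf = new JobConf(CatalogWalkDriver.class);
          conf.setJobName("catalog-walk-level-" + level);
          conf.setInputFormat(KeyValueTextInputFormat.class);  // input: Text key/value pairs
          conf.setMapperClass(CatalogMapper.class);            // placeholder: parses child catalog/datafile URLs
          conf.setReducerClass(CatalogCountReducer.class);     // placeholder: counts newly found catalogs
          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(Text.class);
          conf.setNumReduceTasks(1);                           // single reducer, so the count ends up in one file
          conf.setInputPath(new Path("catalogs/level-" + level));
          conf.setOutputPath(new Path("catalogs/level-" + (level + 1)));
          JobClient.runJob(conf);

          newCatalogs = readNewCatalogCount(conf, level + 1);
          level++;
        } while (newCatalogs > 0);                             // stop when no new catalogs were found
      }

      // Placeholder helper: reads the single "new-catalogs<TAB>count" line written by the reducer.
      private static long readNewCatalogCount(JobConf conf, int level) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path part = new Path("catalogs/level-" + level + "/part-00000");
        BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(part)));
        try {
          String line = in.readLine();
          return line == null ? 0 : Long.parseLong(line.split("\t")[1]);
        } finally {
          in.close();
        }
      }
    }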
But while running the jobs, I kept getting the exception pasted below from the logs, and I don't quite understand what it is trying to say: I never use LongWritable in my code, only Text for the output keys and values, and KeyValueTextInputFormat for input. What is stranger is that the exception occurs at a different point from job to job: sometimes it is thrown on the 2nd iteration of my loop, other times on the 3rd, the 4th, and so on. Can someone explain what this means and why it happens? Also, what is the best way to test and debug a Hadoop job? Thanks.

2008-01-16 00:37:19,941 INFO org.apache.hadoop.mapred.ReduceTask: task_200801160024_0011_r_000000_1 Copying task_200801160024_0011_m_000000_0 output from ginkgo.mycluster.org
2008-01-16 00:37:19,953 INFO org.apache.hadoop.mapred.ReduceTask: task_200801160024_0011_r_000000_1 done copying task_200801160024_0011_m_000000_0 output from ginkgo.mycluster.org
2008-01-16 00:37:19,955 INFO org.apache.hadoop.mapred.ReduceTask: task_200801160024_0011_r_000000_1 Copying of all map outputs complete. Initiating the last merge on the remaining files in ramfs://mapoutput26453615
2008-01-16 00:37:20,088 WARN org.apache.hadoop.mapred.ReduceTask: task_200801160024_0011_r_000000_1 Final merge of the inmemory files threw an exception: java.io.IOException: java.io.IOException: wrong key class: class org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.Text
    at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2874)
    at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2683)
    at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2437)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1153)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:252)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1161)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:252)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

2008-01-16 00:37:20,090 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: task_200801160024_0011_r_000000_1The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:253)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
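For reference, the key/value class settings in my driver look roughly like the lines below (again simplified, not my real code). I never call setMapOutputKeyClass or setMapOutputValueClass, so as far as I understand the intermediate map-output classes should default to the job output classes, i.e. Text:

    JobConf conf = new JobConf(CatalogWalkDriver.class);   // placeholder driver class
    conf.setInputFormat(KeyValueTextInputFormat.class);    // input keys and values are Text
    conf.setOutputKeyClass(Text.class);                    // final (and, by default, map) output key
    conf.setOutputValueClass(Text.class);                  // final (and, by default, map) output value
    // Not in my code today; pinning the intermediate classes explicitly would look like:
    // conf.setMapOutputKeyClass(Text.class);
    // conf.setMapOutputValueClass(Text.class);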