Hi,
I am stuck again on what is probably a very simple problem. I can't
generate the map output in SequenceFile format; I always get this error:
java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.LongWritable
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:985)
        at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:74)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:498)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at edu.uchicago.naivetagger.lzocheck.MapperSequenceCompression$Map.map(MapperSequenceCompression.java:29)
        at edu.uchicago.naivetagger.lzocheck.MapperSequenceCompression$Map.map(MapperSequenceCompression.java:27)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Why is Hadoop trying to cast my Text key into LongWritable? I noticed
that with the old API there could be problems processing the index file
and the sequence file in the mapper output, but since I am using the
0.20.2 API, I assume that is not the issue. I suspect I missed something
naive here, but it has already taken me a long time trying to figure it out.
Thanks for any suggestions.
Here is my complete code. It is a map-only job, and I ran it on random
input because the mapper simply emits a static <"key", "a"> pair as
<Text, Text> output.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import java.io.IOException;
public class MapperSequenceCompression extends Configured implements Tool {

    public static class Map
            extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit a static <Text, Text> pair regardless of the input record.
            context.write(new Text("key"), new Text("a"));
        }
    }
    public int run(String[] args) throws IOException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Path in = new Path(otherArgs[0]);
        Path out = new Path(otherArgs[1]);

        Job job = new Job(conf, "MyJob");
        job.setJarByClass(MapperSequenceCompression.class);
        job.setMapperClass(MapperSequenceCompression.Map.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(job, in);
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(job, out);
        job.setNumReduceTasks(0);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(
                org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.class);

        try {
            // Return the exit code instead of calling System.exit() here,
            // so main() (via ToolRunner) decides how to terminate.
            return job.waitForCompletion(true) ? 0 : 1;
        } catch (Throwable e) {
            return -1;
        }
    }
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MapperSequenceCompression(), args);
        System.exit(exitCode);
    }
}
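One thing I am going to try next, though I have not confirmed it: since the
failing append happens inside SequenceFileOutputFormat's writer, maybe it
takes its key/value classes from the job's *final* output classes
(job.getOutputKeyClass(), which I believe defaults to LongWritable) rather
than from the map output classes, even in a map-only job. If that guess is
right, adding these two lines to run() before submitting the job should fix
the cast error:

```java
// Guess: SequenceFileOutputFormat may read the final output types, not the
// map output types, so declare those explicitly as well.
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
```

If someone can confirm whether setMapOutputKeyClass alone is supposed to be
enough for a job with zero reducers, I'd appreciate it.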