I am currently trying to read back values that I previously wrote out as
Text,BytesWritable pairs into a SequenceFile. The key is Hadoop's Text
writable, and the value is a Protocol Buffer byte array wrapped in a
BytesWritable. Here is a snippet showing the output configuration:
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
SequenceFileAsBinaryOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
SequenceFileAsBinaryOutputFormat.setSequenceFileOutputKeyClass(job, Text.class);
SequenceFileAsBinaryOutputFormat.setSequenceFileOutputValueClass(job, BytesWritable.class);
FileOutputFormat.setOutputPath(job, outDataPath);
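For context, the values are produced roughly like this (a simplified sketch;
MyProto and its payload field are placeholders for my actual generated
message class):

public void map(LongWritable offset, Text line,
    OutputCollector<Text, BytesWritable> output, Reporter reporter)
    throws IOException {
  // Build the protobuf message from the input record (details elided)
  MyProto message = MyProto.newBuilder().setPayload(line.toString()).build();
  byte[] raw = message.toByteArray();            // serialize protobuf to bytes
  output.collect(new Text("someKey"), new BytesWritable(raw));
}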
In another job I am trying to read this back in:
job.setInputFormat(org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat.class);
public static class Map extends MapReduceBase implements
Mapper<Text,BytesWritable,Text,LongWritable> { ... }
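Inside map I then intend to parse the protobuf back out of the value, along
these lines (again a sketch, with MyProto as a placeholder):

public void map(Text key, BytesWritable value,
    OutputCollector<Text, LongWritable> output, Reporter reporter)
    throws IOException {
  // BytesWritable's backing buffer may be padded, so copy only getLength() bytes
  byte[] raw = new byte[value.getLength()];
  System.arraycopy(value.getBytes(), 0, raw, 0, value.getLength());
  MyProto message = MyProto.parseFrom(raw);   // deserialize the protobuf
  output.collect(key, new LongWritable(1L));  // placeholder output
}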
I get an error like this:
java.io.IOException: hdfs://localhost:4000/user/myuser/step1-out/part-00003.deflate not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1458)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1420)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1415)
    at org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat$SequenceFileAsBinaryRecordReader.<init>(SequenceFileAsBinaryInputFormat.java:67)
    at org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat.getRecordReader(SequenceFileAsBinaryInputFormat.java:48)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
Am I doing something wrong here? Or is there just some inherent problem with
what I am trying to do?
-Xavier