I am currently trying to read back values that I previously wrote out as 
Text,BytesWritable pairs into a SequenceFile.  The key is Hadoop's Text writable, and 
the value is a Protocol Buffer byte array wrapped in a BytesWritable.  Here is 
a snippet showing the output configuration.

FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
SequenceFileAsBinaryOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
SequenceFileAsBinaryOutputFormat.setSequenceFileOutputKeyClass(job, Text.class);
SequenceFileAsBinaryOutputFormat.setSequenceFileOutputValueClass(job, BytesWritable.class);
FileOutputFormat.setOutputPath(job, outDataPath);
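(For reference, DefaultCodec is Hadoop's zlib/DEFLATE codec, and it names plain compressed output files with a ".deflate" extension — the same extension that shows up in the error below.  The compression itself is the same algorithm as java.util.zip; here is a stand-alone round-trip sketch, where the class and method names are illustrative, not Hadoop API:)

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative sketch, not Hadoop API: DefaultCodec compresses with zlib,
// the same algorithm exposed by java.util.zip.Deflater/Inflater.
public class DeflateRoundTrip {

    // Compress a byte array into a zlib/DEFLATE stream.
    static byte[] deflate(byte[] data) {
        Deflater d = new Deflater();
        d.setInput(data);
        d.finish();
        // Generous buffer: small inputs can grow slightly when compressed.
        byte[] buf = new byte[data.length * 2 + 64];
        int n = d.deflate(buf);
        d.end();
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    // Decompress, given the original length.
    static byte[] inflate(byte[] data, int originalLen) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(data);
        byte[] out = new byte[originalLen];
        inf.inflate(out);
        inf.end();
        return out;
    }
}
```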

In another job I am trying to read this back in:

job.setInputFormat(org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat.class);

public static class Map extends MapReduceBase implements 
Mapper<Text,BytesWritable,Text,LongWritable> { ... }

I get an error like this:

java.io.IOException: hdfs://localhost:4000/user/myuser/step1-out/part-00003.deflate not a SequenceFile
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1458)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1420)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1415)
        at org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat$SequenceFileAsBinaryRecordReader.<init>(SequenceFileAsBinaryInputFormat.java:67)
        at org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat.getRecordReader(SequenceFileAsBinaryInputFormat.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

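(As far as I understand it, the "not a SequenceFile" message is thrown by SequenceFile.Reader.init() when a file does not begin with the SequenceFile magic header — the bytes 'S','E','Q' followed by a version byte.  A stand-alone sketch of that check, using only java.io and an illustrative class name:)

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Illustrative sketch, not Hadoop API: mirrors the magic-header test that
// SequenceFile.Reader.init() performs. A real SequenceFile starts with the
// bytes 'S','E','Q'; a raw .deflate stream does not, so the reader rejects it.
public class SeqMagicCheck {

    static boolean isSequenceFile(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            byte[] magic = new byte[3];
            in.readFully(magic);
            return magic[0] == 'S' && magic[1] == 'E' && magic[2] == 'Q';
        }
    }
}
```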
Am I doing something wrong here?  Or is there some inherent problem with 
what I am trying to do?


-Xavier 
