Hey Joaquin,

When using SequenceFiles, use LzoCodec rather than LzopCodec. The reason is that SequenceFile is a container format of its own, just like LZOP files are, so it does not make sense to combine the two.
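Judging from the stack trace below, the failure happens while SequenceFile.Reader is still being initialized (the Reader.init frame): LzopCodec.createInputStream builds an LzopInputStream, and its constructor immediately tries to read an lzop file header and hits end-of-stream. A plausible explanation is that the reader opens its decompression streams up front over in-memory buffers that are still empty. A codec that expects a whole-file header cannot cope with that, while a plain stream codec does not need to read anything at construction time. The following is only an analogy built on java.util.zip, not Hadoop code: GZIPInputStream stands in for LzopCodec (it reads the gzip header in its constructor), InflaterInputStream stands in for LzoCodec, and the Codec interface and the openContainer, compressValue and decompressValue helpers are hypothetical names made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

class ContainerCodecDemo {

    // Hypothetical stand-in for a compression codec: wraps raw bytes in a decompressor.
    interface Codec {
        InputStream createInputStream(InputStream raw) throws IOException;
    }

    // Analogy for LzoCodec: a plain compressed stream, nothing is read on open.
    static final Codec RAW_STREAM = InflaterInputStream::new;

    // Analogy for LzopCodec: a file format whose header is read in the constructor.
    static final Codec FILE_FORMAT = GZIPInputStream::new;

    // Mimics a container reader that opens its decompression stream up front,
    // over an in-memory buffer that will only be filled with record data later.
    static String openContainer(Codec codec) {
        try (InputStream in = codec.createInputStream(new ByteArrayInputStream(new byte[0]))) {
            return "opened";
        } catch (EOFException e) {
            return "EOFException"; // header read hit the end of the empty buffer
        } catch (IOException e) {
            return "IOException";
        }
    }

    // Compresses a single record value, the way a container stores values internally.
    static byte[] compressValue(String value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf)) {
            out.write(value.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Decompresses a single record value with the given codec.
    static String decompressValue(Codec codec, byte[] compressed) throws IOException {
        try (InputStream in = codec.createInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(openContainer(RAW_STREAM));  // prints "opened"
        System.out.println(openContainer(FILE_FORMAT)); // prints "EOFException"
        System.out.println(decompressValue(RAW_STREAM, compressValue("hello"))); // prints "hello"
    }
}
```

In the setup from the question, following the advice above would mean setting mapred.output.compression.codec to com.hadoop.compression.lzo.LzoCodec, the LzoCodec class from the same hadoop-lzo package that appears in the stack trace.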
To read sequence files, use the SequenceFile.Reader class (http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/io/SequenceFile.Reader.html); it automatically handles decompressing the key/value fields for you. You don't have to run lzop or a similar tool first to be able to read the file, because the compression is applied internally (per record or per block of records) and not over the entire file.

There is also a good explanation of the difference on Quora: http://www.quora.com/Whats-the-difference-between-the-LzoCodec-and-the-LzopCodec-in-Hadoop-LZO

On Fri, Jun 15, 2012 at 11:34 AM, JOAQUIN GUANTER GONZALBEZ <x...@tid.es> wrote:
> Hello,
>
> I have a sequence of MR jobs that use SequenceFile as their output and
> input format. If I run them without any compression enabled, they work
> fine. If I use the LzoCodec they also work just fine (but then the output
> is not Lzop-compatible, which is inconvenient).
>
> If I try using the LzopCodec, then the first MR job (which reads from a
> TextFile and outputs to a SequenceFile) runs OK, but when the second job
> tries to read what the first job wrote, I get the following exception:
>
> java.io.EOFException: Premature EOF from inputStream
>     at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
>     at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
>     at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
>     at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1591)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1493)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1480)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:451)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>     at org.apache.ha
>
> Does anyone know why this could be happening? I’m using the latest
> Cloudera CDH3 distribution and I’m configuring the compression through the
> mapred.output.compression.codec property in the mapred-site.xml file.
>
> Thanks!
> Ximo.

--
Harsh J