Thanks for the explanation. My input format uses its own RecordReader, so it looks like I'll have to add compression support to it myself.
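For anyone finding this thread later, here is a minimal sketch of what that usually looks like, modeled on what LineRecordReader does internally. This assumes a custom RecordReader in the new API (org.apache.hadoop.mapreduce); the method body below is illustrative, not Patrick's actual code:

    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public void initialize(InputSplit genericSplit, TaskAttemptContext context)
        throws IOException {
      FileSplit split = (FileSplit) genericSplit;
      Configuration conf = context.getConfiguration();
      Path file = split.getPath();

      // Pick a codec from the filename extension (.gz -> GzipCodec, etc.).
      // getCodec() returns null when the extension isn't recognized.
      CompressionCodecFactory codecFactory = new CompressionCodecFactory(conf);
      CompressionCodec codec = codecFactory.getCodec(file);

      FileSystem fs = file.getFileSystem(conf);
      FSDataInputStream fileIn = fs.open(file);

      InputStream in;
      if (codec != null) {
        // Wrap the raw stream so reads return decompressed bytes.
        in = codec.createInputStream(fileIn);
      } else {
        in = fileIn;
      }
      // ... build your record parsing on top of 'in' as before ...
    }

One caveat worth noting: gzip is not a splittable format, so if you go this route you'll probably also want your FileInputFormat subclass to override isSplitable() to return false when a codec matches the file (this is what TextInputFormat does), otherwise each map task may get a partial, undecompressable slice of the file.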
On Fri, Oct 8, 2010 at 2:34 PM, Tom White <t...@cloudera.com> wrote:
> It's done by the RecordReader. For text-based input formats, which use
> LineRecordReader, decompression is carried out automatically. For
> others it's not (e.g. sequence files which have internal compression).
> So it depends on what your custom input format does.
>
> Cheers,
> Tom
>
> On Fri, Oct 8, 2010 at 1:58 PM, Patrick Marchwiak <pmarchw...@gmail.com> wrote:
>> Hi,
>> The Hadoop Definitive Guide book states that "if your input files are
>> compressed, they will be automatically decompressed as they are read
>> by MapReduce, using the filename extension to determine the codec to
>> use" (in the section titled "Using Compression in MapReduce"). I'm
>> trying to run a mapreduce job with some gzipped files as input and
>> this isn't working. Does support for this have to be built into the
>> input format? I'm using a custom one that extends from
>> FileInputFormat. Is there an additional configuration option that
>> should be set? I'd like to avoid having to do decompression from
>> within my map.
>>
>> I'm using the new API and the CDH3b2 distro.
>>
>> Thanks.