Thanks for the explanation. My input format uses its own RecordReader,
so it looks like I'll have to add compression support to it myself.
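For reference, here's a minimal sketch of the approach, modeled on what
LineRecordReader does: ask CompressionCodecFactory for a codec based on
the filename extension and, if one is found, wrap the raw file stream.
MyRecordReader and the record-parsing details are placeholders for your
own format.

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class MyRecordReader extends RecordReader<LongWritable, Text> {
  private InputStream in;
  private LongWritable key = new LongWritable();
  private Text value = new Text();

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context)
      throws IOException {
    Configuration conf = context.getConfiguration();
    Path file = ((FileSplit) split).getPath();
    FileSystem fs = file.getFileSystem(conf);

    // Pick a codec from the file extension (.gz, .bz2, ...); null if none.
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);
    if (codec != null) {
      // Wrap the raw stream so records are parsed from decompressed bytes.
      in = codec.createInputStream(fs.open(file));
    } else {
      in = fs.open(file);
    }
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    // Format-specific parsing of 'in' goes here; return false at EOF.
    return false;
  }

  @Override public LongWritable getCurrentKey() { return key; }
  @Override public Text getCurrentValue() { return value; }
  @Override public float getProgress() { return 0.0f; }
  @Override public void close() throws IOException { in.close(); }
}

One gotcha: gzip isn't splittable, so the InputFormat should also
override isSplitable() to return false for compressed files (as
TextInputFormat does), otherwise mappers will try to start reading
from the middle of a gzip stream.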

On Fri, Oct 8, 2010 at 2:34 PM, Tom White <t...@cloudera.com> wrote:
> It's done by the RecordReader. For text-based input formats, which use
> LineRecordReader, decompression is carried out automatically. For
> others it isn't (e.g. sequence files, which have internal compression).
> So it depends on what your custom input format does.
>
> Cheers,
> Tom
>
> On Fri, Oct 8, 2010 at 1:58 PM, Patrick Marchwiak <pmarchw...@gmail.com> wrote:
>> Hi,
>> The Hadoop Definitive Guide book states that "if your input files are
>> compressed, they will be automatically decompressed as they are read
>> by MapReduce, using the filename extension to determine the codec to
>> use" (in the section titled "Using Compression in MapReduce"). I'm
>> trying to run a mapreduce job with some gzipped files as input and
>> this isn't working. Does support for this have to be built into the
>> input format? I'm using a custom one that extends from
>> FileInputFormat. Is there an additional configuration option that
>> should be set?  I'd like to avoid having to do decompression from
>> within my map.
>>
>> I'm using the new API and the CDH3b2 distro.
>>
>> Thanks.
>>
>
