Thanks for the explanation. My input format uses its own RecordReader, so it looks like I'll have to add compression support to it myself.
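For anyone finding this thread later, here is a minimal sketch of what that usually looks like, modeled on what LineRecordReader does internally. This assumes a custom RecordReader in the new API (org.apache.hadoop.mapreduce); the method body below is illustrative, not Patrick's actual code:

    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public void initialize(InputSplit genericSplit, TaskAttemptContext context)
        throws IOException {
      FileSplit split = (FileSplit) genericSplit;
      Configuration conf = context.getConfiguration();
      Path file = split.getPath();

      // Pick a codec from the filename extension (.gz -> GzipCodec, etc.).
      // getCodec() returns null when the extension isn't recognized.
      CompressionCodecFactory codecFactory = new CompressionCodecFactory(conf);
      CompressionCodec codec = codecFactory.getCodec(file);

      FileSystem fs = file.getFileSystem(conf);
      FSDataInputStream fileIn = fs.open(file);

      InputStream in;
      if (codec != null) {
        // Wrap the raw stream so reads return decompressed bytes.
        in = codec.createInputStream(fileIn);
      } else {
        in = fileIn;
      }
      // ... build your record parsing on top of 'in' as before ...
    }

One caveat worth noting: gzip is not a splittable format, so if you go this route you'll probably also want your FileInputFormat subclass to override isSplitable() to return false when a codec matches the file (this is what TextInputFormat does), otherwise each map task may get a partial, undecompressable slice of the file.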
On Fri, Oct 8, 2010 at 2:34 PM, Tom White <t...@cloudera.com> wrote:
> It's done by the RecordReader. For text-based input formats, which use
> LineRecordReader, decompression is carried out automatically. For
> others it's not (e.g. sequence files which have internal compression).
> So it depends on what your custom input format does.
>
> Cheers,
> Tom
>
> On Fri, Oct 8, 2010 at 1:58 PM, Patrick Marchwiak <pmarchw...@gmail.com> wrote:
>> Hi,
>> The Hadoop Definitive Guide book states that "if your input files are
>> compressed, they will be automatically decompressed as they are read
>> by MapReduce, using the filename extension to determine the codec to
>> use" (in the section titled "Using Compression in MapReduce"). I'm
>> trying to run a mapreduce job with some gzipped files as input and
>> this isn't working. Does support for this have to be built into the
>> input format? I'm using a custom one that extends from
>> FileInputFormat. Is there an additional configuration option that
>> should be set? I'd like to avoid having to do decompression from
>> within my map.
>>
>> I'm using the new API and the CDH3b2 distro.
>>
>> Thanks.