Hi,
Hadoop: The Definitive Guide states that "if your input files are
compressed, they will be automatically decompressed as they are read
by MapReduce, using the filename extension to determine the codec to
use" (in the section titled "Using Compression in MapReduce"). I'm
trying to run a MapReduce job with some gzipped files as input, but
the automatic decompression isn't happening. Does support for this
have to be built into the
input format? I'm using a custom one that extends
FileInputFormat. Is there an additional configuration option that
should be set? I'd like to avoid having to do the decompression
from within my map function.
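
For what it's worth, from reading the TextInputFormat/LineRecordReader
source it looks like the decompression happens inside the record
reader (via CompressionCodecFactory), not in FileInputFormat itself,
so I'm guessing my custom format needs to do something like the sketch
below. The class names (MyInputFormat, MyRecordReader) are just
placeholders for my own code, and the record-parsing methods are
stubbed out:

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class MyInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Gzip streams can't be split, so a compressed file has to go
        // to a single mapper in one piece.
        CompressionCodec codec = new CompressionCodecFactory(
                context.getConfiguration()).getCodec(file);
        return codec == null;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new MyRecordReader();
    }

    public static class MyRecordReader
            extends RecordReader<LongWritable, Text> {
        private InputStream in;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
                throws IOException {
            Path path = ((FileSplit) split).getPath();
            Configuration conf = context.getConfiguration();
            FileSystem fs = path.getFileSystem(conf);

            // Look up the codec by filename extension (.gz -> GzipCodec);
            // getCodec() returns null for an unrecognized extension.
            CompressionCodec codec =
                    new CompressionCodecFactory(conf).getCodec(path);
            InputStream fileIn = fs.open(path);
            in = (codec != null) ? codec.createInputStream(fileIn) : fileIn;
        }

        // The methods below would parse records from 'in' exactly as my
        // existing reader does; stubbed out here.
        @Override public boolean nextKeyValue() { return false; }
        @Override public LongWritable getCurrentKey() { return null; }
        @Override public Text getCurrentValue() { return null; }
        @Override public float getProgress() { return 0f; }
        @Override public void close() throws IOException {
            if (in != null) in.close();
        }
    }
}

The isSplitable() override also seems necessary, since gzip isn't a
splittable format and each map task would otherwise get an undecodable
slice of the compressed stream. Can anyone confirm this is the right
approach?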

I'm using the new API and the CDH3b2 distro.

Thanks.
