Thanks for your reply! That clears some things up.
There is but one problem... My CompressionCodec has to be instantiated
on a per-file basis, meaning it needs to know the name of the file it is
to compress/decompress. I'm guessing that isn't possible with the
current implementation?
And if it is possible, how would I go about injecting the file name into it?
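For illustration, the situation looks roughly like this (all names here
are hypothetical, just a sketch of my codec):

  // Hypothetical sketch: the codec needs the file name at construction
  // time, e.g. to load a per-file resource.
  public abstract class PerFileCodec
      implements org.apache.hadoop.io.compress.CompressionCodec {
    private final String fileName;

    // This constructor is the crux of the problem: Hadoop creates codecs
    // reflectively through a no-arg constructor, so nothing ever passes
    // the file name in.
    public PerFileCodec(String fileName) {
      this.fileName = fileName;
    }
    // stream-factory methods omitted from this sketch
  }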
--
Greg
On 2012-04-11 10:12, Zizon Qiu wrote:
Append your custom codec's full class name to "io.compression.codecs",
either in mapred-site.xml or in the Configuration object passed to the
Job constructor.
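A minimal sketch of the programmatic variant, assuming a hypothetical
custom codec class com.example.MyCodec:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class SubmitWithCustomCodec {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Append the custom codec to the comma-separated codec list.
      conf.set("io.compression.codecs",
          "org.apache.hadoop.io.compress.DefaultCodec,"
          + "org.apache.hadoop.io.compress.GzipCodec,"
          + "org.apache.hadoop.io.compress.BZip2Codec,"
          + "com.example.MyCodec"); // hypothetical custom codec
      Job job = new Job(conf); // the codec list travels with the job config
      // ... configure input/output paths, mapper, reducer, then submit ...
    }
  }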
The MapReduce framework will try to guess the compression algorithm
from the input file's suffix.
If the suffix matches the getDefaultExtension() of any CompressionCodec
registered in the configuration, Hadoop will try to instantiate that
codec and, if that succeeds, decompress the input for you automatically.
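That lookup is done by CompressionCodecFactory; roughly, something like
this happens (a sketch of the idea, not the framework's exact code):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.compress.CompressionCodec;
  import org.apache.hadoop.io.compress.CompressionCodecFactory;

  public class CodecLookup {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // The factory reads "io.compression.codecs" and indexes every codec
      // by its getDefaultExtension(), e.g. ".gz" for GzipCodec.
      CompressionCodecFactory factory = new CompressionCodecFactory(conf);
      // Returns the codec whose extension matches, or null if none does.
      CompressionCodec codec = factory.getCodec(new Path("input/data.gz"));
      System.out.println(codec == null ? "no codec" : codec.getClass().getName());
    }
  }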
The default value of "io.compression.codecs" is
"org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec"
On Wed, Apr 11, 2012 at 3:55 PM, Grzegorz Gunia
<sawt...@student.agh.edu.pl> wrote:
Hello,
I am trying to get a custom CompressionCodec to work with
MapReduce jobs, but I haven't found a way to inject it during the
reading of the input data, or during the writing of the job results.
Am I missing something, or is there no support for compressed
files in the filesystem?
I am well aware of how to set it up for the intermediate
phases of the MapReduce operation, but I just can't find a way to
apply it BEFORE the job takes place...
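(What I mean by the intermediate setup, as a sketch; I'm assuming the
old-style property names and using GzipCodec purely as an example:)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.compress.CompressionCodec;
  import org.apache.hadoop.io.compress.GzipCodec;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class CompressionSetup {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Compress the intermediate map outputs shuffled to the reducers.
      conf.setBoolean("mapred.compress.map.output", true);
      conf.setClass("mapred.map.output.compression.codec",
          GzipCodec.class, CompressionCodec.class);
      Job job = new Job(conf);
      // Compress the final job output as well.
      FileOutputFormat.setCompressOutput(job, true);
      FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    }
  }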
Is there any other way except simply uncompressing the files I
need prior to scheduling a job?
Huge thanks for any help you can give me!
--
Greg