Re: LZO with sequenceFile

Mohit Anchlia Sun, 26 Feb 2012 09:13:22 -0800

On Sun, Feb 26, 2012 at 9:09 AM, Harsh J <ha...@cloudera.com> wrote:

> If you want to just quickly package the hadoop-lzo items instead of
> building/managing-deployment on your own, you can reuse Todd Lipcon's
> script at https://github.com/toddlipcon/hadoop-lzo-packager, which creates
> both RPMs and DEBs.
>


Thanks! A few questions:
1. Would it work with sequence files? I am using
SequenceFileAsTextInputFormat.
2. If I use SequenceFile.CompressionType.RECORD or BLOCK, would the files
still be splittable?
3. I am also using CDH's 20.2 version of Hadoop.
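
To make question 2 concrete, here is a rough sketch of how I understand
BLOCK compression could stay splittable: each block is compressed on its own
and preceded by a sync marker, so a reader handed an arbitrary byte range can
scan forward to the next marker and start decoding there. This is only a toy
model under my own assumptions, not Hadoop's actual on-disk format. The class
SyncSplitSketch, its methods and the marker value are all made up for
illustration, and java.util.zip's Deflater stands in for LZO so that it runs
without any native library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model only: NOT the real SequenceFile layout; Deflater stands in for LZO.
class SyncSplitSketch {
    // Hypothetical fixed 16-byte marker, chosen here purely for illustration.
    static final byte[] SYNC = "SYNC-MARKER-0001".getBytes(StandardCharsets.US_ASCII);

    // Compress one block independently of all other blocks.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress one block; rawLen is the uncompressed size stored in its header.
    static byte[] decompress(byte[] packed, int rawLen) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        byte[] raw = new byte[rawLen];
        try {
            int off = 0;
            while (off < rawLen && !inflater.finished()) {
                int n = inflater.inflate(raw, off, rawLen - off);
                if (n == 0) break; // no progress: stop instead of spinning
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return raw;
    }

    // Layout per block: SYNC | int rawLen | int packedLen | compressed bytes.
    static byte[] writeFile(List<String> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String block : blocks) {
            byte[] raw = block.getBytes(StandardCharsets.UTF_8);
            byte[] packed = compress(raw);
            out.write(SYNC, 0, SYNC.length);
            out.write(ByteBuffer.allocate(8).putInt(raw.length).putInt(packed.length).array(), 0, 8);
            out.write(packed, 0, packed.length);
        }
        return out.toByteArray();
    }

    // Return every block whose sync marker begins inside [start, end).
    static List<String> readSplit(byte[] file, int start, int end) {
        List<String> blocks = new ArrayList<>();
        int pos = indexOf(file, SYNC, start); // skip ahead to the first marker in the split
        while (pos >= 0 && pos < end && pos < file.length) {
            ByteBuffer header = ByteBuffer.wrap(file, pos + SYNC.length, 8);
            int rawLen = header.getInt();
            int packedLen = header.getInt();
            int dataStart = pos + SYNC.length + 8;
            byte[] packed = Arrays.copyOfRange(file, dataStart, dataStart + packedLen);
            blocks.add(new String(decompress(packed, rawLen), StandardCharsets.UTF_8));
            pos = dataStart + packedLen; // the next block, if any, starts right here
        }
        return blocks;
    }

    // Naive byte-pattern search starting at offset 'from'; returns -1 if absent.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i + pattern.length <= data.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) j++;
            if (j == pattern.length) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] file = writeFile(Arrays.asList("block one", "block two", "block three"));
        int mid = file.length / 2; // an arbitrary split point, likely mid-block
        System.out.println("split 1: " + readSplit(file, 0, mid));
        System.out.println("split 2: " + readSplit(file, mid, file.length));
    }
}
```

If that picture is right, two readers given the two halves of the file
should between them see every block exactly once, whichever codec is used
inside the blocks. Is that what happens with LZO and BLOCK (or RECORD)?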


>
> On Sun, Feb 26, 2012 at 9:55 PM, Ioan Eugen Stan <stan.ieu...@gmail.com>
> wrote:
> > 2012/2/26 Mohit Anchlia <mohitanch...@gmail.com>:
> >> Thanks. Does it mean LZO is not installed by default? How can I install LZO?
> >
> > The LZO library is released under GPL and I believe it can't be
> > included in most distributions of Hadoop because of this (can't mix
> > GPL with non GPL stuff). It should be easily available though.
> >
> >> On Sat, Feb 25, 2012 at 6:27 PM, Shi Yu <sh...@uchicago.edu> wrote:
> >>
> >>> Yes, it is supported by Hadoop sequence file. It is splittable
> >>> by default. If you have installed and specified LZO correctly,
> >>> use these:
> >>>
> >>>
> >>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.setCompressOutput(job, true);
> >>>
> >>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.setOutputCompressorClass(job, com.hadoop.compression.lzo.LzoCodec.class);
> >>>
> >>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
> >>>
> >>> job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.class);
> >>>
> >>>
> >>> Shi
> >>>
> >
> >
> >
> > --
> > Ioan Eugen Stan
> > http://ieugen.blogspot.com/
>
>
>
> --
> Harsh J
>
