On Sun, Feb 26, 2012 at 9:09 AM, Harsh J <ha...@cloudera.com> wrote:

> If you want to just quickly package the hadoop-lzo items instead of
> building/managing-deployment on your own, you can reuse Todd Lipcon's
> script at https://github.com/toddlipcon/hadoop-lzo-packager - Creates
> both RPMs and DEBs.
>

Thanks! I have some questions:

1. Would it work with sequence files? I am using
   SequenceFileAsTextInputStream.
2. If I use SequenceFile.CompressionType.RECORD or BLOCK, would the files
   still be split?
3. I am also using CDH's 20.2 version of Hadoop.

>
> On Sun, Feb 26, 2012 at 9:55 PM, Ioan Eugen Stan <stan.ieu...@gmail.com>
> wrote:
> > 2012/2/26 Mohit Anchlia <mohitanch...@gmail.com>:
> >> Thanks. Does it mean LZO is not installed by default? How can I install
> >> LZO?
> >
> > The LZO library is released under GPL and I believe it can't be
> > included in most distributions of Hadoop because of this (can't mix
> > GPL with non GPL stuff). It should be easily available though.
> >
> >> On Sat, Feb 25, 2012 at 6:27 PM, Shi Yu <sh...@uchicago.edu> wrote:
> >>
> >>> Yes, it is supported by Hadoop sequence file. It is splittable
> >>> by default. If you have installed and specified LZO correctly,
> >>> use these:
> >>>
> >>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.setCompressOutput(job, true);
> >>>
> >>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.setOutputCompressorClass(job, com.hadoop.compression.lzo.LzoCodec.class);
> >>>
> >>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
> >>>
> >>> job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.class);
> >>>
> >>> Shi
> >>>
> >
> > --
> > Ioan Eugen Stan
> > http://ieugen.blogspot.com/
>
> --
> Harsh J
>
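
On question 2, a rough way to see why a block-compressed container can stay
splittable whatever codec is used: each block is compressed on its own and
preceded by a sync marker, so a reader handed an arbitrary byte range can
skip forward to the next marker and start decoding there. The sketch below
is a toy model of that idea only. It is not the real SequenceFile layout,
and everything in it is an assumption made for illustration: the class name
BlockSplitSketch, the fixed SYNC bytes, the length-prefixed block layout,
and java.util.zip's Deflater standing in for LZO (which the JDK does not
ship).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Toy model of a block-compressed, splittable container (hypothetical layout). */
final class BlockSplitSketch {
    // Hypothetical fixed 16-byte marker, assumed never to occur inside a compressed body.
    static final byte[] SYNC = "--toy-sync-0001-".getBytes(StandardCharsets.US_ASCII);

    /** Writes each block as: SYNC, 4-byte body length, independently compressed body. */
    static byte[] write(List<List<String>> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (List<String> records : blocks) {
            byte[] body = deflate(String.join("\n", records).getBytes(StandardCharsets.UTF_8));
            out.write(SYNC, 0, SYNC.length);
            out.write(ByteBuffer.allocate(4).putInt(body.length).array(), 0, 4);
            out.write(body, 0, body.length);
        }
        return out.toByteArray();
    }

    /** Reads every block whose sync marker starts inside [start, end). */
    static List<String> readSplit(byte[] file, int start, int end) {
        List<String> records = new ArrayList<>();
        int limit = Math.min(end, file.length);
        int pos = indexOfSync(file, start); // skip the partial block at the front, if any
        while (pos >= 0 && pos < limit) {
            int len = ByteBuffer.wrap(file, pos + SYNC.length, 4).getInt();
            int bodyStart = pos + SYNC.length + 4;
            String text = new String(inflate(file, bodyStart, len), StandardCharsets.UTF_8);
            records.addAll(Arrays.asList(text.split("\n")));
            pos = bodyStart + len; // the next block starts right after this body
        }
        return records;
    }

    /** Returns the offset of the first sync marker at or after 'from', or -1 if none. */
    static int indexOfSync(byte[] file, int from) {
        for (int i = Math.max(from, 0); i + SYNC.length <= file.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(file, i, i + SYNC.length), SYNC)) return i;
        }
        return -1;
    }

    static byte[] deflate(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] data, int off, int len) {
        Inflater inf = new Inflater();
        inf.setInput(data, off, len);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IllegalStateException("truncated or corrupt block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = write(Arrays.asList(
                Arrays.asList("k1\tv1", "k2\tv2"),
                Arrays.asList("k3\tv3"),
                Arrays.asList("k4\tv4", "k5\tv5")));
        int cut = file.length / 2; // arbitrary byte offset, not aligned to a block
        System.out.println("split 1: " + readSplit(file, 0, cut));
        System.out.println("split 2: " + readSplit(file, cut, file.length));
    }
}
```

In this model a block belongs to the split in which its sync marker starts,
so two readers given adjacent byte ranges see every record once between them
even when the cut lands in the middle of a compressed block. Whether a given
Hadoop/CDH build behaves the same way with LZO installed is worth checking
against a real job rather than taking from this sketch.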