Raymond, LZO installation can be daunting even with the more recent developments out there.
Most of this information is up at: http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ

My quick guide:

Installation for RedHat / CentOS:

- watch out for the various RPMs needed for lzo / lzo2 / lzo-devel support
- get the native libs into the hadoop/lib subdir from: http://code.google.com/p/hadoop-gpl-compression/
- double-check the permissions on these files; typically a set of "rw-rw-r--" (664) permissions works well. Also check the owner.
- get Ant 1.8 to build the git repository if you are building any of the source
- move the lzo jar into the hadoop/lib subdir

Changes to config: mapred-site.xml (add the following entries)

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.child.env</name>
  <value>JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

Changes to config: core-site.xml (add these entries)

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

Changes to config: hadoop-env.sh

export HADOOP_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo-0.4.3.jar
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32
(or the 64-bit version)

Usage:

To use lzo files as input to a MR job with the older (deprecated/undeprecated) API:

conf.setInputFormat(DeprecatedLzoTextInputFormat.class);

Use "lzop" to compress the file: http://www.lzop.org/

To index the file for splitting on input, either in process, locally:

hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer big_file.lzo

or on the cluster, in MR:

hadoop jar /path/to/your/hadoop-lzo.jar
com.hadoop.compression.lzo.DistributedLzoIndexer /hdfs/dir/big_file.lzo

To compress the output of the entire job so that the output file in HDFS is an LZO-compressed file:

TextOutputFormat.setCompressOutput(conf, true);
TextOutputFormat.setOutputCompressorClass(conf, com.hadoop.compression.lzo.LzopCodec.class);

Josh Patterson
Solutions Architect
Cloudera

On Thu, Jun 24, 2010 at 5:12 PM, Raymond Jennings III <raymondj...@yahoo.com> wrote:
>
> Oh, maybe that's what I meant :-) I recall reading something on this mail
> group that "the compression" is not included with the hadoop binary and that
> you have to get and install it separately due to license incompatibilities.
> Looking at the config xml files it's not clear what I need to do. Thanks.
>
>
> ----- Original Message ----
> From: Eric Sammer <esam...@cloudera.com>
> To: common-user@hadoop.apache.org
> Sent: Thu, June 24, 2010 5:09:33 PM
> Subject: Re: Newbie to HDFS compression
>
> There is no file system level compression in HDFS. You can store
> compressed files in HDFS, however.
>
> On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
> <raymondj...@yahoo.com> wrote:
> > Are there instructions on how to enable (which type?) of compression on
> > hdfs? Does this have to be done during installation or can it be added to
> > a running cluster?
> >
> > Thanks,
> > Ray
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
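P.S. To tie the old-API pieces above together, here is a minimal driver sketch, untested, assuming hadoop-lzo 0.4.3 on the classpath and the Hadoop 0.20 mapred API; the paths and job name are placeholders, and the DeprecatedLzoTextInputFormat import path is the one from the hadoop-lzo tree:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

import com.hadoop.compression.lzo.LzopCodec;
import com.hadoop.mapred.DeprecatedLzoTextInputFormat;

public class LzoJobDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(LzoJobDriver.class);
    conf.setJobName("lzo-example");

    // Read indexed .lzo files as splittable input (old API input format)
    conf.setInputFormat(DeprecatedLzoTextInputFormat.class);
    FileInputFormat.addInputPath(conf, new Path("/hdfs/dir"));

    // Compress the job output so the files land in HDFS as .lzo
    TextOutputFormat.setCompressOutput(conf, true);
    TextOutputFormat.setOutputCompressorClass(conf, LzopCodec.class);
    FileOutputFormat.setOutputPath(conf, new Path("/hdfs/out"));

    // Mapper/reducer left out here; set yours with
    // conf.setMapperClass(...) / conf.setReducerClass(...)
    JobClient.runJob(conf);
  }
}
```

Remember to run the indexer over the output afterwards if you want downstream jobs to split it.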