Raymond,

LZO installation can be daunting, even with the more recent
developments out there.

Most of this information is up at:

http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ

My quick guide: installation for Red Hat / CentOS

- watch out for the various RPMs needed for lzo/lzo2/devel support
- get the native libs into the hadoop/lib subdir from:
http://code.google.com/p/hadoop-gpl-compression/
- double-check the permissions on these files; a set of "rw-rw-r--"
(664) permissions typically works well. Also check the owner.
- get Ant 1.8 to build the git repository if you are building any of the source
- move the lzo jar into the hadoop/lib subdir
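
The copy/permission steps above can be sketched roughly as follows. This
runs against a scratch directory for illustration; substitute your real
Hadoop install path, and note the jar/so file names here are assumptions,
not necessarily what your build produces:

```shell
# Scratch dir standing in for a real /usr/lib/hadoop tree (illustrative)
HADOOP_HOME=$(mktemp -d)/hadoop
mkdir -p "$HADOOP_HOME/lib/native"

# Stand-ins for the artifacts from the hadoop-gpl-compression build;
# in a real install you would cp these from the build output
touch "$HADOOP_HOME/lib/hadoop-lzo-0.4.3.jar"
touch "$HADOOP_HOME/lib/native/libgplcompression.so"

# rw-rw-r-- (664), owned by the user the Hadoop daemons run as
chmod 664 "$HADOOP_HOME/lib/hadoop-lzo-0.4.3.jar" \
          "$HADOOP_HOME/lib/native/libgplcompression.so"
ls -l "$HADOOP_HOME/lib" "$HADOOP_HOME/lib/native"
```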


Changes to config: mapred-site.xml (add the following entries)

  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>

  <property>
    <name>mapred.child.env</name>
    <value>JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native</value>
  </property>

  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>


Changes to config: core-site.xml

Add these entries:

  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>



hadoop-env.sh

export HADOOP_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo-0.4.3.jar
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32
(or the 64-bit version, Linux-amd64-64)
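
Since the right native directory depends on the JVM architecture, you
can pick it in hadoop-env.sh with a small sketch like this (directory
names follow the usual Hadoop native-build layout; verify them against
your actual install):

```shell
# Pick the native-lib dir matching the machine architecture.
# The two directory names below are the conventional Hadoop build
# outputs; confirm they exist under your hadoop/lib/native.
case "$(uname -m)" in
  x86_64) JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64 ;;
  *)      JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32 ;;
esac
export JAVA_LIBRARY_PATH
echo "$JAVA_LIBRARY_PATH"
```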

Usage

To use lzo files as input to a MR job with the older ("deprecated",
org.apache.hadoop.mapred) API:

conf.setInputFormat(DeprecatedLzoTextInputFormat.class);

Use "lzop" to compress the file

http://www.lzop.org/
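
A minimal sketch of compressing a file with lzop (the file name is
illustrative, and the snippet skips gracefully if the lzop package is
not installed):

```shell
# Make a sample file to compress (1 MB of zeros, illustrative only)
f=$(mktemp /tmp/big_file.XXXXXX)
head -c 1048576 /dev/zero > "$f"

# lzop writes big_file.lzo; -o names the output file explicitly
if command -v lzop >/dev/null 2>&1; then
  lzop -o "$f.lzo" "$f"
  ls -l "$f" "$f.lzo"
fi
```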

To index the file for splitting on input:

In-process, locally:

hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer big_file.lzo

On the cluster, as a MR job:

hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer /hdfs/dir/big_file.lzo

To compress the output of the entire job so that the output file in
HDFS is an LZO-compressed file:

TextOutputFormat.setCompressOutput(conf, true);
TextOutputFormat.setOutputCompressorClass(conf, com.hadoop.compression.lzo.LzopCodec.class);


Josh Patterson

Solutions Architect
Cloudera

On Thu, Jun 24, 2010 at 5:12 PM, Raymond Jennings III
<raymondj...@yahoo.com> wrote:
>
> Oh, maybe that's what I meant :-)  I recall reading something on this mail 
> group that "the compression" in not included with the hadoop binary and that 
> you have to get and install it separately due to license incompatibilities.  
> Looking at the config xml files it's not clear what I need to do.  Thanks.
>
>
>
> ----- Original Message ----
> From: Eric Sammer <esam...@cloudera.com>
> To: common-user@hadoop.apache.org
> Sent: Thu, June 24, 2010 5:09:33 PM
> Subject: Re: Newbie to HDFS compression
>
> There is no file system level compression in HDFS. You can stored
> compressed files in HDFS, however.
>
> On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
> <raymondj...@yahoo.com> wrote:
> > Are there instructions on how to enable (which type?) of compression on 
> > hdfs?  Does this have to be done during installation or can it be added to 
> > a running cluster?
> >
> > Thanks,
> > Ray
> >
> >
> >
> >
>
>
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
>
>
>
>
