The following page has been changed by RyanRawson:
http://wiki.apache.org/hadoop/UsingLzoCompression

New page:
== Warning ==

This doc only applies to 0.20.  If you are under 0.19.x, please consider 
upgrading.

== Why compression? ==

When compression is enabled, the store file (HFile) applies a compression 
algorithm to blocks as they are written (during flushes and compactions), so 
those blocks must be decompressed when they are read.

Since this adds a read-time penalty, why would one enable any compression?  
There are a few reasons why the advantages of compression can outweigh the 
disadvantages:
* Compression reduces the number of bytes written to/read from HDFS
* Compression makes more efficient use of network bandwidth and disk space
* Compression reduces the size of data needed to be read when issuing a read

To keep the overhead as low as possible, a real-time compression library is 
preferred.  Out of the box, HBase ships with only Gzip compression, which is 
fairly slow. 

To achieve maximal performance and benefit, you must enable LZO.

== Enabling Lzo compression in HBase ==

Lzo is a GPL'ed native library that ships with most Linux distributions.  
However, to use it in HBase, you must complete the following steps:

Ensure the native Lzo base library is available on every node:
* on Ubuntu: apt-get install liblzo2-dev
* or download and build [http://www.oberhumer.com/opensource/lzo/]

Download/patch the native connector library:
* Download/checkout: [http://code.google.com/p/hadoop-gpl-compression/]
* Apply the patch attached to this issue: 
[http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=6]
* On Linux you may need to apply the patch: 
[http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=5]
* On Mac you may be interested in: 
[http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=7]
** You will probably also have to add the following line to build.xml, just 
above the call to 'configure' in compile-native:
        <env key="CFLAGS" value="-arch x86_64" />

Build the native connector library:
* ant compile-native
* ant jar

Now you have the following results:
 build/hadoop-gpl-compression-0.1.0-dev.jar
 build/native/Linux-amd64-64/lib/libgplcompression.*

Instead of Linux-amd64-64 you might see Linux-i386-32, Mac_OS_X-x86_64-64, or 
another directory name, depending on the platform you are actually using.
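
If you are unsure which platform directory the build produced, a small 
shell sketch like the one below can report it.  The function name 
native_platform_dir is hypothetical, and the sketch assumes the build wrote 
exactly one directory under build/native/ in the checkout:

```shell
# Hypothetical helper (not part of hadoop-gpl-compression).
# Prints the name of the first directory found under <checkout>/build/native/.
# Usage: native_platform_dir <hadoop-gpl-compression checkout>
native_platform_dir() {
    for d in "$1"/build/native/*/; do
        # If the glob matched nothing, $d is the literal pattern; skip it.
        [ -d "$d" ] || continue
        basename "$d"
        return 0
    done
    # No platform directory found: the native build probably did not run.
    return 1
}
```

For example, "native_platform_dir ." run from the top of the checkout would 
print a name such as Linux-amd64-64.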

Copy the results into the hbase lib directory:
* build/hadoop-gpl-compression-0.1.0-dev.jar -> hbase/lib/
* build/native/Linux-amd64-64/lib/libgplcompression.* -> 
hbase/lib/native/Linux-amd64-64/

Note there is an extra 'lib' level in the build, which is not present in the 
hbase/lib/native/ tree.
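
The copy step can be sketched as a shell function.  This is only an 
illustration: the function name install_lzo_connector and its argument 
order are hypothetical, and it assumes the build outputs listed above 
exist in the checkout.  It has to be repeated on every node that runs HBase.

```shell
# Hypothetical helper (not part of HBase or hadoop-gpl-compression).
# Usage: install_lzo_connector <checkout> <hbase home> [platform dir]
install_lzo_connector() {
    src="$1"
    hbase_home="$2"
    # Assumption: default to the platform used in the example above.
    platform="${3:-Linux-amd64-64}"

    # Create the native directory in case it does not exist yet.
    mkdir -p "$hbase_home/lib/native/$platform" || return 1

    # The jar goes straight into hbase/lib/.
    cp "$src"/build/hadoop-gpl-compression-*.jar "$hbase_home/lib/" || return 1

    # The source path has an extra 'lib' level; the destination does not.
    cp "$src"/build/native/"$platform"/lib/libgplcompression.* \
        "$hbase_home/lib/native/$platform/" || return 1
}
```

A call might look like "install_lzo_connector . /path/to/hbase 
Linux-amd64-64", where /path/to/hbase is a placeholder for your install.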

== Using Lzo ==

While creating tables in hbase shell, specify the per-column family compression 
flag:
 create 'mytable', {NAME=>'colfam:', COMPRESSION=>'lzo'}

That's it!
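
If you create many tables, the statement can be generated from a script.  
The sketch below only prints the create statement shown above for a given 
table and column family; the function name lzo_create_statement is 
hypothetical.  Piping the output into the shell, as in the trailing 
comment, assumes that your hbase shell reads commands from standard input.

```shell
# Hypothetical helper: print a create statement with LZO compression enabled.
# Usage: lzo_create_statement <table> <column family>
lzo_create_statement() {
    printf "create '%s', {NAME=>'%s', COMPRESSION=>'lzo'}\n" "$1" "$2"
}

# Assumed usage (requires a running HBase; not executed here):
#   lzo_create_statement mytable colfam: | hbase shell
```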
