== Warning ==
This doc only applies to HBase 0.20 and beyond.  If you are running 0.19.x, please consider upgrading.

The hadoop-gpl-compression distribution described below does not contain all bug fixes (for example, the case where an LZO header or block header falls on a read boundary).

Please get the latest code from http://github.com/kevinweil/hadoop-lzo instead.

== Why compression? ==
By enabling compression, the store file (HFile) will use a compression algorithm on blocks as they are written (during flushes and compactions); those blocks must then be decompressed when they are read.

Since this adds a read-time penalty, why would one enable any compression?  There are a few reasons why the advantages of compression can outweigh the disadvantages:

 * Compression reduces the number of bytes written to and read from HDFS
 * Compression makes more efficient use of network bandwidth and disk space
 * Compression reduces the amount of data that must be read when serving a read request
  
To keep this penalty as low as possible, a real-time compression library is preferred.  Out of the box, HBase ships with only Gzip compression, which is fairly slow.

To achieve maximal performance and benefit, you must enable LZO.
  
== Enabling LZO compression in HBase ==
LZO is a GPL-licensed native library that ships with most Linux distributions.  To use it in HBase, however, you must complete the following steps.
  
Ensure the native LZO base library is available on every node:

 * on Ubuntu: apt-get install liblzo2-dev (see the sketch just below this list)
 * or download and build http://www.oberhumer.com/opensource/lzo/
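
For example, on Ubuntu the install and a quick sanity check might look like this (a minimal sketch; package and library names can vary between releases):

{{{
$ sudo apt-get install liblzo2-dev
$ ldconfig -p | grep lzo2   # should list liblzo2.so.2 if the library is visible to the loader
}}}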
  
Check out the native connector library (a checkout sketch follows this list):

 * The project is http://code.google.com/p/hadoop-gpl-compression/
 * For 0.20.2, check out branches/branch-0.1
 * For 0.21 or 0.22, check out trunk
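
For example, assuming the standard Google Code Subversion layout (verify the exact repository URL on the project page):

{{{
# For HBase 0.20.2:
$ svn checkout http://hadoop-gpl-compression.googlecode.com/svn/branches/branch-0.1/ hadoop-gpl-compression
# For 0.21 or 0.22:
$ svn checkout http://hadoop-gpl-compression.googlecode.com/svn/trunk/ hadoop-gpl-compression
}}}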
  
On Mac:

 * To install the hadoop-gpl-compression library on a Mac, it is advisable to use MacPorts. To do so, follow the steps below.

(Parts of this were found on http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ )
  
{{{
> port fetch lzo2 # If LZO2 is already installed, uninstall it first before doing this
> port edit lzo2 # An editor (vim by default) should open
  
# Add the following block of text to the file and save it:
variant x86_64 description "Build the 64-bit." {
    configure.args-delete     --build=x86-apple-darwin ABI=standard
    configure.cflags-delete   -m32
    configure.cflags-append   -m64
    configure.args-append     --build=x86_64-apple-darwin ABI=standard
}
  
> port install lzo2 +x86_64
}}}
This ensures the library is built in 64-bit mode, because Java 1.6 on the Mac is 64-bit only.  To check that your LZO library is x86_64 as well, type:

{{{
$ file /usr/local/lib/liblzo2.2.0.0.dylib
/usr/local/lib/liblzo2.2.0.0.dylib: Mach-O 64-bit dynamically linked shared library x86_64
}}}
 * On Mac you might want to use a command line like the following, run from the hadoop-gpl-compression home directory:

{{{
env JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/ \
C_INCLUDE_PATH=/path/to/lzo64/include LIBRARY_PATH=/path/to/lzo64/lib \
CFLAGS="-arch x86_64" ant clean compile-native test tar
}}}
 * Note: If you used MacPorts, /path/to/lzo64 should be replaced by /opt/local (e.g. /opt/local/include and /opt/local/lib)
 * Note: If for some reason you are getting compilation errors, you can add the following to the environment variables:
  
{{{
CLASSPATH=$HADOOP_HOME/hadoop-<version>-core.jar
}}}
 * Note: Also, if you run into permission-denied errors during this install, even as root, you can change the permissions on the offending files so that the build can complete
  
Once the install has completed, a jar file and lib files will have been created in the HADOOP-GPL-HOME/build directory.  All these files MUST be copied into both your HADOOP_HOME and HBASE_HOME directories, using the following commands from the HADOOP-GPL-HOME directory:
{{{
> tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C $HADOOP_HOME/lib/native
> tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C $HBASE_HOME/lib/native
}}}
To build lzo2 from source in 64-bit mode:

{{{
$ CFLAGS="-arch x86_64" ./configure --build=x86_64-darwin --enable-shared --disable-asm
<configure output>
$ make
$ sudo make install
}}}
On Linux:

 * On Linux (with the gcc compiler), to compile for a 64-bit machine:

{{{
$ export CFLAGS="-m64"
}}}
Build the native connector library:

{{{
$ ant compile-native
$ ant jar
}}}
On Mac, the resulting library should be x86_64, as above.  If it is not, add the extra CFLAGS to the configure call in the compile-native target of build.xml, as listed above.
  
Now you have the following results:

{{{
 build/hadoop-gpl-compression-0.1.0-dev.jar
 build/native/Linux-amd64-64/lib/libgplcompression.*
}}}
You might have Linux-i386-32 or Mac_OS_X-x86_64-64, or whatever platform you are actually using.
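
A quick way to see which platform directory your build actually produced:

{{{
$ ls build/native   # prints the platform directory name, e.g. Linux-amd64-64
}}}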
  
Copy the results into the HBase lib directory:

{{{
$ cp build/hadoop-gpl-compression-0.1.0-dev.jar hbase/lib/
$ cp build/native/Linux-amd64-64/lib/libgplcompression.* hbase/lib/native/Linux-amd64-64/
}}}
Note that there is an extra 'lib' level in the build tree which is not present in the hbase/lib/native/ tree.
  
(VERY IMPORTANT) Distribute the new files to every machine in your cluster; one way to do this is sketched below.
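
For example, assuming passwordless ssh and a hypothetical hosts.txt listing your nodes (a sketch; adjust paths to your installation, and remember the HADOOP_HOME copies as well):

{{{
$ for h in $(cat hosts.txt); do
    rsync -a $HBASE_HOME/lib/hadoop-gpl-compression-0.1.0-dev.jar $h:$HBASE_HOME/lib/
    rsync -a $HBASE_HOME/lib/native/ $h:$HBASE_HOME/lib/native/
  done
}}}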
  
== Using LZO ==
While creating tables in the hbase shell, specify the per-column-family compression flag:

{{{
 create 'mytable', {NAME=>'colfam:', COMPRESSION=>'lzo'}
}}}
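
You can confirm the column-family setting afterwards from the shell (the exact output format varies by version):

{{{
 describe 'mytable'
}}}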
That's it!
  
== Testing that compression is enabled ==
To test that compression is properly enabled, run {{{./bin/hbase org.apache.hadoop.hbase.util.CompressionTest}}} (this presumes at least HBase 0.20.1).  Run with no arguments, it will print usage for the CompressionTest tool; an example invocation is sketched below.  Be sure to run it on all nodes in your cluster to ensure compression is working on all of them.
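
A sketch of a full invocation (check the printed usage for the exact arguments your version expects; the path here is just an example):

{{{
$ ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/compressiontest lzo
}}}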
  
== Other tools ==
Does this help?  Todd Lipcon's [[http://github.com/toddlipcon/hadoop-lzo-packager|hadoop-lzo-packager]]
  
== Troubleshooting ==
If you get ''com.hadoop.compression.lzo.LzoCompressor: java.lang.UnsatisfiedLinkError'', check whether the 64-bit LZO libraries were installed in /usr/lib64 rather than /usr/lib.  A standalone Java application may be able to load the LZO library from there, but when running Hadoop/HBase it won't be picked up.  Just copy the liblzo files over to /usr/lib and make the appropriate links (from Samuel Yu on the mailing list); a sketch follows.
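
For example (a sketch; the exact library file names depend on your LZO version):

{{{
$ sudo cp /usr/lib64/liblzo2.so.2.0.0 /usr/lib/
$ sudo ln -sf /usr/lib/liblzo2.so.2.0.0 /usr/lib/liblzo2.so.2
$ sudo ldconfig
}}}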
  
