Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by RyanRawson: http://wiki.apache.org/hadoop/UsingLzoCompression

New page:

== Warning ==
This doc only applies to 0.20. If you are on 0.19.x, please consider upgrading.

== Why compression? ==
By enabling compression, the store file (HFile) will use a compression algorithm on blocks as they are written (during flushes and compactions), and those blocks must therefore be decompressed when read. Since this adds a read-time penalty, why would one enable any compression? There are a few reasons why the advantages of compression can outweigh the disadvantages:

 * Compression reduces the number of bytes written to/read from HDFS
 * Compression effectively improves the efficiency of network bandwidth and disk space
 * Compression reduces the size of data that needs to be read when issuing a read

To be as low-friction as possible, a real-time compression library is preferred. Out of the box, HBase ships with only Gzip compression, which is fairly slow. To achieve maximal performance and benefit, you must enable LZO.

== Enabling Lzo compression in HBase ==
Lzo is a GPL'ed native library that ships with most Linux distributions. To use it in HBase, do the following steps (a consolidated shell sketch of the whole procedure appears at the end of this page).

Ensure the native Lzo base library is available on every node:
 * on Ubuntu: apt-get install liblzo2-dev
 * or download and build [http://www.oberhumer.com/opensource/lzo/]

Download/patch the native connector library:
 * Download/checkout: [http://code.google.com/p/hadoop-gpl-compression/]
 * Apply the patch attached to this issue: [http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=6]
 * On Linux you may need to apply the patch: [http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=5]
 * On Mac you may be interested in: [http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=7]
  * Also you will probably have to add the following line to build.xml, just above the call to 'configure' in compile-native: <env key="CFLAGS" value="-arch x86_64" />

Build the native connector library:
 * ant compile-native
 * ant jar

Now you have the following results:
{{{
build/hadoop-gpl-compression-0.1.0-dev.jar
build/native/Linux-amd64-64/lib/libgplcompression.*
}}}
You might have Linux-i386-32 or Mac_OS_X-x86_64-64 or whatever platform you are actually using.

Copy the results into the hbase lib directory:
 * build/hadoop-gpl-compression-0.1.0-dev.jar -> hbase/lib/
 * build/native/Linux-amd64-64/lib/libgplcompression.* -> hbase/lib/native/Linux-amd64-64/

Note there is an extra 'lib' level in the build, which is not present in the hbase/lib/native/ tree.

== Using Lzo ==
While creating tables in the hbase shell, specify the per-column-family compression flag:
{{{
create 'mytable', {NAME=>'colfam:', COMPRESSION=>'lzo'}
}}}
That's it!
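== Appendix: consolidated build/install sketch ==
For convenience, here is a rough, untested sketch of the steps above as shell commands. The Subversion checkout URL, the patch file names, and the HBASE_HOME path are assumptions and not part of the original instructions; adjust them, and the Linux-amd64-64 platform name, to your setup.
{{{
# 1. Native Lzo base library (Ubuntu example)
sudo apt-get install liblzo2-dev

# 2. Fetch the hadoop-gpl-compression connector and apply the patches
#    linked above (checkout URL and patch paths are placeholders).
svn checkout http://hadoop-gpl-compression.googlecode.com/svn/trunk/ hadoop-gpl-compression
cd hadoop-gpl-compression
patch -p0 < /path/to/issue6.patch   # plus the issue 5 / issue 7 patches if they apply to you

# 3. Build the connector
ant compile-native
ant jar

# 4. Copy the results into HBase (note: no extra 'lib' level under hbase/lib/native/)
HBASE_HOME=/path/to/hbase
cp build/hadoop-gpl-compression-0.1.0-dev.jar $HBASE_HOME/lib/
mkdir -p $HBASE_HOME/lib/native/Linux-amd64-64
cp build/native/Linux-amd64-64/lib/libgplcompression.* $HBASE_HOME/lib/native/Linux-amd64-64/
}}}
After restarting HBase so the new jar and native library are picked up, create a table as shown in the 'Using Lzo' section; running describe 'mytable' in the hbase shell will show the compression setting on the column family.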
