Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by RyanRawson:
http://wiki.apache.org/hadoop/UsingLzoCompression

------------------------------------------------------------------------------
  When compression is enabled, the store file (HFile) compresses blocks as they are written (during flushes and compactions), so they must be decompressed when read. Since this adds a read-time penalty, why would one enable any compression? There are a few reasons why the advantages of compression can outweigh the disadvantages:
-  * Compression reduces the number of bytes written to/read from HDFS
+  * Compression reduces the number of bytes written to/read from HDFS
-  * Compression effectively improves the efficiency of network bandwidth and disk space
+  * Compression effectively improves the efficiency of network bandwidth and disk space
-  * Compression reduces the size of data needed to be read when issuing a read
+  * Compression reduces the size of data needed to be read when issuing a read

  To add as little friction as possible, a real-time compression library is preferred. Out of the box, HBase ships with only Gzip compression, which is fairly slow.

@@ -22, +22 @@

  Lzo is a GPL'ed native library that ships with most Linux distributions.
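
Whether the native Lzo library is actually present on a given node can be checked quickly. The following is only a minimal sketch using Python's standard `ctypes.util.find_library`; the helper name `find_native_lzo` and the base name `lzo2` (as in `liblzo2`) are assumptions made for illustration. It inspects only the machine it runs on and says nothing about the HBase connector library.

```python
import ctypes.util
from typing import Optional


def find_native_lzo() -> Optional[str]:
    """Return the name or path of the native LZO shared library, or None.

    Hypothetical helper for illustration. "lzo2" is assumed to be the base
    name under which the library is installed (liblzo2 on most Linux
    distributions); adjust it if your packaging differs.
    """
    return ctypes.util.find_library("lzo2")


if __name__ == "__main__":
    located = find_native_lzo()
    if located is None:
        print("native LZO library not found on this node")
    else:
        print(f"native LZO library found: {located}")
```

Running the script on each node gives a quick indication of where the library still has to be installed.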
  However, to use it in HBase, one must do the following steps:

  Ensure the native Lzo base library is available on every node:
-  * on Ubuntu: apt-get install liblzo2-dev
+  * on Ubuntu: apt-get install liblzo2-dev
-  * or Download and build [http://www.oberhumer.com/opensource/lzo/]
+  * or Download and build [http://www.oberhumer.com/opensource/lzo/]

  Download/patch the native connector library:
-  * Download/checkout: [http://code.google.com/p/hadoop-gpl-compression/]
+  * Download/checkout: [http://code.google.com/p/hadoop-gpl-compression/]
-  * Apply the patch attached to this issue: [http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=6]
+  * Apply the patch attached to this issue: [http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=6]
   * On Linux you may need to apply the patch: [http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=5]
   * On Mac you may be interested in: [http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=7]
   ** Also you will probably have to add the line to build.xml just above the call to 'configure' in compile-native:
