http://git-wip-us.apache.org/repos/asf/hbase/blob/a1fe1e09/src/main/docbkx/compression.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/compression.xml b/src/main/docbkx/compression.xml new file mode 100644 index 0000000..d1971b1 --- /dev/null +++ b/src/main/docbkx/compression.xml @@ -0,0 +1,535 @@ +<?xml version="1.0" encoding="UTF-8"?> +<appendix + xml:id="compression" + version="5.0" + xmlns="http://docbook.org/ns/docbook" + xmlns:xlink="http://www.w3.org/1999/xlink" + xmlns:xi="http://www.w3.org/2001/XInclude" + xmlns:svg="http://www.w3.org/2000/svg" + xmlns:m="http://www.w3.org/1998/Math/MathML" + xmlns:html="http://www.w3.org/1999/xhtml" + xmlns:db="http://docbook.org/ns/docbook"> + <!--/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +--> + + <title>Compression and Data Block Encoding In + HBase<indexterm><primary>Compression</primary><secondary>Data Block + Encoding</secondary><seealso>codecs</seealso></indexterm></title> + <note> + <para>Codecs mentioned in this section are for encoding and decoding data blocks or row keys. + For information about replication codecs, see <xref + linkend="cluster.replication.preserving.tags" />.</para> + </note> + <para>Some of the information in this section is pulled from a <link + xlink:href="http://search-hadoop.com/m/lL12B1PFVhp1/v=threaded">discussion</link> on the + HBase Development mailing list.</para> + <para>HBase supports several different compression algorithms which can be enabled on a + ColumnFamily. Data block encoding attempts to limit duplication of information in keys, taking + advantage of some of the fundamental designs and patterns of HBase, such as sorted row keys + and the schema of a given table. Compressors reduce the size of large, opaque byte arrays in + cells, and can significantly reduce the storage space needed to store uncompressed + data.</para> + <para>Compressors and data block encoding can be used together on the same ColumnFamily.</para> + + <formalpara> + <title>Changes Take Effect Upon Compaction</title> + <para>If you change compression or encoding for a ColumnFamily, the changes take effect during + compaction.</para> + </formalpara> + + <para>Some codecs take advantage of capabilities built into Java, such as GZip compression. + Others rely on native libraries. Native libraries may be available as part of Hadoop, such as + LZ4. In this case, HBase only needs access to the appropriate shared library. Other codecs, + such as Google Snappy, need to be installed first. Some codecs are licensed in ways that + conflict with HBase's license and cannot be shipped as part of HBase.</para> + + <para>This section discusses common codecs that are used and tested with HBase. 
No matter what
+ codec you use, be sure to test that it is installed correctly and is available on all nodes in
+ your cluster. Extra operational steps may be necessary to be sure that codecs are available on
+ newly-deployed nodes. You can use the <xref
+ linkend="compression.test" /> utility to check that a given codec is correctly
+ installed.</para>
+
+ <para>To configure HBase to use a compressor, see <xref
+ linkend="compressor.install" />. To enable a compressor for a ColumnFamily, see <xref
+ linkend="changing.compression" />. To enable data block encoding for a ColumnFamily, see
+ <xref linkend="data.block.encoding.enable" />.</para>
+ <itemizedlist>
+ <title>Block Compressors</title>
+ <listitem>
+ <para>none</para>
+ </listitem>
+ <listitem>
+ <para>Snappy</para>
+ </listitem>
+ <listitem>
+ <para>LZO</para>
+ </listitem>
+ <listitem>
+ <para>LZ4</para>
+ </listitem>
+ <listitem>
+ <para>GZ</para>
+ </listitem>
+ </itemizedlist>
+
+
+ <itemizedlist xml:id="data.block.encoding.types">
+ <title>Data Block Encoding Types</title>
+ <listitem>
+ <para>Prefix - Often, keys are very similar. Specifically, keys often share a common prefix
+ and only differ near the end. For instance, one key might be
+ <literal>RowKey:Family:Qualifier0</literal> and the next key might be
+ <literal>RowKey:Family:Qualifier1</literal>. In Prefix encoding, an extra column is
+ added which holds the length of the prefix shared between the current key and the previous
+ key. Assuming the first key here is totally different from the key before, its prefix
+ length is 0. The second key's prefix length is <literal>23</literal>, since the two keys have the
+ first 23 characters in common.</para>
+ <para>Obviously, if the keys tend to have nothing in common, Prefix encoding will not provide much
+ benefit.</para>
+ <para>The following image shows a hypothetical ColumnFamily with no data block encoding.</para>
+ <figure>
+ <title>ColumnFamily with No Encoding</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata fileref="data_block_no_encoding.png" width="800"/>
+ </imageobject>
+ <caption><para>A ColumnFamily with no encoding</para></caption>
+ </mediaobject>
+ </figure>
+ <para>Here is the same data with Prefix data block encoding.</para>
+ <figure>
+ <title>ColumnFamily with Prefix Encoding</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata fileref="data_block_prefix_encoding.png" width="800"/>
+ </imageobject>
+ <caption><para>A ColumnFamily with prefix encoding</para></caption>
+ </mediaobject>
+ </figure>
+ </listitem>
+ <listitem>
+ <para>Diff - Diff encoding expands upon Prefix encoding. Instead of considering the key
+ sequentially as a monolithic series of bytes, each key field is split so that each part of
+ the key can be compressed more efficiently. Two new fields are added: timestamp and type.
+ If the ColumnFamily is the same as the previous row, it is omitted from the current row.
+ If the key length, value length, or type are the same as the previous row, the field is
+ omitted. In addition, for increased compression, the timestamp is stored as a Diff from
+ the previous row's timestamp, rather than being stored in full.
Given the two row keys in
+ the Prefix example, and given an exact match on timestamp and the same type, neither the
+ value length nor the type needs to be stored for the second row, and the timestamp value for
+ the second row is just 0, rather than a full timestamp.</para>
+ <para>Diff encoding is disabled by default because writing and scanning are slower but more
+ data is cached.</para>
+ <para>This image shows the same ColumnFamily from the previous images, with Diff encoding.</para>
+ <figure>
+ <title>ColumnFamily with Diff Encoding</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata fileref="data_block_diff_encoding.png" width="800"/>
+ </imageobject>
+ <caption><para>A ColumnFamily with diff encoding</para></caption>
+ </mediaobject>
+ </figure>
+ </listitem>
+ <listitem>
+ <para>Fast Diff - Fast Diff works similarly to Diff, but uses a faster implementation. It also
+ adds another field which stores a single bit to track whether the data itself is the same
+ as the previous row. If it is, the data is not stored again. Fast Diff is the recommended
+ codec to use if you have long keys or many columns. The data format is nearly identical to
+ Diff encoding, so there is not an image to illustrate it.</para>
+ </listitem>
+ <listitem>
+ <para>Prefix Tree encoding was introduced as an experimental feature in HBase 0.96. It
+ provides similar memory savings to the Prefix, Diff, and Fast Diff encoders, but provides
+ faster random access at a cost of slower encoding speed. Prefix Tree may be appropriate
+ for applications that have high block cache hit ratios. It introduces new 'tree' fields
+ for the row and column. The row tree field contains a list of offsets/references
+ corresponding to the cells in that row. This allows for a good deal of compression. For
+ more details about Prefix Tree encoding, see <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-4676">HBASE-4676</link>. It is
+ difficult to graphically illustrate a prefix tree, so no image is included. See the
+ Wikipedia article for <link
+ xlink:href="http://en.wikipedia.org/wiki/Trie">Trie</link> for more general information
+ about this data structure.</para>
+ </listitem>
+ </itemizedlist>
+
+ <section>
+ <title>Which Compressor or Data Block Encoder To Use</title>
+ <para>The compression or codec type to use depends on the characteristics of your data.
+ Choosing the wrong type could cause your data to take more space rather than less, and can
+ have performance implications. In general, you need to weigh your options between smaller
+ size and faster compression/decompression. Following are some general guidelines, expanded from a discussion at <link xlink:href="http://search-hadoop.com/m/lL12B1PFVhp1">Documenting Guidance on compression and codecs</link>. </para>
+ <itemizedlist>
+ <listitem>
+ <para>If you have long keys (compared to the values) or many columns, use a prefix
+ encoder. FAST_DIFF is recommended, as more testing is needed for Prefix Tree
+ encoding.</para>
+ </listitem>
+ <listitem>
+ <para>If the values are large (and not precompressed, such as images), use a data block
+ compressor.</para>
+ </listitem>
+ <listitem>
+ <para>Use GZIP for <firstterm>cold data</firstterm>, which is accessed infrequently. GZIP
+ compression uses more CPU resources than Snappy or LZO, but provides a higher
+ compression ratio.</para>
+ </listitem>
+ <listitem>
+ <para>Use Snappy or LZO for <firstterm>hot data</firstterm>, which is accessed
+ frequently. Snappy and LZO use fewer CPU resources than GZIP, but do not provide as high
+ a compression ratio.</para>
+ </listitem>
+ <listitem>
+ <para>In most cases, enabling Snappy or LZO by default is a good choice, because they have
+ a low performance overhead and provide space savings.</para>
+ </listitem>
+ <listitem>
+ <para>Before Google made Snappy available in 2011, LZO was the default. Snappy has
+ similar qualities to LZO but has been shown to perform better.</para>
+ </listitem>
+ </itemizedlist>
+ </section>
+ <section xml:id="hadoop.native.lib">
+ <title>Making use of Hadoop Native Libraries in HBase</title>
+ <para>The Hadoop shared library provides a number of facilities, including
+ compression libraries and fast CRC checksumming. To make these facilities available
+ to HBase, do the following. HBase and Hadoop will fall back to
+ alternatives if they cannot find the native library versions -- or
+ fail outright if you ask for an explicit compressor and there is
+ no alternative available.</para>
+ <para>If you see the following in your HBase logs, you know that HBase was unable
+ to locate the Hadoop native libraries:
+ <programlisting>2014-08-07 09:26:20,139 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable</programlisting>
+ If the libraries loaded successfully, the WARN message does not show.
+ </para>
+ <para>Let's presume your Hadoop shipped with a native library that
+ suits the platform you are running HBase on. To check if the Hadoop
+ native library is available to HBase, run the following tool (available in
+ Hadoop 2.1 and greater):
+ <programlisting>$ ./bin/hbase --config ~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker
+2014-08-26 13:15:38,717 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+Native library checking:
+hadoop: false
+zlib: false
+snappy: false
+lz4: false
+bzip2: false
+2014-08-26 13:15:38,863 INFO [main] util.ExitUtil: Exiting with status 1</programlisting>
+The output above shows that the native Hadoop library is not available in the HBase context.
+ </para>
+ <para>To fix the above, either copy the Hadoop native libraries locally, or symlink to
+ them if the Hadoop and HBase installs are adjacent in the filesystem.
+ You could also point at their location by setting the <varname>LD_LIBRARY_PATH</varname> environment
+ variable.</para>
+ <para>Where the JVM looks to find native libraries is "system dependent"
+ (see <classname>java.lang.System#loadLibrary(name)</classname>). On Linux, by default,
+ it is going to look in <filename>lib/native/PLATFORM</filename> where <varname>PLATFORM</varname>
+ is the label for the platform on which your HBase is installed.
+ On a local Linux machine, it seems to be the concatenation of the Java properties
+ <varname>os.name</varname> and <varname>os.arch</varname>, followed by whether the JVM is 32 or 64 bit.
+ On startup, HBase prints out all of the Java system properties, so you can find os.name and os.arch
+ in the log. For example:
+ <programlisting>....
+ 2014-08-06 15:27:22,853 INFO [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
+ 2014-08-06 15:27:22,853 INFO [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
+ ...
+ </programlisting>
+ So in this case, the PLATFORM string is <varname>Linux-amd64-64</varname>.
+ Copying the Hadoop native libraries or symlinking at <filename>lib/native/Linux-amd64-64</filename>
+ will ensure they are found.
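+ For example, assuming the Hadoop and HBase installs sit next to each other on the
+ filesystem (the paths below are illustrative, not required), the symlink could be created
+ like this, mirroring the LZ4 example later in this appendix:
+ <programlisting language="bourne">$ cd $HBASE_HOME
+$ mkdir -p lib/native
+$ ln -s $HADOOP_HOME/lib/native lib/native/Linux-amd64-64</programlisting>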
Check with the Hadoop <filename>NativeLibraryChecker</filename>.
+ </para>
+
+ <para>Here is an example of how to point at the Hadoop libs with the <varname>LD_LIBRARY_PATH</varname>
+ environment variable:
+ <programlisting>$ LD_LIBRARY_PATH=~/hadoop-2.5.0-SNAPSHOT/lib/native ./bin/hbase --config ~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker
+2014-08-26 13:42:49,332 INFO [main] bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
+2014-08-26 13:42:49,337 INFO [main] zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
+Native library checking:
+hadoop: true /home/stack/hadoop-2.5.0-SNAPSHOT/lib/native/libhadoop.so.1.0.0
+zlib: true /lib64/libz.so.1
+snappy: true /usr/lib64/libsnappy.so.1
+lz4: true revision:99
+bzip2: true /lib64/libbz2.so.1</programlisting>
+Set the LD_LIBRARY_PATH environment variable in <filename>hbase-env.sh</filename> when starting your HBase.
+ </para>
+ </section>
+
+ <section>
+ <title>Compressor Configuration, Installation, and Use</title>
+ <section
+ xml:id="compressor.install">
+ <title>Configure HBase For Compressors</title>
+ <para>Before HBase can use a given compressor, its libraries need to be available. Due to
+ licensing issues, only GZ compression is available to HBase (via native Java libraries) in
+ a default installation. Other compression libraries are available via the shared library
+ bundled with your Hadoop. The Hadoop native library needs to be findable when HBase
+ starts. See <xref linkend="hadoop.native.lib" />.</para>
+ <section>
+ <title>Compressor Support On the Master</title>
+ <para>A new configuration setting was introduced in HBase 0.95 to check the Master to
+ determine which data block encoders are installed and configured on it, and to assume that
+ the entire cluster is configured the same. This option,
+ <code>hbase.master.check.compression</code>, defaults to <literal>true</literal>. This
+ prevents the situation described in <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-6370">HBASE-6370</link>, where
+ a table is created or modified to support a codec that a region server does not support,
+ leading to failures that take a long time to occur and are difficult to debug. </para>
+ <para>If <code>hbase.master.check.compression</code> is enabled, libraries for all desired
+ compressors need to be installed and configured on the Master, even if the Master does
+ not run a region server.</para>
+ </section>
+ <section>
+ <title>Install GZ Support Via Native Libraries</title>
+ <para>HBase uses Java's built-in GZip support unless the native Hadoop libraries are
+ available on the CLASSPATH. The recommended way to add libraries to the CLASSPATH is to
+ set the environment variable <envar>HBASE_LIBRARY_PATH</envar> for the user running
+ HBase. If native libraries are not available and Java's GZIP is used, <literal>Got
+ brand-new compressor</literal> reports will be present in the logs. See <xref
+ linkend="brand.new.compressor" />.</para>
+ </section>
+ <section
+ xml:id="lzo.compression">
+ <title>Install LZO Support</title>
+ <para>HBase cannot ship with LZO because of incompatibility between HBase, which uses an
+ Apache Software License (ASL), and LZO, which uses a GPL license. See the <link
+ xlink:href="http://wiki.apache.org/hadoop/UsingLzoCompression">Using LZO
+ Compression</link> wiki page for information on configuring LZO support for HBase. </para>
+ <para>If you depend upon LZO compression, consider configuring your RegionServers to fail
+ to start if LZO is not available.
See <xref
+ linkend="hbase.regionserver.codecs" />.</para>
+ </section>
+ <section
+ xml:id="lz4.compression">
+ <title>Configure LZ4 Support</title>
+ <para>LZ4 support is bundled with Hadoop. Make sure the Hadoop shared library
+ (libhadoop.so) is accessible when you start
+ HBase. After configuring your platform (see <xref
+ linkend="hbase.native.platform" />), you can make a symbolic link from HBase to the native Hadoop
+ libraries. This assumes the two software installs are colocated. For example, if your
+ 'platform' is Linux-amd64-64:
+ <programlisting language="bourne">$ cd $HBASE_HOME
+$ mkdir lib/native
+$ ln -s $HADOOP_HOME/lib/native lib/native/Linux-amd64-64</programlisting>
+ Use the compression tool to check that LZ4 is installed on all nodes. Start up (or restart)
+ HBase. Afterward, you can create and alter tables to enable LZ4 as a
+ compression codec:
+ <screen>
+hbase(main):003:0> <userinput>alter 'TestTable', {NAME => 'info', COMPRESSION => 'LZ4'}</userinput>
+ </screen>
+ </para>
+ </section>
+ <section
+ xml:id="snappy.compression.installation">
+ <title>Install Snappy Support</title>
+ <para>HBase does not ship with Snappy support because of licensing issues. You can install
+ Snappy binaries (for instance, by using <command>yum install snappy</command> on CentOS)
+ or build Snappy from source. After installing Snappy, search for the shared library,
+ which will be called <filename>libsnappy.so.X</filename> where X is a number. If you
+ built from source, copy the shared library to a known location on your system, such as
+ <filename>/opt/snappy/lib/</filename>.</para>
+ <para>In addition to the Snappy library, HBase also needs access to the Hadoop shared
+ library, which will be called something like <filename>libhadoop.so.X.Y</filename>,
+ where X and Y are both numbers. Make note of the location of the Hadoop library, or copy
+ it to the same location as the Snappy library.</para>
+ <note>
+ <para>The Snappy and Hadoop libraries need to be available on each node of your cluster.
+ See <xref
+ linkend="compression.test" /> to find out how to test that this is the case.</para>
+ <para>See <xref
+ linkend="hbase.regionserver.codecs" /> to configure your RegionServers to fail to
+ start if a given compressor is not available.</para>
+ </note>
+ <para>Each of these library locations needs to be added to the environment variable
+ <envar>HBASE_LIBRARY_PATH</envar> for the operating system user that runs HBase. You
+ need to restart the RegionServer for the changes to take effect.</para>
+ </section>
+
+
+ <section
+ xml:id="compression.test">
+ <title>CompressionTest</title>
+ <para>You can use the CompressionTest tool to verify that your compressor is available to
+ HBase:</para>
+ <screen language="bourne">
+ $ hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://<replaceable>host/path/to/hbase</replaceable> snappy
+ </screen>
+ </section>
+
+
+ <section
+ xml:id="hbase.regionserver.codecs">
+ <title>Enforce Compression Settings On a RegionServer</title>
+ <para>You can configure a RegionServer so that it will fail to start if compression is
+ configured incorrectly, by adding the option <code>hbase.regionserver.codecs</code> to
+ <filename>hbase-site.xml</filename>, and setting its value to a comma-separated list
+ of codecs that need to be available. For example, if you set this property to
+ <literal>lzo,gz</literal>, the RegionServer would fail to start if either compressor
+ were unavailable. This prevents a new server from being added to the cluster
+ without having codecs configured properly.</para>
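+ <para>For illustration, a minimal <filename>hbase-site.xml</filename> entry might look like
+ the following. The codec list shown here is only an example; list whichever codecs your
+ deployment actually requires:</para>
+ <programlisting language="xml"><![CDATA[
+<property>
+  <name>hbase.regionserver.codecs</name>
+  <value>snappy,lz4</value>
+</property>]]></programlisting>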
+ </section>
+ </section>
+
+ <section
+ xml:id="changing.compression">
+ <title>Enable Compression On a ColumnFamily</title>
+ <para>To enable compression for a ColumnFamily, use an <code>alter</code> command. You do
+ not need to re-create the table or copy data. If you are changing codecs, be sure the old
+ codec is still available until all the old StoreFiles have been compacted.</para>
+ <example>
+ <title>Enabling Compression on a ColumnFamily of an Existing Table using HBase
+ Shell</title>
+ <screen><![CDATA[
+hbase> disable 'test'
+hbase> alter 'test', {NAME => 'cf', COMPRESSION => 'GZ'}
+hbase> enable 'test']]>
+ </screen>
+ </example>
+ <example>
+ <title>Creating a New Table with Compression On a ColumnFamily</title>
+ <screen><![CDATA[
+hbase> create 'test2', { NAME => 'cf2', COMPRESSION => 'SNAPPY' }
+ ]]></screen>
+ </example>
+ <example>
+ <title>Verifying a ColumnFamily's Compression Settings</title>
+ <screen><![CDATA[
+hbase> describe 'test'
+DESCRIPTION ENABLED
+ 'test', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE false
+ ', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
+ VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS
+ => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'fa
+ lse', BLOCKSIZE => '65536', IN_MEMORY => 'false', B
+ LOCKCACHE => 'true'}
+1 row(s) in 0.1070 seconds
+ ]]></screen>
+ </example>
+ </section>
+
+ <section>
+ <title>Testing Compression Performance</title>
+ <para>HBase includes a tool called LoadTestTool which provides mechanisms to test your
+ compression performance. You must specify either <literal>-write</literal> or
+ <literal>-update-read</literal> as your first parameter, and if you do not specify another
+ parameter, usage advice is printed for each option.</para>
+ <example>
+ <title><command>LoadTestTool</command> Usage</title>
+ <screen language="bourne"><![CDATA[
+$ bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -h
+usage: bin/hbase org.apache.hadoop.hbase.util.LoadTestTool <options>
+Options:
+ -batchupdate Whether to use batch as opposed to separate
+ updates for every column in a row
+ -bloom <arg> Bloom filter type, one of [NONE, ROW, ROWCOL]
+ -compression <arg> Compression type, one of [LZO, GZ, NONE, SNAPPY,
+ LZ4]
+ -data_block_encoding <arg> Encoding algorithm (e.g. prefix compression) to
+ use for data blocks in the test column family, one
+ of [NONE, PREFIX, DIFF, FAST_DIFF, PREFIX_TREE].
+ -encryption <arg> Enables transparent encryption on the test table,
+ one of [AES]
+ -generator <arg> The class which generates load for the tool. Any
+ args for this class can be passed as colon
+ separated after class name
+ -h,--help Show usage
+ -in_memory Tries to keep the HFiles of the CF inmemory as far
+ as possible. Not guaranteed that reads are always
+ served from inmemory
+ -init_only Initialize the test table only, don't do any
+ loading
+ -key_window <arg> The 'key window' to maintain between reads and
+ writes for concurrent write/read workload. The
+ default is 0.
+ -max_read_errors <arg> The maximum number of read errors to tolerate
+ before terminating all reader threads. The default
+ is 10.
+ -multiput Whether to use multi-puts as opposed to separate
+ puts for every column in a row
+ -num_keys <arg> The number of keys to read/write
+ -num_tables <arg> A positive integer number. When a number n is
+ speicfied, load test tool will load n table
+ parallely.
-tn parameter value becomes table name + prefix. Each table name is in format + <tn>_1...<tn>_n + -read <arg> <verify_percent>[:<#threads=20>] + -regions_per_server <arg> A positive integer number. When a number n is + specified, load test tool will create the test + table with n regions per server + -skip_init Skip the initialization; assume test table already + exists + -start_key <arg> The first key to read/write (a 0-based index). The + default value is 0. + -tn <arg> The name of the table to read or write + -update <arg> <update_percent>[:<#threads=20>][:<#whether to + ignore nonce collisions=0>] + -write <arg> <avg_cols_per_key>:<avg_data_size>[:<#threads=20>] + -zk <arg> ZK quorum as comma-separated host names without + port numbers + -zk_root <arg> name of parent znode in zookeeper + ]]></screen> + </example> + <example> + <title>Example Usage of LoadTestTool</title> + <screen language="bourne"> +$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 1:10:100 -num_keys 1000000 + -read 100:30 -num_tables 1 -data_block_encoding NONE -tn load_test_tool_NONE + </screen> + </example> + </section> + </section> + + <section xml:id="data.block.encoding.enable"> + <title>Enable Data Block Encoding</title> + <para>Codecs are built into HBase so no extra configuration is needed. Codecs are enabled on a + table by setting the <code>DATA_BLOCK_ENCODING</code> property. Disable the table before + altering its DATA_BLOCK_ENCODING setting. Following is an example using HBase Shell:</para> + <example> + <title>Enable Data Block Encoding On a Table</title> + <screen><![CDATA[ +hbase> disable 'test' +hbase> alter 'test', { NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF' } +Updating all regions with the new schema... +0/1 regions updated. +1/1 regions updated. +Done. +0 row(s) in 2.2820 seconds +hbase> enable 'test' +0 row(s) in 0.1580 seconds + ]]></screen> + </example> + <example> + <title>Verifying a ColumnFamily's Data Block Encoding</title> + <screen><![CDATA[ +hbase> describe 'test' +DESCRIPTION ENABLED + 'test', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST true + _DIFF', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => + '0', VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERS + IONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS = + > 'false', BLOCKSIZE => '65536', IN_MEMORY => 'fals + e', BLOCKCACHE => 'true'} +1 row(s) in 0.0650 seconds + ]]></screen> + </example> + </section> + + +</appendix>
http://git-wip-us.apache.org/repos/asf/hbase/blob/a1fe1e09/src/main/docbkx/configuration.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/configuration.xml b/src/main/docbkx/configuration.xml index 74b8e52..a0b7d11 100644 --- a/src/main/docbkx/configuration.xml +++ b/src/main/docbkx/configuration.xml @@ -925,8 +925,8 @@ stopping hbase...............</screen> <!--presumes the pre-site target has put the hbase-default.xml at this location--> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" - href="../../../target/docbkx/hbase-default.xml"> - <xi:fallback> + href="hbase-default.xml"> + <!--<xi:fallback> <section xml:id="hbase_default_configurations"> <title /> @@ -1007,7 +1007,7 @@ stopping hbase...............</screen> </section> </section> </section> - </xi:fallback> + </xi:fallback>--> </xi:include> </section> http://git-wip-us.apache.org/repos/asf/hbase/blob/a1fe1e09/src/main/docbkx/customization-pdf.xsl ---------------------------------------------------------------------- diff --git a/src/main/docbkx/customization-pdf.xsl b/src/main/docbkx/customization-pdf.xsl new file mode 100644 index 0000000..b21236f --- /dev/null +++ b/src/main/docbkx/customization-pdf.xsl @@ -0,0 +1,129 @@ +<?xml version="1.0"?> +<xsl:stylesheet + xmlns:xsl="http://www.w3.org/1999/XSL/Transform" + version="1.0"> +<!-- +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+-->
+ <xsl:import href="urn:docbkx:stylesheet/docbook.xsl"/>
+ <xsl:import href="urn:docbkx:stylesheet/highlight.xsl"/>
+
+
+ <!--###################################################
+ Paper & Page Size
+ ################################################### -->
+
+ <!-- Paper type, no headers on blank pages, no double sided printing -->
+ <xsl:param name="paper.type" select="'USletter'"/>
+ <xsl:param name="double.sided">0</xsl:param>
+ <xsl:param name="headers.on.blank.pages">0</xsl:param>
+ <xsl:param name="footers.on.blank.pages">0</xsl:param>
+
+ <!-- Space between paper border and content (chaotic stuff, don't touch) -->
+ <xsl:param name="page.margin.top">5mm</xsl:param>
+ <xsl:param name="region.before.extent">10mm</xsl:param>
+ <xsl:param name="body.margin.top">10mm</xsl:param>
+
+ <xsl:param name="body.margin.bottom">15mm</xsl:param>
+ <xsl:param name="region.after.extent">10mm</xsl:param>
+ <xsl:param name="page.margin.bottom">0mm</xsl:param>
+
+ <xsl:param name="page.margin.outer">18mm</xsl:param>
+ <xsl:param name="page.margin.inner">18mm</xsl:param>
+
+ <!-- No indentation of titles -->
+ <xsl:param name="title.margin.left">0pc</xsl:param>
+
+ <!--###################################################
+ Fonts & Styles
+ ################################################### -->
+
+ <!-- Justified text with hyphenation -->
+ <xsl:param name="alignment">justify</xsl:param>
+ <xsl:param name="hyphenate">true</xsl:param>
+
+ <!-- Default Font size -->
+ <xsl:param name="body.font.master">11</xsl:param>
+ <xsl:param name="body.font.small">8</xsl:param>
+
+ <!-- Line height in body text -->
+ <xsl:param name="line-height">1.4</xsl:param>
+
+ <!-- Force line break in long URLs -->
+ <xsl:param name="ulink.hyphenate.chars">/&?</xsl:param>
+ <xsl:param name="ulink.hyphenate">​</xsl:param>
+
+ <!-- Monospaced fonts are smaller than regular text -->
+ <xsl:attribute-set name="monospace.properties">
+ <xsl:attribute name="font-family">
+ <xsl:value-of select="$monospace.font.family"/>
+ </xsl:attribute>
+ <xsl:attribute name="font-size">0.8em</xsl:attribute>
+ <xsl:attribute name="wrap-option">wrap</xsl:attribute>
+ <xsl:attribute name="hyphenate">true</xsl:attribute>
+ </xsl:attribute-set>
+
+
+ <!-- add page break after abstract block -->
+ <xsl:attribute-set name="abstract.properties">
+ <xsl:attribute name="break-after">page</xsl:attribute>
+ </xsl:attribute-set>
+
+ <!-- add page break after toc -->
+ <xsl:attribute-set name="toc.margin.properties">
+ <xsl:attribute name="break-after">page</xsl:attribute>
+ </xsl:attribute-set>
+
+ <!-- add page break after first level sections -->
+ <xsl:attribute-set name="section.level1.properties">
+ <xsl:attribute name="break-after">page</xsl:attribute>
+ </xsl:attribute-set>
+
+ <!-- Show only Sections up to level 2 in the TOCs -->
+ <xsl:param name="toc.section.depth">2</xsl:param>
+
+ <!-- Dot and Whitespace as separator in TOC between Label and Title-->
+ <xsl:param name="autotoc.label.separator" select="'. 
'"/> + + <!-- program listings / examples formatting --> + <xsl:attribute-set name="monospace.verbatim.properties"> + <xsl:attribute name="font-family">Courier</xsl:attribute> + <xsl:attribute name="font-size">8pt</xsl:attribute> + <xsl:attribute name="keep-together.within-column">always</xsl:attribute> + </xsl:attribute-set> + + <xsl:param name="shade.verbatim" select="1" /> + + <xsl:attribute-set name="shade.verbatim.style"> + <xsl:attribute name="background-color">#E8E8E8</xsl:attribute> + <xsl:attribute name="border-width">0.5pt</xsl:attribute> + <xsl:attribute name="border-style">solid</xsl:attribute> + <xsl:attribute name="border-color">#575757</xsl:attribute> + <xsl:attribute name="padding">3pt</xsl:attribute> + </xsl:attribute-set> + + <!-- callouts customization --> + <xsl:param name="callout.unicode" select="1" /> + <xsl:param name="callout.graphics" select="0" /> + <xsl:param name="callout.defaultcolumn">90</xsl:param> + + <!-- Syntax Highlighting --> + + +</xsl:stylesheet> http://git-wip-us.apache.org/repos/asf/hbase/blob/a1fe1e09/src/main/docbkx/datamodel.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/datamodel.xml b/src/main/docbkx/datamodel.xml new file mode 100644 index 0000000..bdf697d --- /dev/null +++ b/src/main/docbkx/datamodel.xml @@ -0,0 +1,865 @@ +<?xml version="1.0" encoding="UTF-8"?> +<chapter + xml:id="datamodel" + version="5.0" + xmlns="http://docbook.org/ns/docbook" + xmlns:xlink="http://www.w3.org/1999/xlink" + xmlns:xi="http://www.w3.org/2001/XInclude" + xmlns:svg="http://www.w3.org/2000/svg" + xmlns:m="http://www.w3.org/1998/Math/MathML" + xmlns:html="http://www.w3.org/1999/xhtml" + xmlns:db="http://docbook.org/ns/docbook"> + <!--/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +--> + + <title>Data Model</title> + <para>In HBase, data is stored in tables, which have rows and columns. This is a terminology + overlap with relational databases (RDBMSs), but this is not a helpful analogy. Instead, it can + be helpful to think of an HBase table as a multi-dimensional map.</para> + <variablelist> + <title>HBase Data Model Terminology</title> + <varlistentry> + <term>Table</term> + <listitem> + <para>An HBase table consists of multiple rows.</para> + </listitem> + </varlistentry> + <varlistentry> + <term>Row</term> + <listitem> + <para>A row in HBase consists of a row key and one or more columns with values associated + with them. Rows are sorted alphabetically by the row key as they are stored. For this + reason, the design of the row key is very important. The goal is to store data in such a + way that related rows are near each other. A common row key pattern is a website domain. 
+ If your row keys are domains, you should probably store them in reverse (org.apache.www, + org.apache.mail, org.apache.jira). This way, all of the Apache domains are near each + other in the table, rather than being spread out based on the first letter of the + subdomain.</para> + </listitem> + </varlistentry> + <varlistentry> + <term>Column</term> + <listitem> + <para>A column in HBase consists of a column family and a column qualifier, which are + delimited by a <literal>:</literal> (colon) character.</para> + </listitem> + </varlistentry> + <varlistentry> + <term>Column Family</term> + <listitem> + <para>Column families physically colocate a set of columns and their values, often for + performance reasons. Each column family has a set of storage properties, such as whether + its values should be cached in memory, how its data is compressed or its row keys are + encoded, and others. Each row in a table has the same column + families, though a given row might not store anything in a given column family.</para> + <para>Column families are specified when you create your table, and influence the way your + data is stored in the underlying filesystem. Therefore, the column families should be + considered carefully during schema design.</para> + </listitem> + </varlistentry> + <varlistentry> + <term>Column Qualifier</term> + <listitem> + <para>A column qualifier is added to a column family to provide the index for a given + piece of data. Given a column family <literal>content</literal>, a column qualifier + might be <literal>content:html</literal>, and another might be + <literal>content:pdf</literal>. Though column families are fixed at table creation, + column qualifiers are mutable and may differ greatly between rows.</para> + </listitem> + </varlistentry> + <varlistentry> + <term>Cell</term> + <listitem> + <para>A cell is a combination of row, column family, and column qualifier, and contains a + value and a timestamp, which represents the value's version.</para> + <para>A cell's value is an uninterpreted array of bytes.</para> + </listitem> + </varlistentry> + <varlistentry> + <term>Timestamp</term> + <listitem> + <para>A timestamp is written alongside each value, and is the identifier for a given + version of a value. By default, the timestamp represents the time on the RegionServer + when the data was written, but you can specify a different timestamp value when you put + data into the cell.</para> + <caution> + <para>Direct manipulation of timestamps is an advanced feature which is only exposed for + special cases that are deeply integrated with HBase, and is discouraged in general. + Encoding a timestamp at the application level is the preferred pattern.</para> + </caution> + <para>You can specify the maximum number of versions of a value that HBase retains, per column + family. When the maximum number of versions is reached, the oldest versions are + eventually deleted. By default, only the newest version is kept.</para> + </listitem> + </varlistentry> + </variablelist> + + <section + xml:id="conceptual.view"> + <title>Conceptual View</title> + <para>You can read a very understandable explanation of the HBase data model in the blog post <link + xlink:href="http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable">Understanding + HBase and BigTable</link> by Jim R. Wilson. 
Another good explanation is available in the
+ PDF <link
+ xlink:href="http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf">Introduction
+ to Basic Schema Design</link> by Amandeep Khurana. It may help to read different
+ perspectives to get a solid understanding of HBase schema design. The linked articles cover
+ the same ground as the information in this section.</para>
+ <para> The following example is a slightly modified form of the one on page 2 of the <link
+ xlink:href="http://research.google.com/archive/bigtable.html">BigTable</link> paper. There
+ is a table called <varname>webtable</varname> that contains two rows
+ (<literal>com.cnn.www</literal>
+ and <literal>com.example.www</literal>) and three column families named
+ <varname>contents</varname>, <varname>anchor</varname>, and <varname>people</varname>. In
+ this example, for the first row (<literal>com.cnn.www</literal>),
+ <varname>anchor</varname> contains two columns (<varname>anchor:cnnsi.com</varname>,
+ <varname>anchor:my.look.ca</varname>) and <varname>contents</varname> contains one column
+ (<varname>contents:html</varname>). This example contains 5 versions of the row with the
+ row key <literal>com.cnn.www</literal>, and one version of the row with the row key
+ <literal>com.example.www</literal>. The <varname>contents:html</varname> column qualifier contains the entire
+ HTML of a given website. Qualifiers of the <varname>anchor</varname> column family each
+ contain the external site which links to the site represented by the row, along with the
+ text it used in the anchor of its link. The <varname>people</varname> column family represents
+ people associated with the site.
+ </para>
+ <note>
+ <title>Column Names</title>
+ <para> By convention, a column name is made of its column family prefix and a
+ <emphasis>qualifier</emphasis>. For example, the column
+ <emphasis>contents:html</emphasis> is made up of the column family
+ <varname>contents</varname> and the <varname>html</varname> qualifier. The colon
+ character (<literal>:</literal>) delimits the column family from the column family
+ <emphasis>qualifier</emphasis>.
</para>
+ </note>
+ <table
+ frame="all">
+ <title>Table <varname>webtable</varname></title>
+ <tgroup
+ cols="5"
+ align="left"
+ colsep="1"
+ rowsep="1">
+ <colspec
+ colname="c1" />
+ <colspec
+ colname="c2" />
+ <colspec
+ colname="c3" />
+ <colspec
+ colname="c4" />
+ <colspec
+ colname="c5" />
+ <thead>
+ <row>
+ <entry>Row Key</entry>
+ <entry>Time Stamp</entry>
+ <entry>ColumnFamily <varname>contents</varname></entry>
+ <entry>ColumnFamily <varname>anchor</varname></entry>
+ <entry>ColumnFamily <varname>people</varname></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t9</entry>
+ <entry />
+ <entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry>
+ <entry />
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t8</entry>
+ <entry />
+ <entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry>
+ <entry />
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t6</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ <entry />
+ <entry />
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t5</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ <entry />
+ <entry />
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t3</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ <entry />
+ <entry />
+ </row>
+ <row>
+ <entry>"com.example.www"</entry>
+ <entry>t5</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ <entry></entry>
+ <entry>people:author = "John Doe"</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>Cells in this table that appear to be empty do not take space, and in fact do not exist, in
+ HBase. This is what makes HBase "sparse." A tabular view is not the only possible way to
+ look at data in HBase, or even the most accurate. The following represents the same
+ information as a multi-dimensional map. This is only a mock-up for illustrative
+ purposes and may not be strictly accurate.</para>
+ <programlisting><![CDATA[
+{
+ "com.cnn.www": {
+ contents: {
+ t6: contents:html: "<html>..."
+ t5: contents:html: "<html>..."
+ t3: contents:html: "<html>..."
+ }
+ anchor: {
+ t9: anchor:cnnsi.com = "CNN"
+ t8: anchor:my.look.ca = "CNN.com"
+ }
+ people: {}
+ }
+ "com.example.www": {
+ contents: {
+ t5: contents:html: "<html>..."
+ }
+ anchor: {}
+ people: {
+ t5: people:author: "John Doe"
+ }
+ }
+}
+ ]]></programlisting>
+
+ </section>
+ <section
+ xml:id="physical.view">
+ <title>Physical View</title>
+ <para> Although at a conceptual level tables may be viewed as a sparse set of rows, they are
+ physically stored by column family. A new column qualifier (column_family:column_qualifier)
+ can be added to an existing column family at any time.</para>
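+ <para>A minimal HBase Shell sketch of that on-the-fly behavior; the table, row, and new
+ qualifier are illustrative and assume the <varname>webtable</varname> example above:</para>
+ <programlisting language="bourne">
+hbase> # no schema change is needed; the write itself creates the new qualifier
+hbase> put 'webtable', 'com.cnn.www', 'anchor:example.org', 'Example'
+ </programlisting>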
+ <table
+ frame="all">
+ <title>ColumnFamily <varname>anchor</varname></title>
+ <tgroup
+ cols="3"
+ align="left"
+ colsep="1"
+ rowsep="1">
+ <colspec
+ colname="c1" />
+ <colspec
+ colname="c2" />
+ <colspec
+ colname="c3" />
+ <thead>
+ <row>
+ <entry>Row Key</entry>
+ <entry>Time Stamp</entry>
+ <entry>Column Family <varname>anchor</varname></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t9</entry>
+ <entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry>
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t8</entry>
+ <entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <table
+ frame="all">
+ <title>ColumnFamily <varname>contents</varname></title>
+ <tgroup
+ cols="3"
+ align="left"
+ colsep="1"
+ rowsep="1">
+ <colspec
+ colname="c1" />
+ <colspec
+ colname="c2" />
+ <colspec
+ colname="c3" />
+ <thead>
+ <row>
+ <entry>Row Key</entry>
+ <entry>Time Stamp</entry>
+ <entry>ColumnFamily <varname>contents</varname></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t6</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t5</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t3</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>The empty cells shown in the
+ conceptual view are not stored at all.
+ Thus a request for the value of the <varname>contents:html</varname> column at time stamp
+ <literal>t8</literal> would return no value. Similarly, a request for an
+ <varname>anchor:my.look.ca</varname> value at time stamp <literal>t9</literal> would
+ return no value. However, if no timestamp is supplied, the most recent value for a
+ particular column would be returned. Given multiple versions, the most recent is also the
+ first one found, since timestamps
+ are stored in descending order. Thus a request for the values of all columns in the row
+ <varname>com.cnn.www</varname> if no timestamp is specified would be: the value of
+ <varname>contents:html</varname> from timestamp <literal>t6</literal>, the value of
+ <varname>anchor:cnnsi.com</varname> from timestamp <literal>t9</literal>, the value of
+ <varname>anchor:my.look.ca</varname> from timestamp <literal>t8</literal>. </para>
+ <para>For more information about the internals of how Apache HBase stores data, see <xref
+ linkend="regions.arch" />. </para>
+ </section>
+
+ <section
+ xml:id="namespace">
+ <title>Namespace</title>
+ <para> A namespace is a logical grouping of tables analogous to a database in relational
+ database systems.
This abstraction lays the groundwork for upcoming multi-tenancy related
+ features: <itemizedlist>
+ <listitem>
+ <para>Quota Management (HBASE-8410) - Restrict the amount of resources (i.e., regions,
+ tables) a namespace can consume.</para>
+ </listitem>
+ <listitem>
+ <para>Namespace Security Administration (HBASE-9206) - Provide another level of security
+ administration for tenants.</para>
+ </listitem>
+ <listitem>
+ <para>Region server groups (HBASE-6721) - A namespace/table can be pinned onto a subset
+ of regionservers, thus guaranteeing a coarse level of isolation.</para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <section
+ xml:id="namespace_creation">
+ <title>Namespace management</title>
+ <para> A namespace can be created, removed or altered. Namespace membership is determined
+ during table creation by specifying a fully-qualified table name of the form:</para>
+
+ <programlisting language="xml"><![CDATA[<table namespace>:<table qualifier>]]></programlisting>
+
+
+ <example>
+ <title>Examples</title>
+
+ <programlisting language="bourne">
+#Create a namespace
+create_namespace 'my_ns'
+ </programlisting>
+ <programlisting language="bourne">
+#create my_table in my_ns namespace
+create 'my_ns:my_table', 'fam'
+ </programlisting>
+ <programlisting language="bourne">
+#drop namespace
+drop_namespace 'my_ns'
+ </programlisting>
+ <programlisting language="bourne">
+#alter namespace
+alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
+ </programlisting>
+ </example>
+ </section>
+ <section
+ xml:id="namespace_special">
+ <title>Predefined namespaces</title>
+ <para> There are two predefined special namespaces: </para>
+ <itemizedlist>
+ <listitem>
+ <para>hbase - system namespace, used to contain HBase internal tables</para>
+ </listitem>
+ <listitem>
+ <para>default - tables with no explicitly specified namespace will automatically fall into
+ this namespace.</para>
+ </listitem>
+ </itemizedlist>
+ <example>
+ <title>Examples</title>
+
+ <programlisting language="bourne">
+#namespace=foo and table qualifier=bar
+create 'foo:bar', 'fam'
+
+#namespace=default and table qualifier=bar
+create 'bar', 'fam'
+</programlisting>
+ </example>
+ </section>
+ </section>
+
+ <section
+ xml:id="table">
+ <title>Table</title>
+ <para> Tables are declared up front at schema definition time. </para>
+ </section>
+
+ <section
+ xml:id="row">
+ <title>Row</title>
+ <para>Row keys are uninterpreted bytes. Rows are lexicographically sorted with the lowest
+ order appearing first in a table. The empty byte array is used to denote both the start and
+ end of a table's namespace.</para>
+ </section>
+
+ <section
+ xml:id="columnfamily">
+ <title>Column Family<indexterm><primary>Column Family</primary></indexterm></title>
+ <para> Columns in Apache HBase are grouped into <emphasis>column families</emphasis>. All
+ column members of a column family have the same prefix. For example, the columns
+ <emphasis>courses:history</emphasis> and <emphasis>courses:math</emphasis> are both
+ members of the <emphasis>courses</emphasis> column family. The colon character
+ (<literal>:</literal>) delimits the column family from the <indexterm><primary>column
+ family qualifier</primary><secondary>Column Family Qualifier</secondary></indexterm>.
+ The column family prefix must be composed of <emphasis>printable</emphasis> characters. The
+ qualifying tail, the column family <emphasis>qualifier</emphasis>, can be made of any
+ arbitrary bytes. Column families must be declared up front at schema definition time, whereas
+ columns do not need to be defined at schema time but can be conjured on the fly while the
+ table is up and running.</para>
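+ <para>A minimal sketch of declaring a column family up front, using the 1.0-era client API;
+ the table and family names here are illustrative:</para>
+ <programlisting language="java">
+Configuration conf = HBaseConfiguration.create();
+try (Connection connection = ConnectionFactory.createConnection(conf);
+     Admin admin = connection.getAdmin()) {
+  HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("courses_table"));
+  desc.addFamily(new HColumnDescriptor("courses")); // the column family, declared at schema time
+  admin.createTable(desc);
+  // columns such as courses:history can now be written with no further schema changes
+}
+</programlisting>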
+ <para>Physically, all column family members are stored together on the filesystem. Because
+ tunings and storage specifications are done at the column family level, it is advised that
+ all column family members have the same general access pattern and size
+ characteristics.</para>
+
+ </section>
+ <section
+ xml:id="cells">
+ <title>Cells<indexterm><primary>Cells</primary></indexterm></title>
+ <para>A <emphasis>{row, column, version} </emphasis>tuple exactly specifies a
+ <literal>cell</literal> in HBase. Cell content is uninterpreted bytes.</para>
+ </section>
+ <section
+ xml:id="data_model_operations">
+ <title>Data Model Operations</title>
+ <para>The four primary data model operations are Get, Put, Scan, and Delete. Operations are
+ applied via <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html">Table</link>
+ instances.
+ </para>
+ <section
+ xml:id="get">
+ <title>Get</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link>
+ returns attributes for a specified row. Gets are executed via <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get(org.apache.hadoop.hbase.client.Get)">
+ Table.get</link>. </para>
+ </section>
+ <section
+ xml:id="put">
+ <title>Put</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link>
+ either adds new rows to a table (if the key is new) or can update existing rows (if the
+ key already exists). Puts are executed via <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)">
+ Table.put</link> (writeBuffer) or <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List, java.lang.Object[])">
+ Table.batch</link> (non-writeBuffer). </para>
+ </section>
+ <section
+ xml:id="scan">
+ <title>Scans</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link>
+ allows iteration over multiple rows for specified attributes. </para>
+ <para>The following is an example of a Scan on a Table instance. Assume that a table is
+ populated with rows with keys "row1", "row2", "row3", and then another set of rows with
+ the keys "abc1", "abc2", and "abc3". The following example shows how to set a Scan
+ instance to return the rows beginning with "row".</para>
+<programlisting language="java">
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+
+Table table = ...      // instantiate a Table instance
+
+Scan scan = new Scan();
+scan.addColumn(CF, ATTR);
+scan.setRowPrefixFilter(Bytes.toBytes("row"));
+ResultScanner rs = table.getScanner(scan);
+try {
+  for (Result r = rs.next(); r != null; r = rs.next()) {
+    // process result...
+  }
+} finally {
+  rs.close();  // always close the ResultScanner!
+}
+</programlisting>
+ <para>Note that generally the easiest way to specify a specific stop point for a scan is by
+ using the <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html">InclusiveStopFilter</link>
+ class.
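+ A short sketch of that approach; the start and stop keys are illustrative and reuse the
+ <code>table</code> instance from the example above:
+ <programlisting language="java">
+Scan scan = new Scan();
+scan.setStartRow(Bytes.toBytes("abc1"));
+scan.setFilter(new InclusiveStopFilter(Bytes.toBytes("abc3"))); // stop at "abc3", inclusive
+ResultScanner rs = table.getScanner(scan);
+</programlisting>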
</para>
+ </section>
+ <section
+ xml:id="delete">
+ <title>Delete</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html">Delete</link>
+ removes a row from a table. Deletes are executed via <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete(org.apache.hadoop.hbase.client.Delete)">
+ Table.delete</link>. </para>
+ <para>HBase does not modify data in place, and so deletes are handled by creating new
+ markers called <emphasis>tombstones</emphasis>. These tombstones, along with the dead
+ values, are cleaned up on major compactions. </para>
+ <para>See <xref
+ linkend="version.delete" /> for more information on deleting versions of columns, and
+ see <xref
+ linkend="compaction" /> for more information on compactions. </para>
+
+ </section>
+
+ </section>
+
+
+ <section
+ xml:id="versions">
+ <title>Versions<indexterm><primary>Versions</primary></indexterm></title>
+
+ <para>A <emphasis>{row, column, version} </emphasis>tuple exactly specifies a
+ <literal>cell</literal> in HBase. It's possible to have an unbounded number of cells where
+ the row and column are the same but the cell address differs only in its version
+ dimension.</para>
+
+ <para>While rows and column keys are expressed as bytes, the version is specified using a long
+ integer. Typically this long contains time instances such as those returned by
+ <code>java.util.Date.getTime()</code> or <code>System.currentTimeMillis()</code>, that is:
+ <quote>the difference, measured in milliseconds, between the current time and midnight,
+ January 1, 1970 UTC</quote>.</para>
+
+ <para>The HBase version dimension is stored in decreasing order, so that when reading from a
+ store file, the most recent values are found first.</para>
+
+ <para>There is a lot of confusion over the semantics of <literal>cell</literal> versions in
+ HBase. In particular:</para>
+ <itemizedlist>
+ <listitem>
+ <para>If multiple writes to a cell have the same version, only the last written is
+ fetchable.</para>
+ </listitem>
+
+ <listitem>
+ <para>It is OK to write cells in a non-increasing version order.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Below we describe how the version dimension in HBase currently works. See <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link> for
+ discussion of HBase versions. <link
+ xlink:href="http://outerthought.org/blog/417-ot.html">Bending time in HBase</link>
+ makes for a good read on the version, or time, dimension in HBase. It has more detail on
+ versioning than is provided here. As of this writing, the limitation
+ <emphasis>Overwriting values at existing timestamps</emphasis> mentioned in the
+ article no longer holds in HBase. This section is basically a synopsis of this article
+ by Bruno Dumon.</para>
+
+ <section xml:id="specify.number.of.versions">
+ <title>Specifying the Number of Versions to Store</title>
+ <para>The maximum number of versions to store for a given column is part of the column
+ schema and is specified at table creation, or via an <command>alter</command> command, via
+ <code>HColumnDescriptor.DEFAULT_VERSIONS</code>.
Prior to HBase 0.96, the default number
+ of versions kept was <literal>3</literal>, but in 0.96 and newer it has been changed to
+ <literal>1</literal>.</para>
+ <example>
+ <title>Modify the Maximum Number of Versions for a Column</title>
+ <para>This example uses HBase Shell to keep a maximum of 5 versions of column
+ <code>f1</code>. You could also use <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html"
+ >HColumnDescriptor</link>.</para>
+ <screen><![CDATA[hbase> alter 't1', NAME => 'f1', VERSIONS => 5]]></screen>
+ </example>
+ <example>
+ <title>Modify the Minimum Number of Versions for a Column</title>
+ <para>You can also specify the minimum number of versions to store. By default, this is
+ set to 0, which means the feature is disabled. The following example sets the minimum
+ number of versions on field <code>f1</code> to <literal>2</literal>, via HBase Shell.
+ You could also use <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html"
+ >HColumnDescriptor</link>.</para>
+ <screen><![CDATA[hbase> alter 't1', NAME => 'f1', MIN_VERSIONS => 2]]></screen>
+ </example>
+ <para>Starting with HBase 0.98.2, you can specify a global default for the maximum number of
+ versions kept for all newly-created columns, by setting
+ <option>hbase.column.max.version</option> in <filename>hbase-site.xml</filename>. See
+ <xref linkend="hbase.column.max.version"/>.</para>
+ </section>
+
+ <section
+ xml:id="versions.ops">
+ <title>Versions and HBase Operations</title>
+
+ <para>In this section we look at the behavior of the version dimension for each of the core
+ HBase operations.</para>
+
+ <section>
+ <title>Get/Scan</title>
+
+ <para>Gets are implemented on top of Scans. The below discussion of <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link>
+ applies equally to <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scans</link>.</para>
+
+ <para>By default, i.e. if you specify no explicit version, when doing a
+ <literal>get</literal>, the cell whose version has the largest value is returned
+ (which may or may not be the latest one written, see later). The default behavior can be
+ modified in the following ways:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>to return more than one version, see <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()">Get.setMaxVersions()</link></para>
+ </listitem>
+
+ <listitem>
+ <para>to return versions other than the latest, see <link
+ xlink:href="???">Get.setTimeRange()</link></para>
+
+ <para>To retrieve the latest version that is less than or equal to a given value, thus
+ giving the 'latest' state of the record at a certain point in time, just use a range
+ from 0 to the desired version and set the max versions to 1.</para>
+ </listitem>
+ </itemizedlist>
+
+ </section>
+ <section
+ xml:id="default_get_example">
+ <title>Default Get Example</title>
+ <para>The following Get will only retrieve the current version of the row.</para>
+ <programlisting language="java">
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+        </section>
+        <section
+          xml:id="default_get_example">
+          <title>Default Get Example</title>
+          <para>The following Get will only retrieve the current version of the row.</para>
+          <programlisting language="java">
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Get get = new Get(Bytes.toBytes("row1"));
+Result r = table.get(get);
+byte[] b = r.getValue(CF, ATTR);  // returns current version of value
+</programlisting>
+        </section>
+        <section
+          xml:id="versioned_get_example">
+          <title>Versioned Get Example</title>
+          <para>The following Get will return the last 3 versions of the row.</para>
+          <programlisting language="java">
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Get get = new Get(Bytes.toBytes("row1"));
+get.setMaxVersions(3);  // will return last 3 versions of row
+Result r = table.get(get);
+byte[] b = r.getValue(CF, ATTR);  // returns current version of value
+List&lt;KeyValue&gt; kv = r.getColumn(CF, ATTR);  // returns all versions of this column
+</programlisting>
+        </section>
+
+        <section>
+          <title>Put</title>
+
+          <para>Doing a put always creates a new version of a <literal>cell</literal>, at a certain
+            timestamp. By default the system uses the server's <literal>currentTimeMillis</literal>,
+            but you can specify the version (= the long integer) yourself, on a per-column level.
+            This means you could assign a time in the past or the future, or use the long value for
+            non-time purposes.</para>
+
+          <para>To overwrite an existing value, do a put at exactly the same row, column, and
+            version as that of the cell you would overshadow.</para>
+          <section
+            xml:id="implicit_version_example">
+            <title>Implicit Version Example</title>
+            <para>The following Put will be implicitly versioned by HBase with the current
+              time.</para>
+            <programlisting language="java">
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Put put = new Put(Bytes.toBytes(row));
+put.add(CF, ATTR, Bytes.toBytes(data));
+table.put(put);
+</programlisting>
+          </section>
+          <section
+            xml:id="explicit_version_example">
+            <title>Explicit Version Example</title>
+            <para>The following Put has the version timestamp explicitly set.</para>
+            <programlisting language="java">
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Put put = new Put(Bytes.toBytes(row));
+long explicitTimeInMs = 555;  // just an example
+put.add(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data));
+table.put(put);
+</programlisting>
+            <para>Caution: the version timestamp is used internally by HBase for things like
+              time-to-live calculations. It's usually best to avoid setting this timestamp
+              yourself. Prefer using a separate timestamp attribute of the row, or having the
+              timestamp as part of the row key, or both. </para>
+          </section>
+
+        </section>
+
+        <section
+          xml:id="version.delete">
+          <title>Delete</title>
+
+          <para>There are three different types of internal delete markers. See Lars Hofhansl's blog
+            for a discussion of his attempt at adding another, <link
+              xlink:href="http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html">Scanning
+              in HBase: Prefix Delete Marker</link>. </para>
+          <itemizedlist>
+            <listitem>
+              <para>Delete: for a specific version of a column.</para>
+            </listitem>
+            <listitem>
+              <para>Delete column: for all versions of a column.</para>
+            </listitem>
+            <listitem>
+              <para>Delete family: for all columns of a particular ColumnFamily.</para>
+            </listitem>
+          </itemizedlist>
+          <para>When deleting an entire row, HBase will internally create a tombstone for each
+            ColumnFamily (i.e., not each individual column). The sketch below shows how each marker
+            type maps onto the client API.</para>
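+          <para>This fragment is a rough illustration only, using hypothetical names in the same
+            fragment style as the examples above; the method names shown are those of the pre-2.0
+            <code>Delete</code> class.</para>
+          <programlisting language="java">
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+// "Delete": one specific version of a column
+Delete d1 = new Delete(Bytes.toBytes("row1"));
+d1.deleteColumn(CF, ATTR, 555L);  // 555 is just an example version
+table.delete(d1);
+
+// "Delete column": all versions of a column
+Delete d2 = new Delete(Bytes.toBytes("row1"));
+d2.deleteColumns(CF, ATTR);
+table.delete(d2);
+
+// "Delete family": all columns of the ColumnFamily
+Delete d3 = new Delete(Bytes.toBytes("row1"));
+d3.deleteFamily(CF);
+table.delete(d3);
+</programlisting>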
+          <para>Deletes work by creating <emphasis>tombstone</emphasis> markers. For example, let's
+            suppose we want to delete a row. For this you can specify a version, or else by default
+            the <literal>currentTimeMillis</literal> is used. What this means is <quote>delete all
+            cells where the version is less than or equal to this version</quote>. HBase never
+            modifies data in place, so a delete will not immediately delete (or mark as
+            deleted) the entries in the storage file that correspond to the delete condition.
+            Rather, a so-called <emphasis>tombstone</emphasis> is written, which will mask the
+            deleted values. When HBase does a major compaction, the tombstones are processed to
+            actually remove the dead values, together with the tombstones themselves. If the version
+            you specified when deleting a row is larger than the version of any value in the row,
+            then you can consider the complete row to be deleted.</para>
+          <para>For an informative discussion on how deletes and versioning interact, see the thread <link
+              xlink:href="http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421">Put w/
+              timestamp -> Deleteall -> Put w/ timestamp fails</link> on the user mailing
+            list.</para>
+          <para>Also see <xref
+              linkend="keyvalue" /> for more information on the internal KeyValue format. </para>
+          <para>Delete markers are purged during the next major compaction of the store, unless the
+            <option>KEEP_DELETED_CELLS</option> option is set in the column family. To keep the
+            deletes for a configurable amount of time, you can set the delete TTL via the
+            <option>hbase.hstore.time.to.purge.deletes</option> property in
+            <filename>hbase-site.xml</filename>. If
+            <option>hbase.hstore.time.to.purge.deletes</option> is not set, or is set to 0, all
+            delete markers, including those with timestamps in the future, are purged during the
+            next major compaction. Otherwise, a delete marker with a timestamp in the future is kept
+            until the major compaction which occurs after the time represented by the marker's
+            timestamp plus the value of <option>hbase.hstore.time.to.purge.deletes</option>, in
+            milliseconds. </para>
+          <note>
+            <para>This behavior restores the semantics which existed before HBase 0.94; an
+              unexpected change introduced in 0.94 was addressed in <link
+                xlink:href="https://issues.apache.org/jira/browse/HBASE-10118">HBASE-10118</link>.
+              The fix has been backported to HBase 0.94 and newer branches.</para>
+          </note>
+        </section>
+      </section>
+
+      <section>
+        <title>Current Limitations</title>
+
+        <section>
+          <title>Deletes mask Puts</title>
+
+          <para>Deletes mask puts, even puts that happened after the delete
+            was entered. See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-2256"
+              >HBASE-2256</link>. Remember that a delete writes a tombstone, which only
+            disappears after the next major compaction has run. Suppose you do
+            a delete of everything &lt;= T. After this you do a new put with a
+            timestamp &lt;= T. This put, even if it happened after the delete,
+            will be masked by the delete tombstone. Performing the put will not
+            fail, but when you do a get you will notice the put had no
+            effect. It will start working again after the major compaction has
+            run. These issues should not be a problem if you use
+            always-increasing versions for new puts to a row. But they can occur
+            even if you do not care about time: just do a delete and a put
+            immediately after each other, and there is some chance they happen
+            within the same millisecond.</para>
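+          <para>The following fragment sketches the masking scenario just described, using the
+            hypothetical names of the earlier examples; the two-argument <code>Delete</code>
+            constructor is assumed to be available (it is present in the 0.98-era API).</para>
+          <programlisting language="java">
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+long T = 555L;  // just an example
+
+// Delete everything in the row with a version up to and including T.
+Delete d = new Delete(Bytes.toBytes("row1"), T);
+table.delete(d);
+
+// Now put a cell whose version is not greater than T. The put does not fail...
+Put put = new Put(Bytes.toBytes("row1"));
+put.add(CF, ATTR, T, Bytes.toBytes("value"));
+table.put(put);
+
+// ...but the tombstone masks it until the next major compaction runs.
+Result r = table.get(new Get(Bytes.toBytes("row1")));
+byte[] b = r.getValue(CF, ATTR);  // null: the put is masked by the tombstone
+</programlisting>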
+        </section>
+
+        <section
+          xml:id="major.compactions.change.query.results">
+          <title>Major compactions change query results</title>
+
+          <para><quote>...create three cell versions at t1, t2 and t3, with a maximum-versions
+              setting of 2. So when getting all versions, only the values at t2 and t3 will be
+              returned. But if you delete the version at t2 or t3, the one at t1 will appear again.
+              Obviously, once a major compaction has run, such behavior will not be the case
+              anymore...</quote> (See <emphasis>Garbage Collection</emphasis> in <link
+              xlink:href="http://outerthought.org/blog/417-ot.html">Bending time in
+              HBase</link>.)</para>
+        </section>
+      </section>
+    </section>
+    <section xml:id="dm.sort">
+      <title>Sort Order</title>
+      <para>All data model operations in HBase return data in sorted order: first by row,
+        then by ColumnFamily, followed by column qualifier, and finally timestamp (sorted
+        in reverse, so newest records are returned first).
+      </para>
+    </section>
+    <section xml:id="dm.column.metadata">
+      <title>Column Metadata</title>
+      <para>There is no store of column metadata outside of the internal KeyValue instances for a ColumnFamily.
+        Thus, while HBase can support not only a large number of columns per row, but a heterogeneous set of columns
+        between rows as well, it is your responsibility to keep track of the column names.
+      </para>
+      <para>The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows.
+        For more information about how HBase stores data internally, see <xref linkend="keyvalue" />.
+      </para>
+    </section>
+    <section xml:id="joins"><title>Joins</title>
+      <para>Whether HBase supports joins is a common question on the dist-list, and the simple answer is that it doesn't,
+        at least not in the way that RDBMSs support them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated
+        in this chapter, the read data model operations in HBase are Get and Scan.
+      </para>
+      <para>However, that doesn't mean that equivalent join functionality can't be supported in your application; you
+        just have to do it yourself. The two primary strategies are either denormalizing the data upon writing to HBase,
+        or having lookup tables and doing the join between HBase tables in your application or MapReduce code (and as RDBMSs
+        demonstrate, there are several strategies for this depending on the size of the tables, e.g., nested loops vs.
+        hash-joins). So which is the best approach? It depends on what you are trying to do, and as such there isn't a single
+        answer that works for every use case.
+      </para>
+    </section>
+    <section xml:id="acid"><title>ACID</title>
+      <para>See <link xlink:href="http://hbase.apache.org/acid-semantics.html">ACID Semantics</link>.
+        Lars Hofhansl has also written a note on
+        <link xlink:href="http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html">ACID in HBase</link>.</para>
+    </section>
+  </chapter>
