http://git-wip-us.apache.org/repos/asf/hbase/blob/7bf6c024/src/main/docbkx/book.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml deleted file mode 100644 index 26085a9..0000000 --- a/src/main/docbkx/book.xml +++ /dev/null @@ -1,3595 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!-- -/** - * - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ ---> -<book version="5.0" xmlns="http://docbook.org/ns/docbook" - xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:xi="http://www.w3.org/2001/XInclude" - xmlns:svg="http://www.w3.org/2000/svg" - xmlns:m="http://www.w3.org/1998/Math/MathML" - xmlns:html="http://www.w3.org/1999/xhtml" - xmlns:db="http://docbook.org/ns/docbook" xml:id="book"> - <info> - - <title><link xlink:href="http://www.hbase.org"> - The Apache HBase™ Reference Guide - </link></title> - <subtitle><link xlink:href="http://www.hbase.org"> - <inlinemediaobject> - <imageobject> - <imagedata align="middle" valign="middle" fileref="hbase_logo.png" /> - </imageobject> - </inlinemediaobject> - </link> - </subtitle> - <copyright><year>2013</year><holder>Apache Software Foundation. - All Rights Reserved. 
Apache Hadoop, Hadoop, MapReduce, HDFS, Zookeeper, HBase, and the HBase project logo are trademarks of the Apache Software Foundation. - </holder> - </copyright> - <abstract> - <para>This is the official reference guide of - <link xlink:href="http://www.hbase.org">Apache HBase™</link>, - a distributed, versioned, column-oriented database built on top of - <link xlink:href="http://hadoop.apache.org/">Apache Hadoop™</link> and - <link xlink:href="http://zookeeper.apache.org/">Apache ZooKeeper™</link>. - </para> - </abstract> - - <revhistory> - <revision> - <revnumber> - <?eval ${project.version}?> - </revnumber> - <date> - <?eval ${buildDate}?> - </date> - </revision> - </revhistory> - </info> - - <!--XInclude some chapters--> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="preface.xml" /> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="getting_started.xml" /> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="configuration.xml" /> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="upgrading.xml" /> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="shell.xml" /> - - <chapter xml:id="datamodel"> - <title>Data Model</title> - <para>In short, applications store data into an HBase table. - Tables are made of rows and columns. - All columns in HBase belong to a particular column family. - Table cells -- the intersection of row and column - coordinates -- are versioned. - A cell's content is an uninterpreted array of bytes. - </para> - <para>Table row keys are also byte arrays, so almost anything can - serve as a row key, from strings to binary representations of longs or - even serialized data structures. Rows in HBase tables - are sorted by row key. The sort is byte-ordered. All table accesses are - via the table row key -- its primary key. 
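Because the sort is byte-ordered rather than numeric, a key such as "row10" sorts before "row2". A minimal pure-JDK sketch of this comparison follows; the HBase client library provides an equivalent in its Bytes utility class, and the class and method names below are illustrative only:

```java
import java.nio.charset.StandardCharsets;

public class RowKeyOrder {
    // Unsigned lexicographic byte comparison, the same ordering HBase
    // applies to row keys. Negative result means a sorts before b.
    static int compareRowKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff; // compare bytes as unsigned
            if (x != y) return x - y;
        }
        return a.length - b.length; // on a common prefix, the shorter key sorts first
    }

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        // "row10" sorts BEFORE "row2" because the byte '1' is less than '2'.
        System.out.println(compareRowKeys(bytes("row10"), bytes("row2")) < 0);
        // Zero-padding the numeric part restores the intended order.
        System.out.println(compareRowKeys(bytes("row02"), bytes("row10")) < 0);
    }
}
```

This is why schemas that need numeric row ordering typically zero-pad or binary-encode the numeric portion of the key.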
-</para> - - <section xml:id="conceptual.view"><title>Conceptual View</title> - <para> - The following example is a slightly modified form of the one on page - 2 of the <link xlink:href="http://research.google.com/archive/bigtable.html">BigTable</link> paper. - There is a table called <varname>webtable</varname> that contains two column families named - <varname>contents</varname> and <varname>anchor</varname>. - In this example, <varname>anchor</varname> contains two - columns (<varname>anchor:cnnsi.com</varname>, <varname>anchor:my.look.ca</varname>) - and <varname>contents</varname> contains one column (<varname>contents:html</varname>). - <note> - <title>Column Names</title> - <para> - By convention, a column name is made of its column family prefix and a - <emphasis>qualifier</emphasis>. For example, the - column - <emphasis>contents:html</emphasis> is made up of the column family <varname>contents</varname> - and the <varname>html</varname> qualifier. - The colon character (<literal - moreinfo="none">:</literal>) delimits the column family from the - column family <emphasis>qualifier</emphasis>. 
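The family/qualifier convention can be illustrated by a small helper that splits a printable column name at its first colon. This is an illustrative sketch only, not part of the HBase client API; note that the split must be at the first colon, since a qualifier may itself contain arbitrary bytes while a family name cannot contain a colon:

```java
public class ColumnName {
    // Split a column name of the form "family:qualifier" at the FIRST colon.
    // Illustrative helper only; not an HBase client API.
    static String[] split(String column) {
        int i = column.indexOf(':');
        if (i < 0) throw new IllegalArgumentException("no family delimiter in: " + column);
        return new String[] { column.substring(0, i), column.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("contents:html");
        System.out.println(parts[0]); // the column family
        System.out.println(parts[1]); // the qualifier
    }
}
```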
- </para> - </note> - <table frame='all'><title>Table <varname>webtable</varname></title> - <tgroup cols='4' align='left' colsep='1' rowsep='1'> - <colspec colname='c1'/> - <colspec colname='c2'/> - <colspec colname='c3'/> - <colspec colname='c4'/> - <thead> - <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry><entry>ColumnFamily <varname>anchor</varname></entry></row> - </thead> - <tbody> - <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry></entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row> - <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry></entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row> - <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row> - <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row> - <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row> - </tbody> - </tgroup> - </table> - </para> - </section> - <section xml:id="physical.view"><title>Physical View</title> - <para> - Although at a conceptual level tables may be viewed as a sparse set of rows, - physically they are stored on a per-column-family basis. New columns - (i.e., <varname>columnfamily:column</varname>) can be added to any - column family without pre-announcing them. 
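The per-column-family, sparse layout can be sketched with a toy model that uses plain Java maps to stand in for per-family store files. This is purely illustrative (real HBase persists families as HFiles on the filesystem), but it shows why absent cells cost nothing and why a new qualifier needs no schema change:

```java
import java.util.Map;
import java.util.TreeMap;

public class SparseFamilyStore {
    // Toy model of one row: each column family holds only the qualifiers
    // actually written, so empty cells occupy no space at all.
    final Map<String, TreeMap<String, String>> families = new TreeMap<>();

    void put(String family, String qualifier, String value) {
        // A never-before-seen qualifier is simply inserted; no schema change.
        families.computeIfAbsent(family, f -> new TreeMap<>()).put(qualifier, value);
    }

    String get(String family, String qualifier) {
        TreeMap<String, String> fam = families.get(family);
        return fam == null ? null : fam.get(qualifier); // null: cell was never stored
    }

    public static void main(String[] args) {
        SparseFamilyStore row = new SparseFamilyStore();
        row.put("anchor", "cnnsi.com", "CNN");
        row.put("anchor", "my.look.ca", "CNN.com"); // new column, no pre-announcement
        System.out.println(row.get("contents", "html")); // nothing stored for this cell
    }
}
```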
- <table frame='all'><title>ColumnFamily <varname>anchor</varname></title> - <tgroup cols='3' align='left' colsep='1' rowsep='1'> - <colspec colname='c1'/> - <colspec colname='c2'/> - <colspec colname='c3'/> - <thead> - <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>anchor</varname></entry></row> - </thead> - <tbody> - <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row> - <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row> - </tbody> - </tgroup> - </table> - <table frame='all'><title>ColumnFamily <varname>contents</varname></title> - <tgroup cols='3' align='left' colsep='1' rowsep='1'> - <colspec colname='c1'/> - <colspec colname='c2'/> - <colspec colname='c3'/> - <thead> - <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry></row> - </thead> - <tbody> - <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row> - <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row> - <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row> - </tbody> - </tgroup> - </table> - It is important to note in the diagram above that the empty cells shown in the - conceptual view are not stored, since they need not be in a column-oriented - storage format. Thus a request for the value of the <varname>contents:html</varname> - column at time stamp <literal>t8</literal> would return no value. Similarly, a - request for an <varname>anchor:my.look.ca</varname> value at time stamp - <literal>t9</literal> would return no value. 
However, if no timestamp is - supplied, the most recent value for a particular column would be returned - and would also be the first one found, since timestamps are stored in - descending order. Thus a request for the values of all columns in the row - <varname>com.cnn.www</varname>, if no timestamp is specified, would be: - the value of <varname>contents:html</varname> from time stamp - <literal>t6</literal>, the value of <varname>anchor:cnnsi.com</varname> - from time stamp <literal>t9</literal>, and the value of - <varname>anchor:my.look.ca</varname> from time stamp <literal>t8</literal>. - </para> - <para>For more information about the internals of how Apache HBase stores data, see <xref linkend="regions.arch" />. - </para> - </section> - - <section xml:id="namespace"> - <title>Namespace</title> - <para> - A namespace is a logical grouping of tables analogous to a database in relational database - systems. This abstraction lays the groundwork for upcoming multi-tenancy related features: - <itemizedlist> - <listitem>Quota Management (HBASE-8410) - Restrict the amount of resources (i.e., - regions, tables) a namespace can consume.</listitem> - <listitem>Namespace Security Administration (HBASE-9206) - Provide another - level of security administration for tenants.</listitem> - <listitem>Region server groups (HBASE-6721) - A namespace/table can be - pinned onto a subset of regionservers, thus guaranteeing a coarse level of - isolation.</listitem> - </itemizedlist> - </para> - <section xml:id="namespace_creation"> - <title>Namespace management</title> - <para> - A namespace can be created, removed or altered. 
Namespace membership is determined during - table creation by specifying a fully-qualified table name of the form: - <para> - <code><table namespace>:<table qualifier></code> - </para> - <para> - Examples: - </para> -<programlisting> -#Create a namespace -create_namespace 'my_ns' - -#create my_table in my_ns namespace -create 'my_ns:my_table', 'fam' - -#delete namespace -delete_namespace 'my_ns' - -#alter namespace -alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'} -</programlisting> - </para> - </section> - <section xml:id="namespace_special"> - <title>Predefined namespaces</title> - <para> - There are two predefined special namespaces: - <itemizedlist> - <listitem>hbase - system namespace, used to contain HBase internal tables</listitem> - <listitem>default - tables with no explicitly specified namespace will automatically - fall into this namespace</listitem> - </itemizedlist> - </para> - <para> - Examples: -<programlisting> -#namespace=foo and table qualifier=bar -create 'foo:bar', 'fam' - -#namespace=default and table qualifier=bar -create 'bar', 'fam' -</programlisting> - </para> - </section> - </section> - - <section xml:id="table"> - <title>Table</title> - <para> - Tables are declared up front at schema definition time. - </para> - </section> - - <section xml:id="row"> - <title>Row</title> - <para>Row keys are uninterpreted bytes. Rows are - lexicographically sorted, with the lowest order appearing first - in a table. The empty byte array is used to denote both the - start and end of a table's namespace.</para> - </section> - - <section xml:id="columnfamily"> - <title>Column Family<indexterm><primary>Column Family</primary></indexterm></title> - <para> - Columns in Apache HBase are grouped into <emphasis>column families</emphasis>. - All column members of a column family have the same prefix. 
For example, the - columns <emphasis>courses:history</emphasis> and - <emphasis>courses:math</emphasis> are both members of the - <emphasis>courses</emphasis> column family. - The colon character (<literal - moreinfo="none">:</literal>) delimits the column family from the - <indexterm>column family <emphasis>qualifier</emphasis><primary>Column Family Qualifier</primary></indexterm>. - The column family prefix must be composed of - <emphasis>printable</emphasis> characters. The qualifying tail, the - column family <emphasis>qualifier</emphasis>, can be made of any - arbitrary bytes. Column families must be declared up front - at schema definition time, whereas columns do not need to be - defined at schema time but can be conjured on the fly while - the table is up and running.</para> - <para>Physically, all column family members are stored together on the - filesystem. Because tunings and - storage specifications are done at the column family level, it is - advised that all column family members have the same general access - pattern and size characteristics.</para> - - </section> - <section xml:id="cells"> - <title>Cells<indexterm><primary>Cells</primary></indexterm></title> - <para>A <emphasis>{row, column, version}</emphasis> tuple exactly - specifies a <literal>cell</literal> in HBase. - Cell content is uninterpreted bytes.</para> - </section> - <section xml:id="data_model_operations"> - <title>Data Model Operations</title> - <para>The four primary data model operations are Get, Put, Scan, and Delete. Operations are applied via - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link> instances. - </para> - <section xml:id="get"> - <title>Get</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> returns - attributes for a specified row. 
Gets are executed via - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#get%28org.apache.hadoop.hbase.client.Get%29"> - HTable.get</link>. - </para> - </section> - <section xml:id="put"> - <title>Put</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> either - adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put%28org.apache.hadoop.hbase.client.Put%29"> - HTable.put</link> (writeBuffer) or <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29"> - HTable.batch</link> (non-writeBuffer). - </para> - </section> - <section xml:id="scan"> - <title>Scans</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> allows - iteration over multiple rows for specified attributes. - </para> - <para>The following is an example of a Scan - on an HTable instance. Assume that a table is populated with rows with keys "row1", "row2", "row3", - and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how startRow and stopRow - can be applied to a Scan instance to return the rows beginning with "row". -<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... - -HTable htable = ... // instantiate HTable - -Scan scan = new Scan(); -scan.addColumn(CF, ATTR); -scan.setStartRow(Bytes.toBytes("row")); // start key is inclusive -scan.setStopRow(Bytes.toBytes("rox")); // stop key is exclusive -ResultScanner rs = htable.getScanner(scan); -try { - for (Result r = rs.next(); r != null; r = rs.next()) { - // process result... - } -} finally { - rs.close(); // always close the ResultScanner! 
-} -</programlisting> - </para> - <para>Note that generally the easiest way to specify a specific stop point for a scan is by using the <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html">InclusiveStopFilter</link> class. - </para> - </section> - <section xml:id="delete"> - <title>Delete</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html">Delete</link> removes - a row from a table. Deletes are executed via - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29"> - HTable.delete</link>. - </para> - <para>HBase does not modify data in place, and so deletes are handled by creating new markers called <emphasis>tombstones</emphasis>. - These tombstones, along with the dead values, are cleaned up on major compactions. - </para> - <para>See <xref linkend="version.delete"/> for more information on deleting versions of columns, and see - <xref linkend="compaction"/> for more information on compactions. - </para> - - </section> - - </section> - - - <section xml:id="versions"> - <title>Versions<indexterm><primary>Versions</primary></indexterm></title> - - <para>A <emphasis>{row, column, version} </emphasis>tuple exactly - specifies a <literal>cell</literal> in HBase. It's possible to have an - unbounded number of cells where the row and column are the same but the - cell address differs only in its version dimension.</para> - - <para>While rows and column keys are expressed as bytes, the version is - specified using a long integer. 
Typically this long contains time - instances such as those returned by - <code>java.util.Date.getTime()</code> or - <code>System.currentTimeMillis()</code>, that is: <quote>the difference, - measured in milliseconds, between the current time and midnight, January - 1, 1970 UTC</quote>.</para> - - <para>The HBase version dimension is stored in decreasing order, so that - when reading from a store file, the most recent values are found - first.</para> - - <para>There is a lot of confusion over the semantics of - <literal>cell</literal> versions in HBase. In particular, a couple of - questions that often come up are:<itemizedlist> - <listitem> - <para>If multiple writes to a cell have the same version, are all - versions maintained or just the last?<footnote> - <para>Currently, only the last written is fetchable.</para> - </footnote></para> - </listitem> - - <listitem> - <para>Is it OK to write cells in a non-increasing version - order?<footnote> - <para>Yes</para> - </footnote></para> - </listitem> - </itemizedlist></para> - - <para>Below we describe how the version dimension in HBase currently - works<footnote> - <para>See <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link> - for discussion of HBase versions. <link - xlink:href="http://outerthought.org/blog/417-ot.html">Bending time - in HBase</link> makes for a good read on the version, or time, - dimension in HBase. It has more detail on versioning than is - provided here. As of this writing, the limitation - <emphasis>Overwriting values at existing timestamps</emphasis> - mentioned in the article no longer holds in HBase. 
This section is - basically a synopsis of this article by Bruno Dumon.</para> - </footnote>.</para> - - <section xml:id="versions.ops"> - <title>Versions and HBase Operations</title> - - <para>In this section we look at the behavior of the version dimension - for each of the core HBase operations.</para> - - <section> - <title>Get/Scan</title> - - <para>Gets are implemented on top of Scans. The below discussion of - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> applies equally to <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scans</link>.</para> - - <para>By default, i.e. if you specify no explicit version, when - doing a <literal>get</literal>, the cell whose version has the - largest value is returned (which may or may not be the latest one - written, see later). The default behavior can be modified in the - following ways:</para> - - <itemizedlist> - <listitem> - <para>to return more than one version, see <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()">Get.setMaxVersions()</link></para> - </listitem> - - <listitem> - <para>to return versions other than the latest, see <link - xlink:href="???">Get.setTimeRange()</link></para> - - <para>To retrieve the latest version that is less than or equal - to a given value, thus giving the 'latest' state of the record - at a certain point in time, just use a range from 0 to the - desired version and set the max versions to 1.</para> - </listitem> - </itemizedlist> - - </section> - <section xml:id="default_get_example"> - <title>Default Get Example</title> - <para>The following Get will only retrieve the current version of the row -<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... 
-Get get = new Get(Bytes.toBytes("row1")); -Result r = htable.get(get); -byte[] b = r.getValue(CF, ATTR); // returns current version of value -</programlisting> - </para> - </section> - <section xml:id="versioned_get_example"> - <title>Versioned Get Example</title> - <para>The following Get will return the last 3 versions of the row. -<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... -Get get = new Get(Bytes.toBytes("row1")); -get.setMaxVersions(3); // will return last 3 versions of row -Result r = htable.get(get); -byte[] b = r.getValue(CF, ATTR); // returns current version of value -List<KeyValue> kv = r.getColumn(CF, ATTR); // returns all versions of this column -</programlisting> - </para> - </section> - - <section> - <title>Put</title> - - <para>Doing a put always creates a new version of a - <literal>cell</literal>, at a certain timestamp. By default the - system uses the server's <literal>currentTimeMillis</literal>, but - you can specify the version (= the long integer) yourself, on a - per-column level. This means you could assign a time in the past or - the future, or use the long value for non-time purposes.</para> - - <para>To overwrite an existing value, do a put at exactly the same - row, column, and version as that of the cell you would - overshadow.</para> - <section xml:id="implicit_version_example"> - <title>Implicit Version Example</title> - <para>The following Put will be implicitly versioned by HBase with the current time. -<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... -Put put = new Put(Bytes.toBytes(row)); -put.add(CF, ATTR, Bytes.toBytes( data)); -htable.put(put); -</programlisting> - </para> - </section> - <section xml:id="explicit_version_example"> - <title>Explicit Version Example</title> - <para>The following Put has the version timestamp explicitly set. 
-<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... -Put put = new Put(Bytes.toBytes(row)); -long explicitTimeInMs = 555; // just an example -put.add(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data)); -htable.put(put); -</programlisting> - Caution: the version timestamp is used internally by HBase for things like time-to-live calculations. - It's usually best to avoid setting this timestamp yourself. Prefer using a separate - timestamp attribute of the row, or have the timestamp as part of the rowkey, or both. - </para> - </section> - - </section> - - <section xml:id="version.delete"> - <title>Delete</title> - - <para>There are three different types of internal delete markers - <footnote><para>See Lars Hofhansl's blog for discussion of his attempt at - adding another, <link xlink:href="http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html">Scanning in HBase: Prefix Delete Marker</link></para></footnote>: - <itemizedlist> - <listitem><para>Delete: for a specific version of a column.</para> - </listitem> - <listitem><para>Delete column: for all versions of a column.</para> - </listitem> - <listitem><para>Delete family: for all columns of a particular ColumnFamily.</para> - </listitem> - </itemizedlist> - When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column). - </para> - <para>Deletes work by creating <emphasis>tombstone</emphasis> - markers. For example, let's suppose we want to delete a row. For - this you can specify a version, or else by default the - <literal>currentTimeMillis</literal> is used. What this means is - <quote>delete all cells where the version is less than or equal to - this version</quote>. HBase never modifies data in place, so for - example a delete will not immediately delete (or mark as deleted) - the entries in the storage file that correspond to the delete - condition. 
Rather, a so-called <emphasis>tombstone</emphasis> is - written, which will mask the deleted values<footnote> - <para>When HBase does a major compaction, the tombstones are - processed to actually remove the dead values, together with the - tombstones themselves.</para> - </footnote>. If the version you specified when deleting a row is - larger than the version of any value in the row, then you can - consider the complete row to be deleted.</para> - <para>For an informative discussion on how deletes and versioning interact, see - the thread <link xlink:href="http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421">Put w/ timestamp -> Deleteall -> Put w/ timestamp fails</link> - up on the user mailing list.</para> - <para>Also see <xref linkend="keyvalue"/> for more information on the internal KeyValue format. - </para> - <para>Delete markers are purged during the major compaction of the store, - unless KEEP_DELETED_CELLS is set in the column family. In some - scenarios users want to keep delete markers for a time; for this you can set the - delete TTL, hbase.hstore.time.to.purge.deletes, in the configuration. - If this delete TTL is not set, or is set to 0, all delete markers, including those - with future timestamps, are purged during the next major compaction. - Otherwise, a delete marker is kept until the first major compaction after the - marker's timestamp plus the delete TTL. - </para> - </section> - </section> - - <section> - <title>Current Limitations</title> - - <section> - <title>Deletes mask Puts</title> - - <para>Deletes mask puts, even puts that happened after the delete - was entered<footnote> - <para><link - xlink:href="https://issues.apache.org/jira/browse/HBASE-2256">HBASE-2256</link></para> - </footnote>. Remember that a delete writes a tombstone, which only - disappears after the next major compaction has run. Suppose you do - a delete of everything <= T. After this you do a new put with a - timestamp <= T. 
This put, even if it happened after the delete, - will be masked by the delete tombstone. Performing the put will not - fail, but when you do a get you will notice the put had no - effect. It will start working again after the major compaction has - run. These issues should not be a problem if you use - always-increasing versions for new puts to a row. But they can occur - even if you do not care about time: just do delete and put - immediately after each other, and there is some chance they happen - within the same millisecond.</para> - </section> - - <section> - <title>Major compactions change query results</title> - - <para><quote>...create three cell versions at t1, t2 and t3, with a - maximum-versions setting of 2. So when getting all versions, only - the values at t2 and t3 will be returned. But if you delete the - version at t2 or t3, the one at t1 will appear again. Obviously, - once a major compaction has run, such behavior will not be the case - anymore...<footnote> - <para>See <emphasis>Garbage Collection</emphasis> in <link - xlink:href="http://outerthought.org/blog/417-ot.html">Bending - time in HBase</link> </para> - </footnote></quote></para> - </section> - </section> - </section> - <section xml:id="dm.sort"> - <title>Sort Order</title> - <para>All data model operations in HBase return data in sorted order: first by row, - then by ColumnFamily, followed by column qualifier, and finally timestamp (sorted - in reverse, so newest records are returned first). - </para> - </section> - <section xml:id="dm.column.metadata"> - <title>Column Metadata</title> - <para>There is no store of column metadata outside of the internal KeyValue instances for a ColumnFamily. - Thus, while HBase can support not only a wide number of columns per row but also a heterogeneous set of columns - between rows, it is your responsibility to keep track of the column names. 
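Tracking column names therefore amounts to walking the rows and collecting the qualifiers each one carries. The aggregation can be sketched in pure Java, with rows modeled as qualifier-to-value maps rather than real Result objects (an assumption for illustration; in a scan, each map would come from the per-family view of a Result):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ColumnDiscovery {
    // Collect the union of qualifiers seen across all rows of one family.
    // Every row may contribute columns no other row has.
    static Set<String> distinctQualifiers(List<Map<String, String>> rows) {
        Set<String> qualifiers = new TreeSet<>(); // sorted, like HBase qualifiers
        for (Map<String, String> row : rows) {
            qualifiers.addAll(row.keySet());
        }
        return qualifiers;
    }

    public static void main(String[] args) {
        Set<String> cols = distinctQualifiers(List.of(
                Map.of("html", "<html>..."),
                Map.of("html", "<html>...", "pdf", "%PDF...")));
        System.out.println(cols);
    }
}
```

For a large table this is naturally a full-scan (or MapReduce) job, which is why keeping track of qualifiers in the application is usually cheaper than discovering them afterwards.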
- </para> - <para>The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows. - For more information about how HBase stores data internally, see <xref linkend="keyvalue" />. - </para> - </section> - <section xml:id="joins"><title>Joins</title> - <para>Whether HBase supports joins is a common question on the dist-list, and there is a simple answer: it doesn't, - at least not in the way that RDBMSs support them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated - in this chapter, the read data model operations in HBase are Get and Scan. - </para> - <para>However, that doesn't mean that equivalent join functionality can't be supported in your application; - you have to do it yourself. The two primary strategies are either denormalizing the data upon writing to HBase, - or having lookup tables and doing the join between HBase tables in your application or MapReduce code (and as RDBMSs - demonstrate, there are several strategies for this depending on the size of the tables, e.g., nested loops vs. - hash-joins). So which is the best approach? It depends on what you are trying to do, and as such there isn't a single - answer that works for every use case. - </para> - </section> - <section xml:id="acid"><title>ACID</title> - <para>See <link xlink:href="http://hbase.apache.org/acid-semantics.html">ACID Semantics</link>. - Lars Hofhansl has also written a note on - <link xlink:href="http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html">ACID in HBase</link>.</para> - </section> - </chapter> <!-- data model --> - - <!-- schema design --> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="schema_design.xml" /> - - <chapter xml:id="mapreduce"> - <title>HBase and MapReduce</title> - <para>See <link xlink:href="http://hbase.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description"> - HBase and MapReduce</link> up in javadocs. - Start there. 
Below is some additional help.</para> - <para>For more information about MapReduce (i.e., the framework in general), see the Hadoop site (TODO: Need good links here -- - we used to have some but they rotted against apache hadoop).</para> - <caution> - <title>Notice to Mapreduce users of HBase 0.96.1 and above</title> - <para>Some mapreduce jobs that use HBase fail to launch. The symptom is an - exception similar to the following: - <programlisting> -Exception in thread "main" java.lang.IllegalAccessError: class - com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass - com.google.protobuf.LiteralByteString - at java.lang.ClassLoader.defineClass1(Native Method) - at java.lang.ClassLoader.defineClass(ClassLoader.java:792) - at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) - at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) - at java.net.URLClassLoader.access$100(URLClassLoader.java:71) - at java.net.URLClassLoader$1.run(URLClassLoader.java:361) - at java.net.URLClassLoader$1.run(URLClassLoader.java:355) - at java.security.AccessController.doPrivileged(Native Method) - at java.net.URLClassLoader.findClass(URLClassLoader.java:354) - at java.lang.ClassLoader.loadClass(ClassLoader.java:424) - at java.lang.ClassLoader.loadClass(ClassLoader.java:357) - at - org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818) - at - org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433) - at - org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186) - at - org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147) - at - org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270) - at - org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100) -... 
-</programlisting> - This is because of an optimization introduced in <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-9867">HBASE-9867</link> - that inadvertently introduced a classloader dependency. - </para> - <para>This affects both jobs using the <code>-libjars</code> option and - "fat jar" jobs, which package their runtime dependencies in a nested - <code>lib</code> folder.</para> - <para>In order to satisfy the new classloader requirements, - hbase-protocol.jar must be included in Hadoop's classpath. This can be - resolved system-wide by including a reference to the hbase-protocol.jar in - Hadoop's lib directory, via a symlink or by copying the jar into the new - location.</para> - <para>This can also be achieved on a per-job launch basis by including it - in the <code>HADOOP_CLASSPATH</code> environment variable at job submission - time. When launching jobs that package their dependencies, all three of the - following job launching commands satisfy this requirement:</para> -<programlisting> -$ HADOOP_CLASSPATH=/path/to/hbase-protocol.jar:/path/to/hbase/conf hadoop jar MyJob.jar MyJobMainClass -$ HADOOP_CLASSPATH=$(hbase mapredcp):/path/to/hbase/conf hadoop jar MyJob.jar MyJobMainClass -$ HADOOP_CLASSPATH=$(hbase classpath) hadoop jar MyJob.jar MyJobMainClass -</programlisting> - <para>For jars that do not package their dependencies, the following command - structure is necessary:</para> -<programlisting> -$ HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf hadoop jar MyApp.jar MyJobMainClass -libjars $(hbase mapredcp | tr ':' ',') ... 
-</programlisting> - <para>See also <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-10304">HBASE-10304</link> - for further discussion of this issue.</para> - </caution> - <section xml:id="splitter"> - <title>Map-Task Splitting</title> - <section xml:id="splitter.default"> - <title>The Default HBase MapReduce Splitter</title> - <para>When <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html">TableInputFormat</link> - is used to source an HBase table in a MapReduce job, - its splitter will make a map task for each region of the table. - Thus, if there are 100 regions in the table, there will be - 100 map-tasks for the job - regardless of how many column families are selected in the Scan.</para> - </section> - <section xml:id="splitter.custom"> - <title>Custom Splitters</title> - <para>For those interested in implementing custom splitters, see the method <code>getSplits</code> in - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html">TableInputFormatBase</link>. - That is where the logic for map-task assignment resides. - </para> - </section> - </section> - <section xml:id="mapreduce.example"> - <title>HBase MapReduce Examples</title> - <section xml:id="mapreduce.example.read"> - <title>HBase MapReduce Read Example</title> - <para>The following is an example of using HBase as a MapReduce source in a read-only manner. Specifically, - there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper. The job would be defined - as follows... - <programlisting> -Configuration config = HBaseConfiguration.create(); -Job job = new Job(config, "ExampleRead"); -job.setJarByClass(MyReadJob.class); // class that contains mapper - -Scan scan = new Scan(); -scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs -scan.setCacheBlocks(false); // don't set to true for MR jobs -// set other scan attrs -... 
- -TableMapReduceUtil.initTableMapperJob( - tableName, // input HBase table name - scan, // Scan instance to control CF and attribute selection - MyMapper.class, // mapper - null, // mapper output key - null, // mapper output value - job); -job.setOutputFormatClass(NullOutputFormat.class); // because we aren't emitting anything from mapper - -boolean b = job.waitForCompletion(true); -if (!b) { - throw new IOException("error with job!"); -} - </programlisting> - ...and the mapper instance would extend <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html">TableMapper</link>... - <programlisting> -public static class MyMapper extends TableMapper<Text, Text> { - - public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException { - // process data for the row from the Result instance. - } -} - </programlisting> - </para> - </section> - <section xml:id="mapreduce.example.readwrite"> - <title>HBase MapReduce Read/Write Example</title> - <para>The following is an example of using HBase both as a source and as a sink with MapReduce. - This example will simply copy data from one table to another. 
- <programlisting> -Configuration config = HBaseConfiguration.create(); -Job job = new Job(config,"ExampleReadWrite"); -job.setJarByClass(MyReadWriteJob.class); // class that contains mapper - -Scan scan = new Scan(); -scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs -scan.setCacheBlocks(false); // don't set to true for MR jobs -// set other scan attrs - -TableMapReduceUtil.initTableMapperJob( - sourceTable, // input table - scan, // Scan instance to control CF and attribute selection - MyMapper.class, // mapper class - null, // mapper output key - null, // mapper output value - job); -TableMapReduceUtil.initTableReducerJob( - targetTable, // output table - null, // reducer class - job); -job.setNumReduceTasks(0); - -boolean b = job.waitForCompletion(true); -if (!b) { - throw new IOException("error with job!"); -} - </programlisting> - An explanation is required of what <classname>TableMapReduceUtil</classname> is doing, especially with the reducer. - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link> is being used - as the outputFormat class, and several parameters are being set on the config (e.g., TableOutputFormat.OUTPUT_TABLE), as - well as setting the reducer output key to <classname>ImmutableBytesWritable</classname> and reducer value to <classname>Writable</classname>. - These could be set by the programmer on the job and conf, but <classname>TableMapReduceUtil</classname> tries to make things easier. - <para>The following is the example mapper, which will create a <classname>Put</classname> matching the input <classname>Result</classname> - and emit it. Note: this is what the CopyTable utility does. 
- </para> - <programlisting> -public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> { - - public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException { - // this example is just copying the data from the source table... - context.write(row, resultToPut(row,value)); - } - - private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException { - Put put = new Put(key.get()); - for (KeyValue kv : result.raw()) { - put.add(kv); - } - return put; - } -} - </programlisting> - <para>There isn't actually a reducer step, so <classname>TableOutputFormat</classname> takes care of sending the <classname>Put</classname> - to the target table. - </para> - <para>This is just an example; developers could choose not to use <classname>TableOutputFormat</classname> and connect to the - target table themselves. - </para> - </para> - </section> - <section xml:id="mapreduce.example.readwrite.multi"> - <title>HBase MapReduce Read/Write Example With Multi-Table Output</title> - <para>TODO: example for <classname>MultiTableOutputFormat</classname>. - </para> - </section> - <section xml:id="mapreduce.example.summary"> - <title>HBase MapReduce Summary to HBase Example</title> - <para>The following example uses HBase as a MapReduce source and sink with a summarization step. This example will - count the number of distinct instances of a value in a table and write those summarized counts in another table. 
- <programlisting> -Configuration config = HBaseConfiguration.create(); -Job job = new Job(config,"ExampleSummary"); -job.setJarByClass(MySummaryJob.class); // class that contains mapper and reducer - -Scan scan = new Scan(); -scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs -scan.setCacheBlocks(false); // don't set to true for MR jobs -// set other scan attrs - -TableMapReduceUtil.initTableMapperJob( - sourceTable, // input table - scan, // Scan instance to control CF and attribute selection - MyMapper.class, // mapper class - Text.class, // mapper output key - IntWritable.class, // mapper output value - job); -TableMapReduceUtil.initTableReducerJob( - targetTable, // output table - MyTableReducer.class, // reducer class - job); -job.setNumReduceTasks(1); // at least one, adjust as required - -boolean b = job.waitForCompletion(true); -if (!b) { - throw new IOException("error with job!"); -} - </programlisting> - In this example mapper, a column with a String value is chosen as the value to summarize upon. - This value is used as the key to emit from the mapper, and an <classname>IntWritable</classname> represents an instance counter. - <programlisting> -public static class MyMapper extends TableMapper<Text, IntWritable> { - public static final byte[] CF = "cf".getBytes(); - public static final byte[] ATTR1 = "attr1".getBytes(); - - private final IntWritable ONE = new IntWritable(1); - private Text text = new Text(); - - public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException { - String val = new String(value.getValue(CF, ATTR1)); - text.set(val); // we can only emit Writables... - - context.write(text, ONE); - } -} - </programlisting> - In the reducer, the "ones" are counted (just like any other MR example that does this), and then a <classname>Put</classname> is emitted. 
- <programlisting> -public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> { - public static final byte[] CF = "cf".getBytes(); - public static final byte[] COUNT = "count".getBytes(); - - public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { - int i = 0; - for (IntWritable val : values) { - i += val.get(); - } - Put put = new Put(Bytes.toBytes(key.toString())); - put.add(CF, COUNT, Bytes.toBytes(i)); - - context.write(null, put); - } -} - </programlisting> - </para> - </section> - <section xml:id="mapreduce.example.summary.file"> - <title>HBase MapReduce Summary to File Example</title> - <para>This is very similar to the summary example above, with the exception that this uses HBase as a MapReduce source - but HDFS as the sink. The differences are in the job setup and in the reducer. The mapper remains the same. - </para> - <programlisting> -Configuration config = HBaseConfiguration.create(); -Job job = new Job(config,"ExampleSummaryToFile"); -job.setJarByClass(MySummaryFileJob.class); // class that contains mapper and reducer - -Scan scan = new Scan(); -scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs -scan.setCacheBlocks(false); // don't set to true for MR jobs -// set other scan attrs - -TableMapReduceUtil.initTableMapperJob( - sourceTable, // input table - scan, // Scan instance to control CF and attribute selection - MyMapper.class, // mapper class - Text.class, // mapper output key - IntWritable.class, // mapper output value - job); -job.setReducerClass(MyReducer.class); // reducer class -job.setNumReduceTasks(1); // at least one, adjust as required -FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile")); // adjust directories as required - -boolean b = job.waitForCompletion(true); -if (!b) { - throw new IOException("error with job!"); -} - </programlisting> - As stated above, the previous Mapper can 
run unchanged with this example. - As for the Reducer, it is a "generic" Reducer instead of extending TableReducer and emitting Puts. - <programlisting> - public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> { - - public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { - int i = 0; - for (IntWritable val : values) { - i += val.get(); - } - context.write(key, new IntWritable(i)); - } -} - </programlisting> - </section> - <section xml:id="mapreduce.example.summary.noreducer"> - <title>HBase MapReduce Summary to HBase Without Reducer</title> - <para>It is also possible to perform summaries without a reducer, - by using HBase itself to aggregate via atomic increments. - </para> - <para>An HBase target table would need to exist for the job summary. The HTable method <code>incrementColumnValue</code> - would be used to atomically increment values. From a performance perspective, it might make sense to keep a Map - of keys with their values to be incremented for each map-task, and make one update per key during the <code> - cleanup</code> method of the mapper. However, your mileage may vary depending on the number of rows to be processed and the number of - unique keys. - </para> - <para>In the end, the summary results are in HBase. - </para> - </section> - <section xml:id="mapreduce.example.summary.rdbms"> - <title>HBase MapReduce Summary to RDBMS</title> - <para>Sometimes it is more appropriate to generate summaries to an RDBMS. For these cases, it is possible - to generate summaries directly to an RDBMS via a custom reducer. The <code>setup</code> method - can connect to an RDBMS (the connection information can be passed via custom parameters in the context) and the - cleanup method can close the connection. - </para> - <para>It is critical to understand that the number of reducers for the job affects the summarization implementation, and - you'll have to design this into your reducer. 
Specifically, you must decide whether it is designed to run as a singleton (one reducer) - or with multiple reducers. Neither is right or wrong; it depends on your use-case. Recognize that the more reducers that - are assigned to the job, the more simultaneous connections to the RDBMS will be created; this will scale, but only to a point. - </para> - <programlisting> - public static class MyRdbmsReducer extends Reducer<Text, IntWritable, Text, IntWritable> { - - private Connection c = null; - - public void setup(Context context) { - // create DB connection... - } - - public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { - // do summarization - // in this example the keys are Text, but this is just an example - } - - public void cleanup(Context context) { - // close db connection - } - -} - </programlisting> - <para>In the end, the summary results are written to your RDBMS table(s). - </para> - </section> - - </section> <!-- mr examples --> - <section xml:id="mapreduce.htable.access"> - <title>Accessing Other HBase Tables in a MapReduce Job</title> - <para>Although the framework currently allows one HBase table as input to a - MapReduce job, other HBase tables can - be accessed as lookup tables, etc., in a - MapReduce job via creating an HTable instance in the setup method of the Mapper. - <programlisting>public class MyMapper extends TableMapper<Text, LongWritable> { - private HTable myOtherTable; - - public void setup(Context context) { - myOtherTable = new HTable("myOtherTable"); - } - - public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException { - // process Result... - // use 'myOtherTable' for lookups - } - - </programlisting> - </para> - </section> - <section xml:id="mapreduce.specex"> - <title>Speculative Execution</title> - <para>It is generally advisable to turn off speculative execution for - MapReduce jobs that use HBase as a source. 
This can be done either on a - per-Job basis through properties, or on the entire cluster. Especially - for longer running jobs, speculative execution will create duplicate - map-tasks which will double-write your data to HBase; this is probably - not what you want. - </para> - <para>See <xref linkend="spec.ex"/> for more information. - </para> - </section> - </chapter> <!-- mapreduce --> - - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="security.xml" /> - - <chapter xml:id="architecture"> - <title>Architecture</title> - <section xml:id="arch.overview"> - <title>Overview</title> - <section xml:id="arch.overview.nosql"> - <title>NoSQL?</title> - <para>HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS which - supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an - example of a local NoSQL database, whereas HBase is very much a distributed database. Technically speaking, - HBase is really more a "Data Store" than "Data Base" because it lacks many of the features you find in an RDBMS, - such as typed columns, secondary indexes, triggers, and advanced query languages. - </para> - <para>However, HBase has many features which support both linear and modular scaling. HBase clusters expand - by adding RegionServers that are hosted on commodity class servers. If a cluster expands from 10 to 20 - RegionServers, for example, it doubles in terms of both storage and processing capacity. - An RDBMS can scale well, but only up to a point (specifically, the size of a single database server), and for the best - performance it requires specialized hardware and storage devices. HBase features of note are: - <itemizedlist> - <listitem>Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore. This - makes it very suitable for tasks such as high-speed counter aggregation. 
</listitem> - <listitem>Automatic sharding: HBase tables are distributed on the cluster via regions, and regions are - automatically split and re-distributed as your data grows.</listitem> - <listitem>Automatic RegionServer failover</listitem> - <listitem>Hadoop/HDFS Integration: HBase supports HDFS out of the box as its distributed file system.</listitem> - <listitem>MapReduce: HBase supports massively parallelized processing via MapReduce for using HBase as both - source and sink.</listitem> - <listitem>Java Client API: HBase supports an easy to use Java API for programmatic access.</listitem> - <listitem>Thrift/REST API: HBase also supports Thrift and REST for non-Java front-ends.</listitem> - <listitem>Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high volume query optimization.</listitem> - <listitem>Operational Management: HBase provides built-in web pages for operational insight as well as JMX metrics.</listitem> - </itemizedlist> - </para> - </section> - - <section xml:id="arch.overview.when"> - <title>When Should I Use HBase?</title> - <para>HBase isn't suitable for every problem.</para> - <para>First, make sure you have enough data. If you have hundreds of millions or billions of rows, then - HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS - might be a better choice because all of your data might wind up on a single node (or two) and - the rest of the cluster may be sitting idle. - </para> - <para>Second, make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, - secondary indexes, transactions, advanced query languages, etc.) An application built against an RDBMS cannot be - "ported" to HBase by simply changing a JDBC driver, for example. Consider moving from an RDBMS to HBase as a - complete redesign as opposed to a port. - </para> - <para>Third, make sure you have enough hardware. 
Even HDFS doesn't do well with anything less than - 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode. - </para> - <para>HBase can run quite well stand-alone on a laptop - but this should be considered a development - configuration only. - </para> - </section> - <section xml:id="arch.overview.hbasehdfs"> - <title>What Is The Difference Between HBase and Hadoop/HDFS?</title> - <para><link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files. - Its documentation states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files. - HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. - This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist - on HDFS for high-speed lookups. See the <xref linkend="datamodel" /> and the rest of this chapter for more information on how HBase achieves its goals. - </para> - </section> - </section> - - <section xml:id="arch.catalog"> - <title>Catalog Tables</title> - <para>The catalog tables -ROOT- and .META. exist as HBase tables. They are filtered out - of the HBase shell's <code>list</code> command, but they are in fact tables just like any other. - </para> - <section xml:id="arch.catalog.root"> - <title>ROOT</title> - <para>-ROOT- keeps track of where the .META. table is. The -ROOT- table structure is as follows: - </para> - <para>Key: - <itemizedlist> - <listitem>.META. 
region key (<code>.META.,,1</code>)</listitem> - </itemizedlist> - </para> - <para>Values: - <itemizedlist> - <listitem><code>info:regioninfo</code> (serialized <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html">HRegionInfo</link> - instance of .META.)</listitem> - <listitem><code>info:server</code> (server:port of the RegionServer holding .META.)</listitem> - <listitem><code>info:serverstartcode</code> (start-time of the RegionServer process holding .META.)</listitem> - </itemizedlist> - </para> - </section> - <section xml:id="arch.catalog.meta"> - <title>META</title> - <para>The .META. table keeps a list of all regions in the system. The .META. table structure is as follows: - </para> - <para>Key: - <itemizedlist> - <listitem>Region key of the format (<code>[table],[region start key],[region id]</code>)</listitem> - </itemizedlist> - </para> - <para>Values: - <itemizedlist> - <listitem><code>info:regioninfo</code> (serialized <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html"> - HRegionInfo</link> instance for this region) - </listitem> - <listitem><code>info:server</code> (server:port of the RegionServer containing this region)</listitem> - <listitem><code>info:serverstartcode</code> (start-time of the RegionServer process containing this region)</listitem> - </itemizedlist> - </para> - <para>When a table is in the process of splitting, two other columns will be created, <code>info:splitA</code> and <code>info:splitB</code>, - which represent the two daughter regions. The values for these columns are also serialized HRegionInfo instances. - After the region has been split, this row will eventually be deleted. - </para> - <para>Notes on HRegionInfo: the empty key is used to denote table start and table end. A region with an empty start key - is the first region in a table. 
If a region has both an empty start and an empty end key, it is the only region in the table. - </para> - <para>In the (hopefully unlikely) event that programmatic processing of catalog metadata is required, see the - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte[]%29">Writables</link> utility. - </para> - </section> - <section xml:id="arch.catalog.startup"> - <title>Startup Sequencing</title> - <para>The META location is set in ROOT first. Then META is updated with server and startcode values. - </para> - <para>For information on region-RegionServer assignment, see <xref linkend="regions.arch.assignment"/>. - </para> - </section> - </section> <!-- catalog --> - - <section xml:id="client"> - <title>Client</title> - <para>The HBase client - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link> - is responsible for finding RegionServers that are serving the - particular row range of interest. It does this by querying - the <code>.META.</code> and <code>-ROOT-</code> catalog tables - (TODO: Explain). After locating the required - region(s), the client <emphasis>directly</emphasis> contacts - the RegionServer serving that region (i.e., it does not go - through the master) and issues the read or write request. - This information is cached in the client so that subsequent requests - need not go through the lookup process. Should a region be reassigned - either by the master load balancer or because a RegionServer has died, - the client will requery the catalog tables to determine the new - location of the user region. - </para> - <para>See <xref linkend="master.runtime"/> for more information about the impact of the Master on HBase Client - communication. 
- </para> - <para>Administrative functions are handled through <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link>. - </para> - <section xml:id="client.connections"><title>Connections</title> - <para>For connection configuration information, see <xref linkend="client_dependencies" />. - </para> - <para><emphasis><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link> - instances are not thread-safe</emphasis>. Only one thread should use an instance of HTable at any given - time. When creating HTable instances, it is advisable to use the same <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration">HBaseConfiguration</link> -instance. This will ensure sharing of ZooKeeper and socket instances to the RegionServers -which is usually what you want. For example, this is preferred: - <programlisting>HBaseConfiguration conf = HBaseConfiguration.create(); -HTable table1 = new HTable(conf, "myTable"); -HTable table2 = new HTable(conf, "myTable");</programlisting> - as opposed to this: - <programlisting>HBaseConfiguration conf1 = HBaseConfiguration.create(); -HTable table1 = new HTable(conf1, "myTable"); -HBaseConfiguration conf2 = HBaseConfiguration.create(); -HTable table2 = new HTable(conf2, "myTable");</programlisting> - For more information about how connections are handled in the HBase client, - see <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html">HConnectionManager</link>. - </para> - <section xml:id="client.connection.pooling"><title>Connection Pooling</title> - <para>For applications which require high-end multithreaded access (e.g., web-servers or application servers that may serve many application threads - in a single JVM), one solution is <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html">HTablePool</link>. 
- But as written currently, it is difficult to control client resource consumption when using HTablePool. - </para> - <para> - Another solution is to precreate an <classname>HConnection</classname> using - <programlisting>// Create a connection to the cluster. -HConnection connection = HConnectionManager.createConnection(Configuration); -HTableInterface table = connection.getTable("myTable"); -// use table as needed, the table returned is lightweight -table.close(); -// use the connection for other access to the cluster -connection.close();</programlisting> - Constructing an HTableInterface implementation is very lightweight, and resources are controlled/shared if you go this route. - </para> - </section> - </section> - <section xml:id="client.writebuffer"><title>WriteBuffer and Batch Methods</title> - <para>If <xref linkend="perf.hbase.client.autoflush" /> is turned off on - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>, - <classname>Put</classname>s are sent to RegionServers when the writebuffer - is filled. The writebuffer is 2MB by default. Before an HTable instance is - discarded, either <methodname>close()</methodname> or - <methodname>flushCommits()</methodname> should be invoked so Puts - will not be lost. - </para> - <para>Note: <code>htable.delete(Delete);</code> does not go in the writebuffer! This only applies to Puts. - </para> - <para>For additional information on write durability, review the <link xlink:href="acid-semantics.html">ACID semantics</link> page. - </para> - <para>For fine-grained control of batching of - <classname>Put</classname>s or <classname>Delete</classname>s, - see the <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29">batch</link> methods on HTable. 
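As a rough, self-contained illustration of the buffering behavior described above (this is a hypothetical sketch, not HTable's actual implementation; the class name and fields are invented): writes accumulate client-side, a flush is triggered once their total size crosses a threshold, analogous to the 2MB default writebuffer, and an explicit flush() mirrors the role of flushCommits()/close() in ensuring no buffered writes are lost.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative write-buffer sketch (not HTable): buffer puts client-side
// and flush automatically when the accumulated byte size crosses a
// threshold, with an explicit flush() for shutdown.
public class WriteBufferSketch {
    private final long flushThreshold;
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private int flushCount = 0;

    public WriteBufferSketch(long flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public void put(byte[] row) {
        buffer.add(row);
        bufferedBytes += row.length;
        if (bufferedBytes >= flushThreshold) {
            flush();
        }
    }

    // Analogous to flushCommits()/close(): push pending writes out so
    // none are lost when the instance is discarded.
    public void flush() {
        // (a real client would send the buffered edits to the servers here)
        buffer.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    public int getFlushCount() { return flushCount; }
    public int getPendingCount() { return buffer.size(); }
}
```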
- </para> - </section> - <section xml:id="client.external"><title>External Clients</title> - <para>Information on non-Java clients and custom protocols is covered in <xref linkend="external_apis" /> - </para> - </section> - </section> - - <section xml:id="client.filter"><title>Client Request Filters</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> and <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> instances can be - optionally configured with <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html">filters</link> which are applied on the RegionServer. - </para> - <para>Filters can be confusing because there are many different types, and it is best to approach them by understanding the groups - of Filter functionality. - </para> - <section xml:id="client.filter.structural"><title>Structural</title> - <para>Structural Filters contain other Filters.</para> - <section xml:id="client.filter.structural.fl"><title>FilterList</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html">FilterList</link> - represents a list of Filters with a relationship of <code>FilterList.Operator.MUST_PASS_ALL</code> or - <code>FilterList.Operator.MUST_PASS_ONE</code> between the Filters. The following example shows an 'or' between two - Filters (checking for either 'my value' or 'my other value' on the same attribute). 
-<programlisting> -FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE); -SingleColumnValueFilter filter1 = new SingleColumnValueFilter( - cf, - column, - CompareOp.EQUAL, - Bytes.toBytes("my value") - ); -list.add(filter1); -SingleColumnValueFilter filter2 = new SingleColumnValueFilter( - cf, - column, - CompareOp.EQUAL, - Bytes.toBytes("my other value") - ); -list.add(filter2); -scan.setFilter(list); -</programlisting> - </para> - </section> - </section> - <section xml:id="client.filter.cv"><title>Column Value</title> - <section xml:id="client.filter.cv.scvf"><title>SingleColumnValueFilter</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html">SingleColumnValueFilter</link> - can be used to test column values for equivalence (<code><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/CompareFilter.CompareOp.html">CompareOp.EQUAL</link> - </code>), inequality (<code>CompareOp.NOT_EQUAL</code>), or ranges - (e.g., <code>CompareOp.GREATER</code>). The following is an example of testing a column for equivalence to the String value "my value"... -<programlisting> -SingleColumnValueFilter filter = new SingleColumnValueFilter( - cf, - column, - CompareOp.EQUAL, - Bytes.toBytes("my value") - ); -scan.setFilter(filter); -</programlisting> - </para> - </section> - </section> - <section xml:id="client.filter.cvp"><title>Column Value Comparators</title> - <para>There are several Comparator classes in the Filter package that deserve special mention. - These Comparators are used in concert with other Filters, such as <xref linkend="client.filter.cv.scvf" />. - </para> - <section xml:id="client.filter.cvp.rcs"><title>RegexStringComparator</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html">RegexStringComparator</link> - supports regular expressions for value comparisons. 
-<programlisting>
-RegexStringComparator comp = new RegexStringComparator("my.");   // any value that starts with 'my'
-SingleColumnValueFilter filter = new SingleColumnValueFilter(
-	cf,
-	column,
-	CompareOp.EQUAL,
-	comp
-	);
-scan.setFilter(filter);
-</programlisting>
-            See the Oracle JavaDoc for <link xlink:href="http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html">supported RegEx patterns in Java</link>.
-            </para>
-          </section>
-          <section xml:id="client.filter.cvp.sc"><title>SubstringComparator</title>
-            <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html">SubstringComparator</link>
-            can be used to determine if a given substring exists in a value. The comparison is case-insensitive.
-            </para>
-<programlisting>
-SubstringComparator comp = new SubstringComparator("y val");   // looking for 'my value'
-SingleColumnValueFilter filter = new SingleColumnValueFilter(
-	cf,
-	column,
-	CompareOp.EQUAL,
-	comp
-	);
-scan.setFilter(filter);
-</programlisting>
-          </section>
-          <section xml:id="client.filter.cvp.bfp"><title>BinaryPrefixComparator</title>
-            <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.html">BinaryPrefixComparator</link>.</para>
-          </section>
-          <section xml:id="client.filter.cvp.bc"><title>BinaryComparator</title>
-            <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html">BinaryComparator</link>.</para>
-          </section>
-        </section>
-        <section xml:id="client.filter.kvm"><title>KeyValue Metadata</title>
-          <para>As HBase stores data internally as KeyValue pairs, KeyValue Metadata Filters evaluate the existence of keys (i.e., ColumnFamily:Column qualifiers)
-          for a row, as opposed to values in the previous section.
- </para> - <section xml:id="client.filter.kvm.ff"><title>FamilyFilter</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FamilyFilter.html">FamilyFilter</link> can be used - to filter on the ColumnFamily. It is generally a better idea to select ColumnFamilies in the Scan than to do it with a Filter.</para> - </section> - <section xml:id="client.filter.kvm.qf"><title>QualifierFilter</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html">QualifierFilter</link> can be used - to filter based on Column (aka Qualifier) name. - </para> - </section> - <section xml:id="client.filter.kvm.cpf"><title>ColumnPrefixFilter</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html">ColumnPrefixFilter</link> can be used - to filter based on the lead portion of Column (aka Qualifier) names. - </para> - <para>A ColumnPrefixFilter seeks ahead to the first column matching the prefix in each row and for each involved column family. It can be used to efficiently - get a subset of the columns in very wide rows. - </para> - <para>Note: The same column qualifier can be used in different column families. This filter returns all matching columns. 
- </para> - <para>Example: Find all columns in a row and family that start with "abc" -<programlisting> -HTableInterface t = ...; -byte[] row = ...; -byte[] family = ...; -byte[] prefix = Bytes.toBytes("abc"); -Scan scan = new Scan(row, row); // (optional) limit to one row -scan.addFamily(family); // (optional) limit to one family -Filter f = new ColumnPrefixFilter(prefix); -scan.setFilter(f); -scan.setBatch(10); // set this if there could be many columns returned -ResultScanner rs = t.getScanner(scan); -for (Result r = rs.next(); r != null; r = rs.next()) { - for (KeyValue kv : r.raw()) { - // each kv represents a column - } -} -rs.close(); -</programlisting> -</para> - </section> - <section xml:id="client.filter.kvm.mcpf"><title>MultipleColumnPrefixFilter</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.html">MultipleColumnPrefixFilter</link> behaves like ColumnPrefixFilter - but allows specifying multiple prefixes. - </para> - <para>Like ColumnPrefixFilter, MultipleColumnPrefixFilter efficiently seeks ahead to the first column matching the lowest prefix and also seeks past ranges of columns between prefixes. - It can be used to efficiently get discontinuous sets of columns from very wide rows. 
-        </para>
-        <para>Example: Find all columns in a row and family that start with "abc" or "xyz"
-<programlisting>
-HTableInterface t = ...;
-byte[] row = ...;
-byte[] family = ...;
-byte[][] prefixes = new byte[][] {Bytes.toBytes("abc"), Bytes.toBytes("xyz")};
-Scan scan = new Scan(row, row); // (optional) limit to one row
-scan.addFamily(family); // (optional) limit to one family
-Filter f = new MultipleColumnPrefixFilter(prefixes);
-scan.setFilter(f);
-scan.setBatch(10); // set this if there could be many columns returned
-ResultScanner rs = t.getScanner(scan);
-for (Result r = rs.next(); r != null; r = rs.next()) {
-  for (KeyValue kv : r.raw()) {
-    // each kv represents a column
-  }
-}
-rs.close();
-</programlisting>
-</para>
-        </section>
-        <section xml:id="client.filter.kvm.crf"><title>ColumnRangeFilter</title>
-          <para>A <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnRangeFilter.html">ColumnRangeFilter</link> allows efficient intra row scanning.
-          </para>
-          <para>A ColumnRangeFilter can seek ahead to the first matching column for each involved column family. It can be used to efficiently
-          get a 'slice' of the columns of a very wide row.
-          e.g., you have a million columns in a row but you only want to look at columns bbbb-bbdd.
-          </para>
-          <para>Note: The same column qualifier can be used in different column families. This filter returns all matching columns.
- </para> - <para>Example: Find all columns in a row and family between "bbbb" (inclusive) and "bbdd" (inclusive) -<programlisting> -HTableInterface t = ...; -byte[] row = ...; -byte[] family = ...; -byte[] startColumn = Bytes.toBytes("bbbb"); -byte[] endColumn = Bytes.toBytes("bbdd"); -Scan scan = new Scan(row, row); // (optional) limit to one row -scan.addFamily(family); // (optional) limit to one family -Filter f = new ColumnRangeFilter(startColumn, true, endColumn, true); -scan.setFilter(f); -scan.setBatch(10); // set this if there could be many columns returned -ResultScanner rs = t.getScanner(scan); -for (Result r = rs.next(); r != null; r = rs.next()) { - for (KeyValue kv : r.raw()) { - // each kv represents a column - } -} -rs.close(); -</programlisting> -</para> - <para>Note: Introduced in HBase 0.92</para> - </section> - </section> - <section xml:id="client.filter.row"><title>RowKey</title> - <section xml:id="client.filter.row.rf"><title>RowFilter</title> - <para>It is generally a better idea to use the startRow/stopRow methods on Scan for row selection, however - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RowFilter.html">RowFilter</link> can also be used.</para> - </section> - </section> - <section xml:id="client.filter.utility"><title>Utility</title> - <section xml:id="client.filter.utility.fkof"><title>FirstKeyOnlyFilter</title> - <para>This is primarily used for rowcount jobs. - See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html">FirstKeyOnlyFilter</link>.</para> - </section> - </section> - </section> <!-- client.filter --> - - <section xml:id="master"><title>Master</title> - <para><code>HMaster</code> is the implementation of the Master Server. The Master server - is responsible for monitoring all RegionServer instances in the cluster, and is - the interface for all metadata changes. 
In a distributed cluster, the Master typically runs on the <xref linkend="arch.hdfs.nn" /><footnote>
-        <para>J Mohamed Zahoor goes into some more detail on the Master Architecture in this blog posting, <link
-        xlink:href="http://blog.zahoor.in/2012/08/hbase-hmaster-architecture/">HBase HMaster Architecture
-        </link>.</para>
-        </footnote>
-      </para>
-      <section xml:id="master.startup"><title>Startup Behavior</title>
-        <para>If run in a multi-Master environment, all Masters compete to run the cluster. If the active
-        Master loses its lease in ZooKeeper (or the Master shuts down), then the remaining Masters jostle to
-        take over the Master role.
-        </para>
-      </section>
-      <section xml:id="master.runtime"><title>Runtime Impact</title>
-        <para>A common dist-list question is what happens to an HBase cluster when the Master goes down. Because the
-        HBase client talks directly to the RegionServers, the cluster can still function in a "steady
-        state." Additionally, per <xref linkend="arch.catalog"/> ROOT and META exist as HBase tables (i.e., are
-        not resident in the Master). However, the Master controls critical functions such as RegionServer failover and
-        completing region splits. So while the cluster can still run <emphasis>for a time</emphasis> without the Master,
-        the Master should be restarted as soon as possible.
-        </para>
-      </section>
-      <section xml:id="master.api"><title>Interface</title>
-        <para>The methods exposed by <code>HMasterInterface</code> are primarily metadata-oriented methods:
-          <itemizedlist>
-            <listitem>Table (createTable, modifyTable, removeTable, enable, disable)
-            </listitem>
-            <listitem>ColumnFamily (addColumn, modifyColumn, removeColumn)
-            </listitem>
-            <listitem>Region (move, assign, unassign)
-            </listitem>
-          </itemizedlist>
-        For example, when the <code>HBaseAdmin</code> method <code>disableTable</code> is invoked, it is serviced by the Master server.
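-        As a minimal, hypothetical sketch (assuming an existing HBase <code>Configuration</code> object named <code>conf</code>), such a Master-serviced call looks like:
-<programlisting>
-HBaseAdmin admin = new HBaseAdmin(conf);  // conf is an existing HBase Configuration
-admin.disableTable("myTable");            // this request is serviced by the Master
-</programlisting>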
-        </para>
-      </section>
-      <section xml:id="master.processes"><title>Processes</title>
-        <para>The Master runs several background threads:
-        </para>
-        <section xml:id="master.processes.loadbalancer"><title>LoadBalancer</title>
-          <para>Periodically, and when there are no regions in transition,
-          a load balancer will run and move regions around to balance the cluster's load.
-          See <xref linkend="balancer_config" /> for configuring this property.</para>
-          <para>See <xref linkend="regions.arch.assignment"/> for more information on region assignment.
-          </para>
-        </section>
-        <section xml:id="master.processes.catalog"><title>CatalogJanitor</title>
-          <para>Periodically checks and cleans up the .META. table. See <xref linkend="arch.catalog.meta" /> for more information on META.</para>
-        </section>
-      </section>
-
-    </section>
-    <section xml:id="regionserver.arch"><title>RegionServer</title>
-      <para><code>HRegionServer</code> is the RegionServer implementation. It is responsible for serving and managing regions.
-      In a distributed cluster, a RegionServer runs on a <xref linkend="arch.hdfs.dn" />.
-      </para>
-      <section xml:id="regionserver.arch.api"><title>Interface</title>
-        <para>The methods exposed by <code>HRegionInterface</code> contain both data-oriented and region-maintenance methods:
-          <itemizedlist>
-            <listitem>Data (get, put, delete, next, etc.)
-            </listitem>
-            <listitem>Region (splitRegion, compactRegion, etc.)
-            </listitem>
-          </itemizedlist>
-        For example, when the <code>HBaseAdmin</code> method <code>majorCompact</code> is invoked on a table, the client is actually iterating through
-        all regions for the specified table and requesting a major compaction directly of each region.
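-        As a minimal, hypothetical sketch (again assuming an existing HBase <code>Configuration</code> object named <code>conf</code>):
-<programlisting>
-HBaseAdmin admin = new HBaseAdmin(conf);  // conf is an existing HBase Configuration
-admin.majorCompact("myTable");            // iterates the table's regions, requesting a major compaction of each
-</programlisting>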
-        </para>
-      </section>
-      <section xml:id="regionserver.arch.processes"><title>Processes</title>
-        <para>The RegionServer runs a variety of background threads:</para>
-        <section xml:id="regionserver.arch.processes.compactsplit"><title>CompactSplitThread</title>
-          <para>Checks for splits and handles minor compactions.</para>
-        </section>
-        <section xml:id="regionserver.arch.processes.majorcompact"><title>MajorCompactionChecker</title>
-          <para>Checks for major compactions.</para>
-        </section>
-        <section xml:id="regionserver.arch.processes.memstore"><title>MemStoreFlusher</title>
-          <para>Periodically flushes in-memory writes in the MemStore to StoreFiles.</para>
-        </section>
-        <section xml:id="regionserver.arch.processes.log"><title>LogRoller</title>
-          <para>Periodically checks the RegionServer's HLog.</para>
-        </section>
-      </section>
-
-      <section xml:id="coprocessors"><title>Coprocessors</title>
-        <para>Coprocessors were added in 0.92. There is a thorough <link xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction">Blog Overview of CoProcessors</link>
-        posted. Documentation will eventually move to this reference guide, but the blog is the most current information available at this time.
-        </para>
-      </section>
-
-      <section xml:id="block.cache">
-        <title>Block Cache</title>
-        <section xml:id="block.cache.design">
-          <title>Design</title>
-          <para>The Block Cache is an LRU cache that contains three levels of block priority to allow for scan-resistance and in-memory ColumnFamilies:
-          </para>
-          <itemizedlist>
-            <listitem>Single access priority: The first time a block is loaded from HDFS it normally has this priority and it will be part of the first group to be considered
-            during evictions. The advantage is that scanned blocks are more likely to get evicted than blocks that are getting more usage.
-            </listitem>
-            <listitem>Multi access priority: If a block in the previous priority group is accessed again, it upgrades to this priority.
It is thus part of the second group
-            considered during evictions.
-            </listitem>
-            <listitem>In-memory access priority: If the block's family was configured to be "in-memory", it will be part of this priority regardless of the number of times it
-            was accessed. Catalog tables are configured like this. This group is the last one considered during evictions.
-            </listitem>
-          </itemizedlist>
-          <para>
-              For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/LruBlockCache.html">LruBlockCache source</link>.
-          </para>
-        </section>
-        <section xml:id="block.cache.usage">
-          <title>Usage</title>
-          <para>Block caching is enabled by default for all the user tables, which means that any read operation will load the LRU cache. This might be good for a large number of use cases,
-          but further tunings are usually required in order to achieve better performance. An important concept is the
-          <link xlink:href="http://en.wikipedia.org/wiki/Working_set_size">working set size</link>, or WSS, which is: "the amount of memory needed to compute the answer to a problem".
-          For a website, this would be the data that's needed to answer the queries over a short amount of time.
-          </para>
-          <para>The way to calculate how much memory is available in HBase for caching is:
-          </para>
-          <programlisting>
-            number of region servers * heap size * hfile.block.cache.size * 0.85
-          </programlisting>
-          <para>The default value for the block cache is 0.25 which represents 25% of the available heap. The last value (85%) is the default acceptable loading factor in the LRU cache after
-          which eviction is started. The reason it is included in this equation is that it would be unrealistic to say that it is possible to use 100% of the available memory since this would
-          make the process block from the point where it loads new blocks.
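-          As a sketch, the formula above can be computed directly; the numbers here simply restate the single-server default case:
-<programlisting>
-int regionServers = 1;
-double heapMB = 1024;            // default 1GB heap
-double blockCacheSize = 0.25;    // default hfile.block.cache.size
-double loadFactor = 0.85;        // default acceptable loading factor
-double availableMB = regionServers * heapMB * blockCacheSize * loadFactor;  // ~217.6MB
-</programlisting>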
Here are some examples:
-          </para>
-          <itemizedlist>
-            <listitem>One region server with the default heap size (1GB) and the default block cache size will have 217MB of block cache available.
-            </listitem>
-            <listitem>20 region servers with the heap size set to 8GB and a default block cache size will have 34GB of block cache.
-            </listitem>
-            <listitem>100 region servers with the heap size set to 24GB and a block cache size of 0.5 will have about 1TB of block cache.
-            </listitem>
-          </itemizedlist>
-          <para>Your data isn't the only resident of the block cache; here are others that you may have to take into account:
-          </para>
-          <itemizedlist>
-            <listitem>Catalog tables: The -ROOT- and .META. tables are forced into the block cache and have the in-memory priority which means that they are harder to evict. The former never uses
-            more than a few hundred bytes while the latter can occupy a few MBs (depending on the number of regions).
-            </listitem>
-            <listitem>HFiles indexes: HFile is the file format that HBase uses to store data in HDFS and it contains a multi-layered index in order to seek to the data without having to read the whole file.
-            The size of those indexes is a factor of the block size (64KB by default), the size of your keys and the amoun
<TRUNCATED>
