http://git-wip-us.apache.org/repos/asf/hbase/blob/e80b3092/src/main/docbkx/architecture.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/architecture.xml b/src/main/docbkx/architecture.xml
deleted file mode 100644
index 16b298a..0000000
--- a/src/main/docbkx/architecture.xml
+++ /dev/null
@@ -1,3489 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<chapter
-    xml:id="architecture"
-    version="5.0"
-    xmlns="http://docbook.org/ns/docbook"
-    xmlns:xlink="http://www.w3.org/1999/xlink"
-    xmlns:xi="http://www.w3.org/2001/XInclude"
-    xmlns:svg="http://www.w3.org/2000/svg"
-    xmlns:m="http://www.w3.org/1998/Math/MathML"
-    xmlns:html="http://www.w3.org/1999/xhtml"
-    xmlns:db="http://docbook.org/ns/docbook">
-    <!--/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--->
-
-    <title>Architecture</title>
-       <section xml:id="arch.overview">
-       <title>Overview</title>
-         <section xml:id="arch.overview.nosql">
-         <title>NoSQL?</title>
-         <para>HBase is a type of "NoSQL" database.  "NoSQL" is a general term 
meaning that the database isn't an RDBMS which
-         supports SQL as its primary access language, but there are many types 
of NoSQL databases:  BerkeleyDB is an
-         example of a local NoSQL database, whereas HBase is very much a 
distributed database.  Technically speaking,
-         HBase is really more a "Data Store" than "Data Base" because it lacks 
many of the features you find in an RDBMS,
-         such as typed columns, secondary indexes, triggers, and advanced
-         query languages.
-         </para>
-         <para>However, HBase has many features which support both linear and
-         modular scaling.  HBase clusters expand by adding RegionServers that
-         are hosted on commodity-class servers.  If a cluster expands from 10
-         to 20 RegionServers, for example, it doubles in terms of both storage
-         and processing capacity.  An RDBMS can scale well, but only up to a
-         point - specifically, the size of a single database server - and for
-         the best performance it requires specialized hardware and storage
-         devices.  HBase features of note are:
-               <itemizedlist>
-              <listitem><para>Strongly consistent reads/writes:  HBase is not 
an "eventually consistent" DataStore.  This
-              makes it very suitable for tasks such as high-speed counter 
aggregation.</para>  </listitem>
-              <listitem><para>Automatic sharding:  HBase tables are 
distributed on the cluster via regions, and regions are
-              automatically split and re-distributed as your data 
grows.</para></listitem>
-              <listitem><para>Automatic RegionServer failover</para></listitem>
-              <listitem><para>Hadoop/HDFS Integration:  HBase supports HDFS 
out of the box as its distributed file system.</para></listitem>
-              <listitem><para>MapReduce:  HBase supports massively 
parallelized processing via MapReduce for using HBase as both
-              source and sink.</para></listitem>
-              <listitem><para>Java Client API:  HBase supports an easy to use 
Java API for programmatic access.</para></listitem>
-              <listitem><para>Thrift/REST API:  HBase also supports Thrift and 
REST for non-Java front-ends.</para></listitem>
-              <listitem><para>Block Cache and Bloom Filters:  HBase supports a 
Block Cache and Bloom Filters for high volume query 
optimization.</para></listitem>
-              <listitem><para>Operational Management:  HBase provides built-in
-              web pages for operational insight, as well as JMX metrics.</para></listitem>
-            </itemizedlist>
-         </para>
-      </section>
-
-         <section xml:id="arch.overview.when">
-           <title>When Should I Use HBase?</title>
-                 <para>HBase isn't suitable for every problem.</para>
-                 <para>First, make sure you have enough data.  If you have 
hundreds of millions or billions of rows, then
-                   HBase is a good candidate.  If you only have a few
-                   thousand or million rows, then a traditional RDBMS
-                   might be a better choice, because all of your data could
-                   wind up on a single node (or two) while the rest of the
-                   cluster sits idle.
-                 </para>
-                 <para>Second, make sure you can live without all the extra 
features that an RDBMS provides (e.g., typed columns,
-                 secondary indexes, transactions, advanced query languages, 
etc.)  An application built against an RDBMS cannot be
-                 "ported" to HBase by simply changing a JDBC driver, for 
example.  Consider moving from an RDBMS to HBase as a
-                 complete redesign as opposed to a port.
-              </para>
-                 <para>Third, make sure you have enough hardware.  Even HDFS 
doesn't do well with anything less than
-                5 DataNodes (due to things such as HDFS block replication 
which has a default of 3), plus a NameNode.
-                </para>
-                <para>HBase can run quite well stand-alone on a laptop - but 
this should be considered a development
-                configuration only.
-                </para>
-      </section>
-      <section xml:id="arch.overview.hbasehdfs">
-        <title>What Is The Difference Between HBase and Hadoop/HDFS?</title>
-          <para><link xlink:href="http://hadoop.apache.org/hdfs/";>HDFS</link> 
is a distributed file system that is well suited for the storage of large files.
-          Its documentation states that it is not, however, a general purpose 
file system, and does not provide fast individual record lookups in files.
-          HBase, on the other hand, is built on top of HDFS and provides fast 
record lookups (and updates) for large tables.
-          This can sometimes be a point of conceptual confusion.  HBase 
internally puts your data in indexed "StoreFiles" that exist
-          on HDFS for high-speed lookups.  See the <xref linkend="datamodel" 
/> and the rest of this chapter for more information on how HBase achieves its 
goals.
-         </para>
-      </section>
-       </section>
-
-    <section
-      xml:id="arch.catalog">
-      <title>Catalog Tables</title>
-      <para>The catalog table <code>hbase:meta</code> exists as an HBase table 
and is filtered out of the HBase
-        shell's <code>list</code> command, but is in fact a table just like 
any other. </para>
-      <section
-        xml:id="arch.catalog.root">
-        <title>-ROOT-</title>
-        <note>
-          <para>The <code>-ROOT-</code> table was removed in HBase 0.96.0. 
Information here should
-            be considered historical.</para>
-        </note>
-        <para>The <code>-ROOT-</code> table kept track of the location of the
-            <code>.META.</code> table (the previous name for the table now
-            called <code>hbase:meta</code>) prior to HBase
-          0.96. The <code>-ROOT-</code> table structure was as follows: </para>
-        <itemizedlist>
-          <title>Key</title>
-          <listitem>
-            <para>.META. region key (<code>.META.,,1</code>)</para>
-          </listitem>
-        </itemizedlist>
-
-        <itemizedlist>
-          <title>Values</title>
-          <listitem>
-            <para><code>info:regioninfo</code> (serialized <link
-                
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html";>HRegionInfo</link>
-              instance of hbase:meta)</para>
-          </listitem>
-          <listitem>
-            <para><code>info:server</code> (server:port of the RegionServer 
holding
-              hbase:meta)</para>
-          </listitem>
-          <listitem>
-            <para><code>info:serverstartcode</code> (start-time of the 
RegionServer process holding
-              hbase:meta)</para>
-          </listitem>
-        </itemizedlist>
-      </section>
-      <section
-        xml:id="arch.catalog.meta">
-        <title>hbase:meta</title>
-        <para>The <code>hbase:meta</code> table (previously called 
<code>.META.</code>) keeps a list
-          of all regions in the system. The location of 
<code>hbase:meta</code> was previously
-          tracked within the <code>-ROOT-</code> table, but is now stored in
-          ZooKeeper.</para>
-        <para>The <code>hbase:meta</code> table structure is as follows: 
</para>
-        <itemizedlist>
-          <title>Key</title>
-          <listitem>
-            <para>Region key of the format (<code>[table],[region start 
key],[region
-              id]</code>)</para>
-          </listitem>
-        </itemizedlist>
-        <itemizedlist>
-          <title>Values</title>
-          <listitem>
-            <para><code>info:regioninfo</code> (serialized <link
-                
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html";>
-                HRegionInfo</link> instance for this region)</para>
-          </listitem>
-          <listitem>
-            <para><code>info:server</code> (server:port of the RegionServer 
containing this
-              region)</para>
-          </listitem>
-          <listitem>
-            <para><code>info:serverstartcode</code> (start-time of the 
RegionServer process
-              containing this region)</para>
-          </listitem>
-        </itemizedlist>
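As an illustration of the key layout above, here is a tiny self-contained sketch. The class, table name, start key, and region id are all invented for the example; real <code>hbase:meta</code> keys are byte arrays, shown here as strings for readability.

```java
// Builds a string in the hbase:meta row-key layout described above:
// [table],[region start key],[region id]. All values are hypothetical.
public class MetaKeyExample {
    static String metaRowKey(String table, String startKey, long regionId) {
        return table + "," + startKey + "," + regionId;
    }

    public static void main(String[] args) {
        // The first region of a table has an empty start key.
        System.out.println(metaRowKey("TestTable", "", 1342343498580L));
        // prints: TestTable,,1342343498580
    }
}
```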
-        <para>When a region is in the process of splitting, two other columns
-        will be created, called
-            <code>info:splitA</code> and <code>info:splitB</code>. These 
columns represent the two
-          daughter regions. The values for these columns are also serialized 
HRegionInfo instances.
-          After the region has been split, eventually this row will be 
deleted. </para>
-        <note>
-          <title>Note on HRegionInfo</title>
-          <para>The empty key is used to denote table start and table end. A 
region with an empty
-            start key is the first region in a table. If a region has both an 
empty start and an
-            empty end key, it is the only region in the table.</para>
-        </note>
-        <para>In the (hopefully unlikely) event that programmatic processing 
of catalog metadata is
-          required, see the <link
-            
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte[]%29";>Writables</link>
-          utility. </para>
-      </section>
-      <section
-        xml:id="arch.catalog.startup">
-        <title>Startup Sequencing</title>
-        <para>First, the location of <code>hbase:meta</code> is looked up in 
ZooKeeper. Next,
-          <code>hbase:meta</code> is updated with server and startcode 
values.</para>  
-        <para>For information on region-RegionServer assignment, see <xref
-            linkend="regions.arch.assignment" />. </para>
-      </section>
-    </section>  <!--  catalog -->
-
-    <section
-      xml:id="client">
-      <title>Client</title>
-      <para>The HBase client finds the RegionServers that are serving the 
particular row range of
-        interest. It does this by querying the <code>hbase:meta</code> table. 
See <xref
-          linkend="arch.catalog.meta" /> for details. After locating the 
required region(s), the
-        client contacts the RegionServer serving that region, rather than 
going through the master,
-        and issues the read or write request. This information is cached in 
the client so that
-        subsequent requests need not go through the lookup process. Should a 
region be reassigned
-        either by the master load balancer or because a RegionServer has died, 
the client will
-        requery the catalog tables to determine the new location of the user 
region. </para>
-
-      <para>See <xref
-          linkend="master.runtime" /> for more information about the impact of 
the Master on HBase
-        Client communication. </para>
-      <para>Administrative functions are done via an instance of <link
-          
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html";>Admin</link>
-      </para>
-
-      <section
-        xml:id="client.connections">
-        <title>Cluster Connections</title>
-        <para>The API changed in HBase 1.0. It has been cleaned up, and users
-          are returned interfaces to work against rather than particular
-          types. In HBase 1.0,
-          obtain a cluster Connection from ConnectionFactory and thereafter, 
get from it
-          instances of Table, Admin, and RegionLocator on an as-need basis. 
When done, close
-          obtained instances.  Finally, be sure to cleanup your Connection 
instance before
-          exiting.  Connections are heavyweight objects. Create once and keep 
an instance around.
-          Table, Admin and RegionLocator instances are lightweight. Create as 
you go and then
-          let go as soon as you are done by closing them. See the
-          <link
-          xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html">Client
-          Package Javadoc Description</link> for example usage of the new
-          HBase 1.0 API.</para>
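A minimal sketch of that lifecycle, assuming the HBase 1.0 client API; the table name is hypothetical and error handling is omitted:

```java
// Sketch only: the HBase 1.0 connection lifecycle described above.
// "myTable" is a hypothetical table name; requires a running cluster.
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf); // heavyweight; create once
try {
  Table table = connection.getTable(TableName.valueOf("myTable")); // lightweight
  try {
    // use table for gets/puts as needed
  } finally {
    table.close(); // let go as soon as you are done
  }
} finally {
  connection.close(); // clean up before exiting
}
```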
-
-        <para>For connection configuration information, see <xref 
linkend="client_dependencies" />. </para>
-
-        <para><emphasis><link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html";>Table</link>
-            instances are not thread-safe</emphasis>. Only one thread can use 
an instance of Table at
-          any given time. When creating Table instances, it is advisable to 
use the same <link
-            
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration";>HBaseConfiguration</link>
-          instance. This will ensure sharing of ZooKeeper and socket instances 
to the RegionServers
-          which is usually what you want. For example, this is 
preferred:</para>
-          <programlisting language="java">HBaseConfiguration conf = 
HBaseConfiguration.create();
-HTable table1 = new HTable(conf, "myTable");
-HTable table2 = new HTable(conf, "myTable");</programlisting>
-          <para>as opposed to this:</para>
-          <programlisting language="java">HBaseConfiguration conf1 = 
HBaseConfiguration.create();
-HTable table1 = new HTable(conf1, "myTable");
-HBaseConfiguration conf2 = HBaseConfiguration.create();
-HTable table2 = new HTable(conf2, "myTable");</programlisting>
-
-        <para>For more information about how connections are handled in the 
HBase client,
-        see <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html";>HConnectionManager</link>.
-          </para>
-          <section xml:id="client.connection.pooling"><title>Connection 
Pooling</title>
-            <para>For applications which require high-end multithreaded access 
(e.g., web-servers or application servers that may serve many application 
threads
-            in a single JVM), you can pre-create an 
<classname>HConnection</classname>, as shown in
-              the following example:</para>
-            <example>
-              <title>Pre-Creating a <code>HConnection</code></title>
-              <programlisting language="java">// Create a connection to the 
cluster.
-Configuration conf = HBaseConfiguration.create();
-HConnection connection = HConnectionManager.createConnection(conf);
-HTableInterface table = connection.getTable("myTable");
-// use table as needed, the table returned is lightweight
-table.close();
-// use the connection for other access to the cluster
-connection.close();</programlisting>
-            </example>
-          <para>Constructing an HTableInterface implementation is very
-            lightweight, and resources are controlled.</para>
-            <warning>
-              <title><code>HTablePool</code> is Deprecated</title>
-              <para>Previous versions of this guide discussed 
<code>HTablePool</code>, which was
-                deprecated in HBase 0.94, 0.95, and 0.96, and removed in 
0.98.1, by <link
-                  
xlink:href="https://issues.apache.org/jira/browse/HBASE-6580">HBASE-6580</link>.
-                Please use <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnection.html";><code>HConnection</code></link>
 instead.</para>
-            </warning>
-          </section>
-         </section>
-          <section xml:id="client.writebuffer"><title>WriteBuffer and Batch 
Methods</title>
-           <para>If <xref linkend="perf.hbase.client.autoflush" /> is turned 
off on
-               <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html";>HTable</link>,
-               <classname>Put</classname>s are sent to RegionServers when the 
writebuffer
-               is filled.  The writebuffer is 2MB by default.  Before an 
HTable instance is
-               discarded, either <methodname>close()</methodname> or
-               <methodname>flushCommits()</methodname> should be invoked so 
Puts
-               will not be lost.
-             </para>
-             <para>Note: <code>htable.delete(Delete);</code> does not go in 
the writebuffer!  This only applies to Puts.
-             </para>
-             <para>For additional information on write durability, review the 
<link xlink:href="../acid-semantics.html">ACID semantics</link> page.
-             </para>
-       <para>For fine-grained control of batching of
-           <classname>Put</classname>s or <classname>Delete</classname>s,
-           see the <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29";>batch</link>
 methods on HTable.
-          </para>
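As a rough sketch of such a batch call (variable names and values are invented; the form shown is the <code>batch(List, Object[])</code> method on HTable):

```java
// Sketch: mixing Puts and Deletes in one batch round-trip.
// cf, col, table, and the row keys are hypothetical; requires a live table.
List<Row> actions = new ArrayList<Row>();
Put put = new Put(Bytes.toBytes("row1"));
put.add(cf, col, Bytes.toBytes("value1"));
actions.add(put);
actions.add(new Delete(Bytes.toBytes("row2")));
Object[] results = new Object[actions.size()];
table.batch(actions, results); // results[i] holds the outcome of actions[i]
```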
-          </section>
-          <section xml:id="client.external"><title>External Clients</title>
-           <para>Information on non-Java clients and custom protocols is 
covered in <xref linkend="external_apis" />
-           </para>
-               </section>
-       </section>
-
-    <section xml:id="client.filter"><title>Client Request Filters</title>
-      <para><link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html";>Get</link>
 and <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html";>Scan</link>
 instances can be
-       optionally configured with <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html";>filters</link>
 which are applied on the RegionServer.
-      </para>
-      <para>Filters can be confusing because there are many different types, 
and it is best to approach them by understanding the groups
-      of Filter functionality.
-      </para>
-      <section xml:id="client.filter.structural"><title>Structural</title>
-        <para>Structural Filters contain other Filters.</para>
-        <section xml:id="client.filter.structural.fl"><title>FilterList</title>
-          <para><link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html";>FilterList</link>
-          represents a list of Filters with a relationship of 
<code>FilterList.Operator.MUST_PASS_ALL</code> or
-          <code>FilterList.Operator.MUST_PASS_ONE</code> between the Filters.  
The following example shows an 'or' between two
-          Filters (checking for either 'my value' or 'my other value' on the 
same attribute).</para>
-<programlisting language="java">
-FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
-SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
-       cf,
-       column,
-       CompareOp.EQUAL,
-       Bytes.toBytes("my value")
-       );
-list.add(filter1);
-SingleColumnValueFilter filter2 = new SingleColumnValueFilter(
-       cf,
-       column,
-       CompareOp.EQUAL,
-       Bytes.toBytes("my other value")
-       );
-list.add(filter2);
-scan.setFilter(list);
-</programlisting>
-        </section>
-      </section>
-      <section
-        xml:id="client.filter.cv">
-        <title>Column Value</title>
-        <section
-          xml:id="client.filter.cv.scvf">
-          <title>SingleColumnValueFilter</title>
-          <para><link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html";>SingleColumnValueFilter</link>
-            can be used to test column values for equivalence (<code><link
-                
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/CompareFilter.CompareOp.html";>CompareOp.EQUAL</link>
-            </code>), inequality (<code>CompareOp.NOT_EQUAL</code>), or ranges 
(e.g.,
-              <code>CompareOp.GREATER</code>). The following is an example of
-            testing the equivalence of a column to the String value
-            "my value"...</para>
-          <programlisting language="java">
-SingleColumnValueFilter filter = new SingleColumnValueFilter(
-       cf,
-       column,
-       CompareOp.EQUAL,
-       Bytes.toBytes("my value")
-       );
-scan.setFilter(filter);
-</programlisting>
-        </section>
-      </section>
-      <section
-        xml:id="client.filter.cvp">
-        <title>Column Value Comparators</title>
-        <para>There are several Comparator classes in the Filter package that 
deserve special
-          mention. These Comparators are used in concert with other Filters, 
such as <xref
-            linkend="client.filter.cv.scvf" />. </para>
-        <section
-          xml:id="client.filter.cvp.rcs">
-          <title>RegexStringComparator</title>
-          <para><link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html";>RegexStringComparator</link>
-            supports regular expressions for value comparisons.</para>
-          <programlisting language="java">
-RegexStringComparator comp = new RegexStringComparator("my.");   // any value 
that starts with 'my'
-SingleColumnValueFilter filter = new SingleColumnValueFilter(
-       cf,
-       column,
-       CompareOp.EQUAL,
-       comp
-       );
-scan.setFilter(filter);
-</programlisting>
-          <para>See the Oracle JavaDoc for <link
-              
xlink:href="http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html";>supported
-              RegEx patterns in Java</link>. </para>
-        </section>
-        <section
-          xml:id="client.filter.cvp.SubStringComparator">
-          <title>SubstringComparator</title>
-          <para><link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html";>SubstringComparator</link>
-            can be used to determine if a given substring exists in a value. 
The comparison is
-            case-insensitive. </para>
-          <programlisting language="java">
-SubstringComparator comp = new SubstringComparator("y val");   // looking for 
'my value'
-SingleColumnValueFilter filter = new SingleColumnValueFilter(
-       cf,
-       column,
-       CompareOp.EQUAL,
-       comp
-       );
-scan.setFilter(filter);
-</programlisting>
-        </section>
-        <section
-          xml:id="client.filter.cvp.bfp">
-          <title>BinaryPrefixComparator</title>
-          <para>See <link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.html";>BinaryPrefixComparator</link>.</para>
-        </section>
-        <section
-          xml:id="client.filter.cvp.bc">
-          <title>BinaryComparator</title>
-          <para>See <link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html";>BinaryComparator</link>.</para>
-        </section>
-      </section>
-      <section
-        xml:id="client.filter.kvm">
-        <title>KeyValue Metadata</title>
-        <para>As HBase stores data internally as KeyValue pairs, KeyValue 
Metadata Filters evaluate
-          the existence of keys (i.e., ColumnFamily:Column qualifiers) for a 
row, as opposed to
-          the values covered in the previous section. </para>
-        <section
-          xml:id="client.filter.kvm.ff">
-          <title>FamilyFilter</title>
-          <para><link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FamilyFilter.html";>FamilyFilter</link>
-            can be used to filter on the ColumnFamily. It is generally a 
better idea to select
-            ColumnFamilies in the Scan than to do it with a Filter.</para>
-        </section>
-        <section
-          xml:id="client.filter.kvm.qf">
-          <title>QualifierFilter</title>
-          <para><link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html";>QualifierFilter</link>
-            can be used to filter based on Column (aka Qualifier) name. </para>
-        </section>
-        <section
-          xml:id="client.filter.kvm.cpf">
-          <title>ColumnPrefixFilter</title>
-          <para><link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html";>ColumnPrefixFilter</link>
-            can be used to filter based on the lead portion of Column (aka 
Qualifier) names. </para>
-          <para>A ColumnPrefixFilter seeks ahead to the first column matching 
the prefix in each row
-            and for each involved column family. It can be used to efficiently 
get a subset of the
-            columns in very wide rows. </para>
-          <para>Note: The same column qualifier can be used in different 
column families. This
-            filter returns all matching columns. </para>
-          <para>Example: Find all columns in a row and family that start with 
"abc"</para>
-          <programlisting language="java">
-HTableInterface t = ...;
-byte[] row = ...;
-byte[] family = ...;
-byte[] prefix = Bytes.toBytes("abc");
-Scan scan = new Scan(row, row); // (optional) limit to one row
-scan.addFamily(family); // (optional) limit to one family
-Filter f = new ColumnPrefixFilter(prefix);
-scan.setFilter(f);
-scan.setBatch(10); // set this if there could be many columns returned
-ResultScanner rs = t.getScanner(scan);
-for (Result r = rs.next(); r != null; r = rs.next()) {
-  for (KeyValue kv : r.raw()) {
-    // each kv represents a column
-  }
-}
-rs.close();
-</programlisting>
-        </section>
-        <section
-          xml:id="client.filter.kvm.mcpf">
-          <title>MultipleColumnPrefixFilter</title>
-          <para><link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.html";>MultipleColumnPrefixFilter</link>
-            behaves like ColumnPrefixFilter but allows specifying multiple 
prefixes. </para>
-          <para>Like ColumnPrefixFilter, MultipleColumnPrefixFilter 
efficiently seeks ahead to the
-            first column matching the lowest prefix and also seeks past ranges 
of columns between
-            prefixes. It can be used to efficiently get discontinuous sets of 
columns from very wide
-            rows. </para>
-          <para>Example: Find all columns in a row and family that start with 
"abc" or "xyz"</para>
-          <programlisting language="java">
-HTableInterface t = ...;
-byte[] row = ...;
-byte[] family = ...;
-byte[][] prefixes = new byte[][] {Bytes.toBytes("abc"), Bytes.toBytes("xyz")};
-Scan scan = new Scan(row, row); // (optional) limit to one row
-scan.addFamily(family); // (optional) limit to one family
-Filter f = new MultipleColumnPrefixFilter(prefixes);
-scan.setFilter(f);
-scan.setBatch(10); // set this if there could be many columns returned
-ResultScanner rs = t.getScanner(scan);
-for (Result r = rs.next(); r != null; r = rs.next()) {
-  for (KeyValue kv : r.raw()) {
-    // each kv represents a column
-  }
-}
-rs.close();
-</programlisting>
-        </section>
-        <section
-          xml:id="client.filter.kvm.crf ">
-          <title>ColumnRangeFilter</title>
-          <para>A <link
-              
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnRangeFilter.html";>ColumnRangeFilter</link>
-            allows efficient intra row scanning. </para>
-          <para>A ColumnRangeFilter can seek ahead to the first matching 
column for each involved
-            column family. It can be used to efficiently get a 'slice' of the 
columns of a very wide
-            row. For example, you may have a million columns in a row but only
-            want to look at columns
-            bbbb-bbdd. </para>
-          <para>Note: The same column qualifier can be used in different 
column families. This
-            filter returns all matching columns. </para>
-          <para>Example: Find all columns in a row and family between "bbbb" 
(inclusive) and "bbdd"
-            (inclusive)</para>
-          <programlisting language="java">
-HTableInterface t = ...;
-byte[] row = ...;
-byte[] family = ...;
-byte[] startColumn = Bytes.toBytes("bbbb");
-byte[] endColumn = Bytes.toBytes("bbdd");
-Scan scan = new Scan(row, row); // (optional) limit to one row
-scan.addFamily(family); // (optional) limit to one family
-Filter f = new ColumnRangeFilter(startColumn, true, endColumn, true);
-scan.setFilter(f);
-scan.setBatch(10); // set this if there could be many columns returned
-ResultScanner rs = t.getScanner(scan);
-for (Result r = rs.next(); r != null; r = rs.next()) {
-  for (KeyValue kv : r.raw()) {
-    // each kv represents a column
-  }
-}
-rs.close();
-</programlisting>
-            <para>Note:  Introduced in HBase 0.92</para>
-        </section>
-      </section>
-      <section xml:id="client.filter.row"><title>RowKey</title>
-        <section xml:id="client.filter.row.rf"><title>RowFilter</title>
-          <para>It is generally a better idea to use the startRow/stopRow
-          methods on Scan for row selection; however,
-          <link
-          xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RowFilter.html">RowFilter</link>
-          can also be used.</para>
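A short sketch of a RowFilter in use; the regular expression is invented for illustration:

```java
// Sketch: match only rows whose key starts with "user-".
Scan scan = new Scan();
scan.setFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
    new RegexStringComparator("^user-.*")));
```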
-        </section>
-      </section>
-      <section xml:id="client.filter.utility"><title>Utility</title>
-        <section 
xml:id="client.filter.utility.fkof"><title>FirstKeyOnlyFilter</title>
-          <para>This is primarily used for rowcount jobs.
-          See <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html";>FirstKeyOnlyFilter</link>.</para>
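For example, a row-counting scan might be set up like this (sketch only):

```java
// Sketch: return only the first KeyValue of each row, which is enough
// to count rows without transferring whole rows to the client.
Scan scan = new Scan();
scan.setFilter(new FirstKeyOnlyFilter());
```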
-        </section>
-      </section>
-       </section>  <!--  client.filter -->
-
-    <section xml:id="master"><title>Master</title>
-      <para><code>HMaster</code> is the implementation of the Master Server. 
The Master server is
-        responsible for monitoring all RegionServer instances in the cluster, 
and is the interface
-        for all metadata changes. In a distributed cluster, the Master 
typically runs on the <xref
-          linkend="arch.hdfs.nn"/>. J Mohamed Zahoor goes into some more 
detail on the Master
-        Architecture in this blog posting, <link
-          
xlink:href="http://blog.zahoor.in/2012/08/hbase-hmaster-architecture/";>HBase 
HMaster
-          Architecture </link>.</para>
-       <section xml:id="master.startup"><title>Startup Behavior</title>
-         <para>If run in a multi-Master environment, all Masters compete to 
run the cluster.  If the active
-         Master loses its lease in ZooKeeper (or the Master shuts down), then the remaining Masters jostle to
-         take over the Master role.
-         </para>
-       </section>
-      <section
-        xml:id="master.runtime">
-        <title>Runtime Impact</title>
-        <para>A common dist-list question involves what happens to an HBase 
cluster when the Master
-          goes down. Because the HBase client talks directly to the 
RegionServers, the cluster can
-          still function in a "steady state." Additionally, per <xref
-            linkend="arch.catalog" />, <code>hbase:meta</code> exists as an 
HBase table and is not
-          resident in the Master. However, the Master controls critical 
functions such as
-          RegionServer failover and completing region splits. So while the 
cluster can still run for
-          a short time without the Master, the Master should be restarted as 
soon as possible.
-        </para>
-      </section>
-       <section xml:id="master.api"><title>Interface</title>
-         <para>The methods exposed by <code>HMasterInterface</code> are 
primarily metadata-oriented methods:
-         <itemizedlist>
-            <listitem><para>Table (createTable, modifyTable, removeTable, 
enable, disable)
-            </para></listitem>
-            <listitem><para>ColumnFamily (addColumn, modifyColumn, 
removeColumn)
-            </para></listitem>
-            <listitem><para>Region (move, assign, unassign)
-            </para></listitem>
-         </itemizedlist>
-         For example, when the <code>HBaseAdmin</code> method 
<code>disableTable</code> is invoked, it is serviced by the Master server.
-         </para>
-       </section>
-       <section xml:id="master.processes"><title>Processes</title>
-         <para>The Master runs several background threads:
-         </para>
-         <section 
xml:id="master.processes.loadbalancer"><title>LoadBalancer</title>
-           <para>Periodically, and when there are no regions in transition,
-             a load balancer will run and move regions around to balance the 
cluster's load.
-             See <xref linkend="balancer_config" /> for configuring this 
property.</para>
-             <para>See <xref linkend="regions.arch.assignment"/> for more 
information on region assignment.
-             </para>
-         </section>
-         <section 
xml:id="master.processes.catalog"><title>CatalogJanitor</title>
-           <para>Periodically checks and cleans up the hbase:meta table.  See 
<xref linkend="arch.catalog.meta" /> for more information on META.</para>
-         </section>
-       </section>
-
-     </section>
-    <section
-      xml:id="regionserver.arch">
-      <title>RegionServer</title>
-      <para><code>HRegionServer</code> is the RegionServer implementation. It 
is responsible for
-        serving and managing regions. In a distributed cluster, a RegionServer 
runs on a <xref
-          linkend="arch.hdfs.dn" />. </para>
-      <section
-        xml:id="regionserver.arch.api">
-        <title>Interface</title>
-        <para>The methods exposed by <code>HRegionInterface</code>
contain both data-oriented
-          and region-maintenance methods: <itemizedlist>
-            <listitem>
-              <para>Data (get, put, delete, next, etc.)</para>
-            </listitem>
-            <listitem>
-              <para>Region (splitRegion, compactRegion, etc.)</para>
-            </listitem>
-          </itemizedlist> For example, when the <code>HBaseAdmin</code> method
-            <code>majorCompact</code> is invoked on a table, the client is 
actually iterating
-          through all regions for the specified table and requesting a major 
compaction directly to
-          each region. </para>
-      </section>
-      <section
-        xml:id="regionserver.arch.processes">
-        <title>Processes</title>
-        <para>The RegionServer runs a variety of background threads:</para>
-        <section
-          xml:id="regionserver.arch.processes.compactsplit">
-          <title>CompactSplitThread</title>
-          <para>Checks for splits and handles minor compactions.</para>
-        </section>
-        <section
-          xml:id="regionserver.arch.processes.majorcompact">
-          <title>MajorCompactionChecker</title>
-          <para>Checks for major compactions.</para>
-        </section>
-        <section
-          xml:id="regionserver.arch.processes.memstore">
-          <title>MemStoreFlusher</title>
-          <para>Periodically flushes in-memory writes in the MemStore to 
StoreFiles.</para>
-        </section>
-        <section
-          xml:id="regionserver.arch.processes.log">
-          <title>LogRoller</title>
-          <para>Periodically checks the RegionServer's WAL.</para>
-        </section>
-      </section>
-
-      <section
-        xml:id="coprocessors">
-        <title>Coprocessors</title>
-        <para>Coprocessors were added in 0.92. There is a thorough <link
-            
xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction";>Blog 
Overview
-            of CoProcessors</link> posted. Documentation will eventually move 
to this reference
-          guide, but the blog is the most current information available at 
this time. </para>
-      </section>
-
-      <section
-        xml:id="block.cache">
-        <title>Block Cache</title>
-
-        <para>HBase provides two different BlockCache implementations: the 
default onheap
-          LruBlockCache and BucketCache, which is (usually) offheap. This 
section
-          discusses benefits and drawbacks of each implementation, how to 
choose the appropriate
-          option, and configuration options for each.</para>
-
-      <note><title>Block Cache Reporting: UI</title>
-      <para>See the RegionServer UI for details on the caching deploy.  Since
HBase-0.98.4, the
-          Block Cache detail has been significantly extended showing 
configurations,
-          sizings, current usage, time-in-the-cache, and even detail on block 
counts and types.</para>
-  </note>
-
-        <section>
-
-          <title>Cache Choices</title>
-          <para><classname>LruBlockCache</classname> is the original 
implementation, and is
-              entirely within the Java heap. 
<classname>BucketCache</classname> is mainly
-              intended for keeping blockcache data offheap, although 
BucketCache can also
-              keep data onheap and serve from a file-backed cache.
-              <note><title>BucketCache is production ready as of 
hbase-0.98.6</title>
-                <para>To run with BucketCache, you need HBASE-11678. This was 
included in
-                  hbase-0.98.6.
-                </para>
-              </note>
-          </para>
-
-          <para>Fetching from BucketCache will always be slower than fetching from the native
-              onheap LruBlockCache. However,
latencies tend to be
-              less erratic across time, because there is less garbage 
collection when you use
-              BucketCache since it is managing BlockCache allocations, not the 
GC. If the
-              BucketCache is deployed in offheap mode, this memory is not 
managed by the
-              GC at all. This is why you'd use BucketCache, so your latencies 
are less erratic and to mitigate GCs
-              and heap fragmentation.  See Nick Dimiduk's <link
-              xlink:href="http://www.n10k.com/blog/blockcache-101/";>BlockCache 
101</link> for
-            comparisons running onheap vs offheap tests. Also see
-            <link xlink:href="http://people.apache.org/~stack/bc/";>Comparing 
BlockCache Deploys</link>
-            which finds that if your dataset fits inside your LruBlockCache deploy, use it;
-            otherwise, if you are experiencing cache churn (or you want your cache to exist beyond the
-            vagaries of Java GC), use BucketCache.
-              </para>
-
-              <para>When you enable BucketCache, you are enabling a two tier 
caching
-              system, an L1 cache which is implemented by an instance of 
LruBlockCache and
-              an offheap L2 cache which is implemented by BucketCache.  
Management of these
-              two tiers and the policy that dictates how blocks move between 
them is done by
-              <classname>CombinedBlockCache</classname>. It keeps all DATA 
blocks in the L2
-              BucketCache and meta blocks -- INDEX and BLOOM blocks --
-              onheap in the L1 <classname>LruBlockCache</classname>.
-              See <xref linkend="offheap.blockcache" /> for more detail on 
going offheap.</para>
-        </section>
-
-        <section xml:id="cache.configurations">
-            <title>General Cache Configurations</title>
-          <para>Apart from the cache implementation itself, you can set some 
general configuration
-            options to control how the cache performs. See <link
-              
xlink:href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html";
-            />. After setting any of these options, restart or rolling restart 
your cluster for the
-            configuration to take effect. Check logs for errors or unexpected 
behavior.</para>
-          <para>See also <xref linkend="blockcache.prefetch"/>, which 
discusses a new option
-            introduced in <link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-9857";
-              >HBASE-9857</link>.</para>
-      </section>
-
-        <section
-          xml:id="block.cache.design">
-          <title>LruBlockCache Design</title>
-          <para>The LruBlockCache is an LRU cache that contains three levels 
of block priority to
-            allow for scan-resistance and in-memory ColumnFamilies: </para>
-          <itemizedlist>
-            <listitem>
-              <para>Single access priority: The first time a block is loaded 
from HDFS it normally
-                has this priority and it will be part of the first group to be 
considered during
-                evictions. The advantage is that scanned blocks are more 
likely to get evicted than
-                blocks that are getting more usage.</para>
-            </listitem>
-            <listitem>
-              <para>Multi access priority: If a block in the previous priority
group is accessed
-                again, it upgrades to this priority. It is thus part of the 
second group considered
-                during evictions.</para>
-            </listitem>
-            <listitem xml:id="hbase.cache.inmemory">
-              <para>In-memory access priority: If the block's family was 
configured to be
-                "in-memory", it will be part of this priority disregarding the 
number of times it
-                was accessed. Catalog tables are configured like this. This 
group is the last one
-                considered during evictions.</para>
-            <para>To mark a column family as in-memory, call
-                <programlisting 
language="java">HColumnDescriptor.setInMemory(true);</programlisting> if 
creating a table from java,
-                or set <command>IN_MEMORY => true</command> when creating or 
altering a table in
-                the shell: e.g.  <programlisting>hbase(main):003:0> create  
't', {NAME => 'f', IN_MEMORY => 'true'}</programlisting></para>
-            </listitem>
-          </itemizedlist>
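Putting the Java call above in context, here is a hedged sketch of creating a table whose family is marked in-memory (the table and family names are illustrative; the `createTable` call needs a running cluster, so it is left commented out):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t"));
HColumnDescriptor family = new HColumnDescriptor("f");
family.setInMemory(true);   // blocks from 'f' get in-memory cache priority
desc.addFamily(family);
// new HBaseAdmin(conf).createTable(desc);  // requires a live cluster
```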
-          <para> For more information, see the <link
-              
xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/LruBlockCache.html";>LruBlockCache
-              source</link>
-          </para>
-        </section>
-        <section
-          xml:id="block.cache.usage">
-          <title>LruBlockCache Usage</title>
-          <para>Block caching is enabled by default for all the user tables 
which means that any
-            read operation will load the LRU cache. This might be good for a 
large number of use
-            cases, but further tunings are usually required in order to 
achieve better performance.
-            An important concept is the <link
-              
xlink:href="http://en.wikipedia.org/wiki/Working_set_size";>working set 
size</link>, or
-            WSS, which is: "the amount of memory needed to compute the answer 
to a problem". For a
-            website, this would be the data that's needed to answer the 
queries over a short amount
-            of time. </para>
-          <para>The way to calculate how much memory is available in HBase for 
caching is: </para>
-          <programlisting>
-            number of region servers * heap size * hfile.block.cache.size * 
0.99
-        </programlisting>
-          <para>The default value for the block cache is 0.25 which represents 
25% of the available
-            heap. The last value (99%) is the default acceptable loading 
factor in the LRU cache
-            after which eviction is started. The reason it is included in this 
equation is that it
-            would be unrealistic to say that it is possible to use 100% of the 
available memory
-            since this would make the process block from the point where it loads new blocks.
-            Here are some examples: </para>
-          <itemizedlist>
-            <listitem>
-              <para>One region server with the default heap size (1 GB) and 
the default block cache
-                size will have 253 MB of block cache available.</para>
-            </listitem>
-            <listitem>
-              <para>20 region servers with the heap size set to 8 GB and a 
default block cache size
-                will have 39.6 GB of block cache.</para>
-            </listitem>
-            <listitem>
-              <para>100 region servers with the heap size set to 24 GB and a 
block cache size of 0.5
-                will have about 1.16 TB of block cache.</para>
-            </listitem>
-        </itemizedlist>
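The arithmetic behind the three examples above can be checked with a few lines of plain Java (the helper name is ours; heap sizes are expressed in MB):

```java
public class BlockCacheMath {
    // Block cache available = servers * heapMB * hfile.block.cache.size * 0.99
    static double blockCacheMb(int servers, double heapMb, double fraction) {
        return servers * heapMb * fraction * 0.99;
    }

    public static void main(String[] args) {
        System.out.println(blockCacheMb(1, 1024, 0.25));                 // ~253 MB
        System.out.println(blockCacheMb(20, 8 * 1024, 0.25) / 1024);     // ~39.6 GB
        System.out.println(blockCacheMb(100, 24 * 1024, 0.5) / 1048576); // ~1.16 TB
    }
}
```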
-        <para>Your data is not the only resident of the block cache. Here are 
others that you may have to take into account:
-        </para>
-          <variablelist>
-            <varlistentry>
-              <term>Catalog Tables</term>
-              <listitem>
-                <para>The <code>-ROOT-</code> (prior to HBase 0.96. See <xref
-                    linkend="arch.catalog.root" />) and 
<code>hbase:meta</code> tables are forced
-                  into the block cache and have the in-memory priority which 
means that they are
-                  harder to evict. The former never uses more than a few 
hundreds of bytes while the
-                  latter can occupy a few MBs (depending on the number of 
regions).</para>
-              </listitem>
-            </varlistentry>
-            <varlistentry>
-              <term>HFiles Indexes</term>
-              <listitem>
-                <para>An <firstterm>hfile</firstterm> is the file format that 
HBase uses to store
-                  data in HDFS. It contains a multi-layered index which allows 
HBase to seek to the
-                  data without having to read the whole file. The size of 
those indexes is a factor
-                  of the block size (64KB by default), the size of your keys 
and the amount of data
-                  you are storing. For big data sets it's not unusual to see 
numbers around 1GB per
-                  region server, although not all of it will be in cache 
because the LRU will evict
-                  indexes that aren't used.</para>
-              </listitem>
-            </varlistentry>
-            <varlistentry>
-              <term>Keys</term>
-              <listitem>
-                <para>The values that are stored are only half the picture, 
since each value is
-                  stored along with its keys (row key, family qualifier, and 
timestamp). See <xref
-                    linkend="keysize" />.</para>
-              </listitem>
-            </varlistentry>
-            <varlistentry>
-              <term>Bloom Filters</term>
-              <listitem>
-                <para>Just like the HFile indexes, those data structures (when 
enabled) are stored
-                  in the LRU.</para>
-              </listitem>
-            </varlistentry>
-          </variablelist>
-          <para>Currently the recommended way to measure HFile index and bloom filter sizes is to
-            look at the region server web UI and check the relevant metrics. For keys, sampling
-            can be done by using the HFile command line tool and looking for the average key size
-            metric. Since HBase 0.98.3, you can view detail on BlockCache 
stats and metrics
-            in a special Block Cache section in the UI.</para>
-          <para>It's generally bad to use block caching when the WSS doesn't 
fit in memory. This is
-            the case when you have for example 40GB available across all your 
region servers' block
-            caches but you need to process 1TB of data. One of the reasons is 
that the churn
-            generated by the evictions will trigger more garbage collections 
unnecessarily. Here are
-            two use cases: </para>
-        <itemizedlist>
-            <listitem>
-              <para>Fully random reading pattern: This is a case where you 
almost never access the
-                same row twice within a short amount of time such that the 
chance of hitting a
-                cached block is close to 0. Setting block caching on such a 
table is a waste of
-                memory and CPU cycles, even more so because it will generate more garbage for the
-                JVM. For more information on monitoring GC, see <xref
-                  linkend="trouble.log.gc" />.</para>
-            </listitem>
-            <listitem>
-              <para>Mapping a table: In a typical MapReduce job that takes a table as input, every
-                row will be read only once so there's no need to put them into the block cache. The
-                Scan object has the option of turning this off via the setCacheBlocks method (set it
-                to false). You can still keep block caching turned on on this table if you need fast
-                random read access. An example would be counting the number of rows in a table that
-                serves live traffic; caching every block of that table would create massive churn
-                and would surely evict data that's currently in use.</para>
-            </listitem>
-          </itemizedlist>
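A minimal sketch of the second case above: a Scan configured for a one-shot full-table pass (the value 500 is an illustrative rows-per-RPC setting, a separate knob from block caching):

```java
import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
scan.setCacheBlocks(false); // don't pollute the block cache on a one-shot pass
scan.setCaching(500);       // rows fetched per RPC -- unrelated to the block cache
```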
-          <section xml:id="data.blocks.in.fscache">
-            <title>Caching META blocks only (DATA blocks in fscache)</title>
-            <para>An interesting setup is one where we cache META blocks only 
and we read DATA
-              blocks in on each access. If the DATA blocks fit inside fscache, 
this alternative
-              may make sense when access is completely random across a very 
large dataset.
-              To enable this setup, alter your table and for each column family
-              set <varname>BLOCKCACHE => 'false'</varname>.  You are 'disabling' the
-              BlockCache for this column family only; you can never disable the caching of
-              META blocks. Since
-              <link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-4683";>HBASE-4683 Always 
cache index and bloom blocks</link>,
-              we will cache META blocks even if the BlockCache is disabled.
-            </para>
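The shell setting above has a Java-side equivalent; a hedged sketch (the family name is illustrative):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

HColumnDescriptor family = new HColumnDescriptor("f");
family.setBlockCacheEnabled(false); // DATA blocks bypass the BlockCache;
                                    // META (INDEX/BLOOM) blocks are still cached
```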
-          </section>
-        </section>
-        <section
-          xml:id="offheap.blockcache">
-          <title>Offheap Block Cache</title>
-          <section xml:id="enable.bucketcache">
-            <title>How to Enable BucketCache</title>
-                <para>The usual deploy of BucketCache is via a managing class 
that sets up two caching tiers: an L1 onheap cache
-                    implemented by LruBlockCache and a second L2 cache 
implemented with BucketCache. The managing class is <link
-                        
xlink:href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html";>CombinedBlockCache</link>
 by default.
-            The just-previous link describes the caching 'policy' implemented 
by CombinedBlockCache. In short, it works
-            by keeping meta blocks -- INDEX and BLOOM in the L1, onheap 
LruBlockCache tier -- and DATA
-            blocks are kept in the L2, BucketCache tier. It is possible to 
amend this behavior in
-            HBase since version 1.0 and ask that a column family have both its 
meta and DATA blocks hosted onheap in the L1 tier by
-            setting <varname>cacheDataInL1</varname> via
-                  <code>HColumnDescriptor.setCacheDataInL1(true)</code>
-            or in the shell, creating or amending column families setting 
<varname>CACHE_DATA_IN_L1</varname>
-            to true: e.g. <programlisting>hbase(main):003:0> create 't', {NAME 
=> 't', CONFIGURATION => {CACHE_DATA_IN_L1 => 'true'}}</programlisting></para>
-
-        <para>The BucketCache Block Cache can be deployed onheap, offheap, or 
file based.
-            You set which via the
-            <varname>hbase.bucketcache.ioengine</varname> setting.  Setting it 
to
-            <varname>heap</varname> will have BucketCache deployed inside the 
-            allocated java heap. Setting it to <varname>offheap</varname> will 
have
-            BucketCache make its allocations offheap,
-            and an ioengine setting of <varname>file:PATH_TO_FILE</varname> 
will direct
-            BucketCache to use file-based caching (useful in particular if you have some fast
-            I/O attached to the box, such as SSDs).
-        </para>
-        <para xml:id="raw.l1.l2">It is possible to deploy an L1+L2 setup where 
we bypass the CombinedBlockCache
-            policy and have BucketCache working as a strict L2 cache to the L1
-              LruBlockCache. For such a setup, set 
<varname>CacheConfig.BUCKET_CACHE_COMBINED_KEY</varname> to
-              <literal>false</literal>. In this mode, on eviction from L1, 
blocks go to L2.
-              When a block is cached, it is cached first in L1. When we go to 
look for a cached block,
-              we look first in L1 and if none found, then search L2.  Let us 
call this deploy format,
-              <emphasis><indexterm><primary>Raw 
L1+L2</primary></indexterm></emphasis>.</para>
-          <para>Other BucketCache configs include: specifying a location to 
persist cache to across
-              restarts, how many threads to use writing the cache, etc.  See 
the
-              <link 
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html";>CacheConfig.html</link>
-              class for configuration options and descriptions.</para>
-
-            <procedure>
-              <title>BucketCache Example Configuration</title>
-              <para>This sample provides a configuration for a 4 GB offheap 
BucketCache with a 1 GB
-                  onheap cache. Configuration is performed on the 
RegionServer.  Setting
-                  <varname>hbase.bucketcache.ioengine</varname> and 
-                  <varname>hbase.bucketcache.size</varname> &gt; 0 enables 
CombinedBlockCache.
-                  Let us presume that the RegionServer has been set to run 
with a 5G heap:
-                  i.e. HBASE_HEAPSIZE=5g.
-              </para>
-              <step>
-                <para>First, edit the RegionServer's 
<filename>hbase-env.sh</filename> and set
-                  <varname>HBASE_OFFHEAPSIZE</varname> to a value greater than 
the offheap size wanted, in
-                  this case, 4 GB (expressed as 4G).  Let's set it to 5G.
That'll be 4G
-                  for our offheap cache and 1G for any other uses of offheap 
memory (there are
-                  other users of offheap memory other than BlockCache; e.g. 
DFSClient 
-                  in RegionServer can make use of offheap memory). See <xref 
linkend="direct.memory" />.</para>
-                <programlisting>HBASE_OFFHEAPSIZE=5G</programlisting>
-              </step>
-              <step>
-                <para>Next, add the following configuration to the 
RegionServer's
-                    <filename>hbase-site.xml</filename>.</para>
-                <programlisting language="xml">
-<![CDATA[<property>
-  <name>hbase.bucketcache.ioengine</name>
-  <value>offheap</value>
-</property>
-<property>
-  <name>hfile.block.cache.size</name>
-  <value>0.2</value>
-</property>
-<property>
-  <name>hbase.bucketcache.size</name>
-  <value>4096</value>
-</property>]]>
-          </programlisting>
-              </step>
-              <step>
-                <para>Restart or rolling restart your cluster, and check the 
logs for any
-                  issues.</para>
-              </step>
-            </procedure>
-            <para>In the above, we set the BucketCache to be 4G.  We configured the onheap
-                LruBlockCache to have 0.2 of the RegionServer's heap size (0.2 * 5G = 1G).
-                In other words, you configure the L1 LruBlockCache as you 
would normally,
-                as you would when there is no L2 BucketCache present.
-            </para>
-            <para><link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-10641";
-                >HBASE-10641</link> introduced the ability to configure 
multiple sizes for the
-              buckets of the bucketcache, in HBase 0.98 and newer. To 
configure multiple bucket
-              sizes, configure the new property <option>hbase.bucketcache.bucket.sizes</option> (instead of
-                <option>hfile.block.cache.size</option>) to a comma-separated list of block sizes,
-              ordered from smallest to largest, with no spaces. The goal is to optimize the bucket
-              sizes based on your data access patterns. The following example configures buckets of
-              size 4096 and 8192.</para>
-            <screen language="xml"><![CDATA[
-<property>
-  <name>hbase.bucketcache.bucket.sizes</name>
-  <value>4096,8192</value>
-</property>
-              ]]></screen>
-            <note xml:id="direct.memory">
-                <title>Direct Memory Usage In HBase</title>
-                <para>The default maximum direct memory varies by JVM. Traditionally it is 64M,
-                    some proportion of the allocated heap size (-Xmx), or no limit at all (as in JDK 7).
-                    HBase servers use direct memory; in particular, with short-circuit reading, the
-                    hosted DFSClient will allocate direct memory buffers.  If you do offheap block
caching, you'll
-                    be making use of direct memory.  Starting your JVM, make 
sure
-                    the <varname>-XX:MaxDirectMemorySize</varname> setting in
-                    <filename>conf/hbase-env.sh</filename> is set to some 
value that is
-                    higher than what you have allocated to your offheap 
blockcache
-                    (<varname>hbase.bucketcache.size</varname>).  It should be 
larger than your offheap block
-                    cache and then some for DFSClient usage (How much the 
DFSClient uses is not
-                    easy to quantify; it is the number of open hfiles * 
<varname>hbase.dfs.client.read.shortcircuit.buffer.size</varname>
-                    where hbase.dfs.client.read.shortcircuit.buffer.size is 
set to 128k in HBase -- see <filename>hbase-default.xml</filename>
-                    default configurations).
-                        Direct memory, which is part of the Java process's memory footprint, is separate from the object
-                        heap allocated by -Xmx. The value allocated by 
MaxDirectMemorySize must not exceed
-                        physical RAM, and is likely to be less than the total 
available RAM due to other
-                        memory requirements and system constraints.
-                </para>
-              <para>You can see how much memory -- onheap and offheap/direct 
-- a RegionServer is
-                configured to use and how much it is using at any one time by 
looking at the
-                  <emphasis>Server Metrics: Memory</emphasis> tab in the UI. 
It can also be gotten
-                via JMX. In particular the direct memory currently used by the 
server can be found
-                on the <varname>java.nio.type=BufferPool,name=direct</varname> 
bean. Terracotta has
-                a <link
-                  
xlink:href="http://terracotta.org/documentation/4.0/bigmemorygo/configuration/storage-options";
-                  >good write up</link> on using offheap memory in java. It is 
for their product
-                BigMemory, but a lot of the issues noted apply in general to any
attempt at going
-                offheap. Check it out.</para>
-            </note>
-              <note 
xml:id="hbase.bucketcache.percentage.in.combinedcache"><title>hbase.bucketcache.percentage.in.combinedcache</title>
-                  <para>This is a pre-HBase 1.0 configuration removed because 
it
-                      was confusing. It was a float that you would set to some 
value
-                      between 0.0 and 1.0.  Its default was 0.9. If the deploy 
was using
-                      CombinedBlockCache, then the LruBlockCache L1 size was 
calculated to
-                      be (1 - 
<varname>hbase.bucketcache.percentage.in.combinedcache</varname>) * 
<varname>size-of-bucketcache</varname> 
-                      and the BucketCache size was <varname>hbase.bucketcache.percentage.in.combinedcache</varname> * size-of-bucket-cache,
-                      where size-of-bucket-cache itself is EITHER the value of the configuration hbase.bucketcache.size
-                      IF it was specified as megabytes OR <varname>hbase.bucketcache.size</varname> * <varname>-XX:MaxDirectMemorySize</varname> if
-                      <varname>hbase.bucketcache.size</varname> is between 0 and 1.0.
-                  </para>
-                  <para>In 1.0, it should be more straightforward. L1
LruBlockCache size
-                      is set as a fraction of java heap using 
hfile.block.cache.size setting
-                      (not the best name) and L2 is set as above either in 
absolute
-                      megabytes or as a fraction of allocated maximum direct 
memory.
-                  </para>
-              </note>
-          </section>
-        </section>
-        <section>
-          <title>Compressed BlockCache</title>
-          <para><link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-11331";
-              >HBASE-11331</link> introduced lazy blockcache decompression, 
more simply referred to
-            as compressed blockcache. When compressed blockcache is enabled,
data and encoded data
-            blocks are cached in the blockcache in their on-disk format, 
rather than being
-            decompressed and decrypted before caching.</para>
-          <para>For a
RegionServer
-            hosting more data than can fit into cache, enabling this feature 
with SNAPPY compression
-            has been shown to result in 50% increase in throughput and 30% 
improvement in mean
-            latency, while increasing garbage collection by 80% and increasing
overall CPU load by
-            2%. See HBASE-11331 for more details about how performance was 
measured and achieved.
-            For a RegionServer hosting data that can comfortably fit into 
cache, or if your workload
-            is sensitive to extra CPU or garbage-collection load, you may 
receive less
-            benefit.</para>
-          <para>Compressed blockcache is disabled by default. To enable it, set
-              <code>hbase.block.data.cachecompressed</code> to 
<code>true</code> in
-              <filename>hbase-site.xml</filename> on all RegionServers.</para>
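For reference, the enabling property looks like this in hbase-site.xml (the property name comes from the paragraph above; the fragment itself is a minimal sketch):

```xml
<property>
  <name>hbase.block.data.cachecompressed</name>
  <value>true</value>
</property>
```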
-        </section>
-      </section>
-
-      <section
-        xml:id="wal">
-        <title>Write Ahead Log (WAL)</title>
-
-        <section
-          xml:id="purpose.wal">
-          <title>Purpose</title>
-          <para>The <firstterm>Write Ahead Log (WAL)</firstterm> records all 
changes to data in
-            HBase, to file-based storage. Under normal operations, the WAL is 
not needed because
-            data changes move from the MemStore to StoreFiles. However, if a 
RegionServer crashes or
-          becomes unavailable before the MemStore is flushed, the WAL ensures 
that the changes to
-          the data can be replayed. If writing to the WAL fails, the entire 
operation to modify the
-          data fails.</para>
-          <para>
-            HBase uses an implementation of the <link xlink:href=
-            
"http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/wal/WAL.html";
-            >WAL</link> interface. Usually, there is only one instance of a 
WAL per RegionServer.
-            The RegionServer records Puts and Deletes to it, before recording 
them to the <xref
-              linkend="store.memstore" /> for the affected <xref
-              linkend="store" />.
-          </para>
-          <note>
-            <title>The HLog</title>
-            <para>
-              Prior to 2.0, the interface for WALs in HBase was named 
<classname>HLog</classname>.
-              In 0.94, HLog was the name of the implementation of the WAL. You 
will likely find
-              references to the HLog in documentation tailored to these older 
versions.
-            </para>
-          </note>
-          <para>The WAL resides in HDFS in the <filename>/hbase/WALs/</filename>
-            directory (prior to HBase 0.94, WALs were stored in
-            <filename>/hbase/.logs/</filename>), with subdirectories per
-            region.</para>
-          <para> For more general information about the concept of write ahead 
logs, see the
-            Wikipedia <link
-              
xlink:href="http://en.wikipedia.org/wiki/Write-ahead_logging";>Write-Ahead 
Log</link>
-            article. </para>
-        </section>
-        <section
-          xml:id="wal_flush">
-          <title>WAL Flushing</title>
-          <para>TODO (describe). </para>
-        </section>
-
-        <section
-          xml:id="wal_splitting">
-          <title>WAL Splitting</title>
-
-          <para>A RegionServer serves many regions. All of the regions in a 
region server share the
-            same active WAL file. Each edit in the WAL file includes 
information about which region
-            it belongs to. When a region is opened, the edits in the WAL file 
which belong to that
-            region need to be replayed. Therefore, edits in the WAL file must 
be grouped by region
-            so that particular sets can be replayed to regenerate the data in 
a particular region.
-            The process of grouping the WAL edits by region is called 
<firstterm>log
-              splitting</firstterm>. It is a critical process for recovering 
data if a region server
-            fails.</para>
-          <para>Log splitting is done by the HMaster during cluster start-up 
or by the ServerShutdownHandler
-            as a region server shuts down. So that consistency is guaranteed, 
affected regions
-            are unavailable until data is restored. All WAL edits need to be 
recovered and replayed
-            before a given region can become available again. As a result, 
regions affected by
-            log splitting are unavailable until the process completes.</para>
-          <procedure xml:id="log.splitting.step.by.step">
-            <title>Log Splitting, Step by Step</title>
-            <step>
-              <title>The 
<filename>/hbase/WALs/&lt;host>,&lt;port>,&lt;startcode></filename> directory 
is renamed.</title>
-              <para>Renaming the directory is important because a RegionServer 
may still be up and
-                accepting requests even if the HMaster thinks it is down. If 
the RegionServer does
-                not respond immediately and does not heartbeat its ZooKeeper 
session, the HMaster
-                may interpret this as a RegionServer failure. Renaming the 
logs directory ensures
-                that existing, valid WAL files which are still in use by an 
active but busy
-                RegionServer are not written to by accident.</para>
-              <para>The new directory is named according to the following 
pattern:</para>
-              
<screen><![CDATA[/hbase/WALs/<host>,<port>,<startcode>-splitting]]></screen>
-              <para>An example of such a renamed directory might look like the 
following:</para>
-              
<screen>/hbase/WALs/srv.example.com,60020,1254173957298-splitting</screen>
-            </step>
-            <step>
-              <title>Each log file is split, one at a time.</title>
-              <para>The log splitter reads the log file one edit entry at a 
time and puts each edit
-                entry into the buffer corresponding to the edit’s region. At 
the same time, the
-                splitter starts several writer threads. Writer threads pick up 
a corresponding
-                buffer and write the edit entries in the buffer to a temporary 
recovered edit
-                file. The temporary edit file is stored to disk with the 
following naming pattern:</para>
-              
<screen><![CDATA[/hbase/<table_name>/<region_id>/recovered.edits/.temp]]></screen>
-              <para>This file is used to store all the edits in the WAL log 
for this region. After
-                log splitting completes, the <filename>.temp</filename> file 
is renamed to the
-                sequence ID of the first log written to the file.</para>
-              <para>To determine whether all edits have been written, the 
sequence ID is compared to
-                the sequence of the last edit that was written to the HFile. 
If the sequence of the
-                last edit is greater than or equal to the sequence ID included 
in the file name, it
-                is clear that all writes from the edit file have been 
completed.</para>
-            </step>
-            <step>
-              <title>After log splitting is complete, each affected region is 
assigned to a
-                RegionServer.</title>
-              <para> When the region is opened, the 
<filename>recovered.edits</filename> folder is checked for recovered
-                edits files. If any such files are present, they are replayed 
by reading the edits
-                and saving them to the MemStore. After all edit files are 
replayed, the contents of
-                the MemStore are written to disk (HFile) and the edit files 
are deleted.</para>
-            </step>
-          </procedure>
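The grouping and sequence-ID check above can be sketched in Python. This is a simplified illustration, not HBase's actual implementation; the function names and tuple layout are hypothetical:

```python
from collections import defaultdict

def split_wal(edits):
    """Group WAL edits by region, mimicking the log-splitting step above.

    `edits` is an iterable of (region, seq_id, payload) tuples, read in
    WAL order. Returns {region: [(seq_id, payload), ...]} buffers that a
    writer thread would flush to a recovered.edits file for that region.
    """
    buffers = defaultdict(list)
    for region, seq_id, payload in edits:
        buffers[region].append((seq_id, payload))
    return dict(buffers)

def all_edits_persisted(recovered_file_seq_id, last_flushed_seq_id):
    """A recovered.edits file is named after the first sequence ID it holds;
    it can be skipped if the HFiles already contain edits at or beyond it."""
    return last_flushed_seq_id >= recovered_file_seq_id
```

A region with buffered edits `[(1, "a"), (3, "c")]` would get its own recovered.edits file, replayed only if `all_edits_persisted` returns `False` for it.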
-  
-          <section>
-            <title>Handling of Errors During Log Splitting</title>
-
-            <para>If you set the 
<varname>hbase.hlog.split.skip.errors</varname> option to
-                <constant>true</constant>, errors are treated as 
follows:</para>
-            <itemizedlist>
-              <listitem>
-                <para>Any error encountered during splitting will be 
logged.</para>
-              </listitem>
-              <listitem>
-                <para>The problematic WAL log will be moved into the 
<filename>.corrupt</filename>
-                  directory under the hbase <varname>rootdir</varname>,</para>
-              </listitem>
-              <listitem>
-                <para>Processing of the WAL will continue</para>
-              </listitem>
-            </itemizedlist>
-            <para>If the <varname>hbase.hlog.split.skip.errors</varname> option
-                is set to <literal>false</literal>, the default, the exception
-                will be propagated and the
-              split will be logged as failed. See <link
-                    
xlink:href="https://issues.apache.org/jira/browse/HBASE-2958";>HBASE-2958 When
-                    hbase.hlog.split.skip.errors is set to false, we fail the 
split but thats
-                    it</link>. We need to do more than just fail split if this 
flag is set.</para>
-            
-            <section>
-              <title>How EOFExceptions are treated when splitting a crashed
-                RegionServer's WALs</title>
-
-              <para>If an EOFException occurs while splitting logs, the split 
proceeds even when
-                  <varname>hbase.hlog.split.skip.errors</varname> is set to
-                <literal>false</literal>. An EOFException while reading the 
last log in the set of
-                files to split is likely, because the RegionServer is likely 
to be in the process of
-                writing a record at the time of a crash. For background, see 
<link
-                      
xlink:href="https://issues.apache.org/jira/browse/HBASE-2643";>HBASE-2643
-                      Figure how to deal with eof splitting logs</link></para>
-            </section>
-          </section>
-          
-          <section>
-            <title>Performance Improvements during Log Splitting</title>
-            <para>
-              WAL log splitting and recovery can be resource intensive and 
take a long time,
-              depending on the number of RegionServers involved in the crash 
and the size of the
-              regions. <xref linkend="distributed.log.splitting" /> and <xref
-                linkend="distributed.log.replay" /> were developed to improve
-              performance during log splitting.
-            </para>
-            <section xml:id="distributed.log.splitting">
-              <title>Distributed Log Splitting</title>
-              <para><firstterm>Distributed Log Splitting</firstterm> was added 
in HBase version 0.92
-                (<link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-1364";>HBASE-1364</link>)
 
-                by Prakash Khemani from Facebook. It reduces the time to 
complete log splitting
-                dramatically, improving the availability of regions and 
tables. For
-                example, recovering a crashed cluster took around 9 hours with 
single-threaded log
-                splitting, but only about six minutes with distributed log 
splitting.</para>
-              <para>The information in this section is sourced from Jimmy 
Xiang's blog post at <link
-              
xlink:href="http://blog.cloudera.com/blog/2012/07/hbase-log-splitting/"; 
/>.</para>
-              
-              <formalpara>
-                <title>Enabling or Disabling Distributed Log Splitting</title>
-                <para>Distributed log processing is enabled by default since 
HBase 0.92. The setting
-                  is controlled by the 
<property>hbase.master.distributed.log.splitting</property>
-                  property, which can be set to <literal>true</literal> or 
<literal>false</literal>,
-                  but defaults to <literal>true</literal>. </para>
-              </formalpara>
-              <procedure>
-                <title>Distributed Log Splitting, Step by Step</title>
-                <para>After configuring distributed log splitting, the HMaster 
controls the process.
-                  The HMaster enrolls each RegionServer in the log splitting 
process, and the actual
-                  work of splitting the logs is done by the RegionServers. The 
general process for
-                  log splitting, as described in <xref
-                    linkend="log.splitting.step.by.step" /> still applies 
here.</para>
-                <step>
-                  <para>If distributed log processing is enabled, the HMaster 
creates a
-                    <firstterm>split log manager</firstterm> instance when the 
cluster is started.
-                    The split log manager manages all log files which need
-                    to be scanned and split. The split log manager places all 
the logs into the
-                    ZooKeeper splitlog node 
(<filename>/hbase/splitlog</filename>) as tasks. You can
-                  view the contents of the splitlog by issuing the following
-                    <command>zkcli</command> command. Example output is 
shown.</para>
-                  <screen language="bourne">ls /hbase/splitlog
-[hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2F.logs%2Fhost8.sample.com%2C57020%2C1340474893275-splitting%2Fhost8.sample.com%253A57020.1340474893900,
 
-hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2F.logs%2Fhost3.sample.com%2C57020%2C1340474893299-splitting%2Fhost3.sample.com%253A57020.1340474893931,
 
-hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2F.logs%2Fhost4.sample.com%2C57020%2C1340474893287-splitting%2Fhost4.sample.com%253A57020.1340474893946]
                  
-                  </screen>
-                  <para>The output contains some non-ASCII characters. When
-                    decoded, it looks much simpler:</para>
-                  <screen>
-[hdfs://host2.sample.com:56020/hbase/.logs
-/host8.sample.com,57020,1340474893275-splitting
-/host8.sample.com%3A57020.1340474893900, 
-hdfs://host2.sample.com:56020/hbase/.logs
-/host3.sample.com,57020,1340474893299-splitting
-/host3.sample.com%3A57020.1340474893931, 
-hdfs://host2.sample.com:56020/hbase/.logs
-/host4.sample.com,57020,1340474893287-splitting
-/host4.sample.com%3A57020.1340474893946]                    
-                  </screen>
-                  <para>The listing represents WAL file names to be scanned 
and split, which is a
-                    list of log splitting tasks.</para>
-                </step>
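The znode names in the listing above are simply percent-encoded WAL paths; one decoding pass with Python's standard library recovers the readable form shown (the snippet reuses a task name from the example output):

```python
from urllib.parse import unquote

# One of the splitlog task names from the example listing above.
task = ("hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2F.logs"
        "%2Fhost8.sample.com%2C57020%2C1340474893275-splitting"
        "%2Fhost8.sample.com%253A57020.1340474893900")

# A single unquote pass undoes the znode-name encoding; the %253A inside
# the WAL file name is double-encoded, so one %3A intentionally remains.
decoded = unquote(task)
print(decoded)
```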
-                <step>
-                  <title>The split log manager monitors the log-splitting 
tasks and workers.</title>
-                  <para>The split log manager is responsible for the following 
ongoing tasks:</para>
-                  <itemizedlist>
-                    <listitem>
-                      <para>Once the split log manager publishes all the tasks 
to the splitlog
-                        znode, it monitors these task nodes and waits for them 
to be
-                        processed.</para>
-                    </listitem>
-                    <listitem>
-                      <para>Checks to see if there are any dead split log
-                        workers queued up. If it finds tasks claimed by 
unresponsive workers, it
-                        will resubmit those tasks. If the resubmit fails due 
to some ZooKeeper
-                        exception, the dead worker is queued up again for 
retry.</para>
-                    </listitem>
-                    <listitem>
-                      <para>Checks to see if there are any unassigned
-                        tasks. If it finds any, it creates an ephemeral rescan
-                        node so that each
-                        split log worker is notified to re-scan unassigned 
tasks via the
-                          <code>nodeChildrenChanged</code> ZooKeeper 
event.</para>
-                    </listitem>
-                    <listitem>
-                      <para>Checks for tasks which are assigned but expired. 
If any are found, they
-                        are moved back to <code>TASK_UNASSIGNED</code> state 
again so that they can
-                        be retried. It is possible that these tasks are 
assigned to slow workers, or
-                        they may already be finished. This is not a problem, 
because log splitting
-                        tasks have the property of idempotence. In other 
words, the same log
-                        splitting task can be processed many times without 
causing any
-                        problem.</para>
-                    </listitem>
-                    <listitem>
-                      <para>The split log manager watches the HBase split log 
znodes constantly. If
-                        any split log task node data is changed, the split log 
manager retrieves the
-                        node data. The
-                        node data contains the current state of the task. You 
can use the
-                        <command>zkcli</command> <command>get</command> 
command to retrieve the
-                        current state of a task. In the example output below, 
the first line of the
-                        output shows that the task is currently 
unassigned.</para>
-                      <screen>
-<userinput>get 
/hbase/splitlog/hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2F.logs%2Fhost6.sample.com%2C57020%2C1340474893287-splitting%2Fhost6.sample.com%253A57020.1340474893945
-</userinput> 
-<computeroutput>unassigned host2.sample.com:57000
-cZxid = 0×7115
-ctime = Sat Jun 23 11:13:40 PDT 2012
-...</computeroutput>  
-                      </screen>
-                      <para>Based on the state of the task whose data is 
changed, the split log
-                        manager does one of the following:</para>
-
-                      <itemizedlist>
-                        <listitem>
-                          <para>Resubmit the task if it is unassigned</para>
-                        </listitem>
-                        <listitem>
-                          <para>Heartbeat the task if it is assigned</para>
-                        </listitem>
-                        <listitem>
-                          <para>Resubmit or fail the task if it is resigned 
(see <xref
-                            linkend="distributed.log.replay.failure.reasons" 
/>)</para>
-                        </listitem>
-                        <listitem>
-                          <para>Resubmit or fail the task if it is completed 
with errors (see <xref
-                            linkend="distributed.log.replay.failure.reasons" 
/>)</para>
-                        </listitem>
-                        <listitem>
-                          <para>Resubmit or fail the task if it could not 
complete due to
-                            errors (see <xref
-                            linkend="distributed.log.replay.failure.reasons" 
/>)</para>
-                        </listitem>
-                        <listitem>
-                          <para>Delete the task if it is successfully 
completed or failed</para>
-                        </listitem>
-                      </itemizedlist>
-                      <itemizedlist 
xml:id="distributed.log.replay.failure.reasons">
-                        <title>Reasons a Task Will Fail</title>
-                        <listitem><para>The task has been 
deleted.</para></listitem>
-                        <listitem><para>The node no longer 
exists.</para></listitem>
-                        <listitem><para>The log status manager failed to move 
the state of the task
-                          to TASK_UNASSIGNED.</para></listitem>
-                        <listitem><para>The number of resubmits is over the 
resubmit
-                          threshold.</para></listitem>
-                      </itemizedlist>
-                    </listitem>
-                  </itemizedlist>
-                </step>
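The manager's reactions above amount to a small state table. A hedged Python sketch follows; the state names mirror the document, while the dispatch function, its return values, and the threshold constant are illustrative assumptions:

```python
RESUBMIT_THRESHOLD = 3  # illustrative value; the real threshold is configurable

def manager_action(task_state, resubmits):
    """Map a changed task state to the split log manager's reaction,
    following the list of behaviors described above."""
    if task_state == "TASK_UNASSIGNED":
        return "resubmit"
    if task_state == "TASK_OWNED":
        return "heartbeat"
    if task_state in ("TASK_RESIGNED", "TASK_ERR"):
        # Resubmit until the resubmit threshold is exhausted, then fail.
        return "resubmit" if resubmits < RESUBMIT_THRESHOLD else "fail"
    if task_state in ("TASK_DONE", "TASK_FAILED"):
        return "delete"
    raise ValueError(f"unknown task state: {task_state}")
```

Because log-splitting tasks are idempotent, resubmitting a task that a slow worker eventually completes anyway is harmless, which is why "resubmit" is the safe default for every ambiguous state.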
-                <step>
-                  <title>Each RegionServer's split log worker performs the 
log-splitting tasks.</title>
-                  <para>Each RegionServer runs a daemon thread called the 
<firstterm>split log
-                      worker</firstterm>, which does the work to split the 
logs. The daemon thread
-                    starts when the RegionServer starts, and registers itself 
to watch HBase znodes.
-                    If any splitlog znode children change, it notifies a 
sleeping worker thread to
-                    wake up and grab more tasks. If a worker's current task's
-                    node data is
-                    changed, the worker checks to see if the task has been 
taken by another worker.
-                    If so, the worker thread stops work on the current 
task.</para>
-                  <para>The worker monitors
-                    the splitlog znode constantly. When a new task appears, 
the split log worker
-                    retrieves  the task paths and checks each one until it 
finds an unclaimed task,
-                    which it attempts to claim. If the claim was successful, 
it attempts to perform
-                    the task and updates the task's <property>state</property> 
property based on the
-                    splitting outcome. At this point, the split log worker 
scans for another
-                    unclaimed task.</para>
-                  <itemizedlist>
-                    <title>How the Split Log Worker Approaches a Task</title>
-
-                    <listitem>
-                      <para>It queries the task state and only takes action if 
the task is in
-                          <literal>TASK_UNASSIGNED</literal> state.</para>
-                    </listitem>
-                    <listitem>
-                      <para>If the task is in
<literal>TASK_UNASSIGNED</literal> state, the
-                        worker attempts to set the state to 
<literal>TASK_OWNED</literal> by itself.
-                        If it fails to set the state, another worker will try 
to grab it. The split
-                        log manager will also ask all workers to rescan later 
if the task remains
-                        unassigned.</para>
-                    </listitem>
-                    <listitem>
-                      <para>If the worker succeeds in taking ownership of the 
task, it tries to get
-                        the task state again to make sure it really gets it 
asynchronously. In the
-                        meantime, it starts a split task executor to do the 
actual work: </para>
-                      <itemizedlist>
-                        <listitem>
-                          <para>Get the HBase root folder, create a temp 
folder under the root, and
-                            split the log file to the temp folder.</para>
-                        </listitem>
-                        <listitem>
-                          <para>If the split was successful, the task executor 
sets the task to
-                            state <literal>TASK_DONE</literal>.</para>
-                        </listitem>
-                        <listitem>
-                          <para>If the worker catches an unexpected 
IOException, the task is set to
-                            state <literal>TASK_ERR</literal>.</para>
-                        </listitem>
-                        <listitem>
-                          <para>If the worker is shutting down, it sets the
-                            task to state
-                              <literal>TASK_RESIGNED</literal>.</para>
-                        </listitem>
-                        <listitem>
-                          <para>If the task is taken by another worker, just 
log it.</para>
-                        </listitem>
-                      </itemizedlist>
-                    </listitem>
-                  </itemizedlist>
-                </step>
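The claim step above hinges on at most one worker winning the race for an unassigned task. ZooKeeper's versioned writes provide that guarantee; the toy sketch below models it with a lock (class and method names are hypothetical, not HBase's):

```python
import threading

class SplitLogTask:
    """Toy stand-in for a splitlog znode: workers race to flip the state
    from TASK_UNASSIGNED to TASK_OWNED. In ZooKeeper, a versioned setData
    gives the same at-most-one-winner guarantee; a lock models it here."""

    def __init__(self):
        self._lock = threading.Lock()
        self.state = "TASK_UNASSIGNED"

    def try_claim(self, worker):
        # Only act on unassigned tasks; under the lock, at most one
        # caller observes TASK_UNASSIGNED and wins the claim.
        with self._lock:
            if self.state != "TASK_UNASSIGNED":
                return False
            self.state = f"TASK_OWNED {worker}"
            return True
```

A losing worker simply moves on to scan for the next unclaimed task, which matches the worker loop described above.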
-                <step>
-                  <title>The split log manager monitors for uncompleted 
tasks.</title>
-                  <para>The split log manager returns when all tasks are 
completed successfully. If
-                    all tasks are completed with some failures, the split log 
manager throws an
-                    exception so that the log splitting can be retried. Due to 
an asynchronous
-                    implementation, in very rare cases, the split log manager 
loses track of some
-                    completed tasks. For that reason, it periodically checks 
for remaining
-                    uncompleted tasks in its task map or ZooKeeper. If none are 
found, it throws an
-                    exception so that the log splitting can be retried right 
away instead of hanging
-                    there waiting for something that won’t happen.</para>
-                </step>
-              </procedure>
-            </section>
-            <section xml:id="distributed.log.replay">
-              <title>Distributed Log Replay</title>
-              <para>After a RegionServer fails, its failed region is assigned 
to another
-                RegionServer, which is marked as "recovering" in ZooKeeper. A 
split log worker directly
-                replays edits from the WAL of the failed region server to the 
region at its new
-                location. When a region is in "recovering" state, it can
-                accept writes but not reads (including Append and Increment),
-                region splits, or merges.</para>
-              <para>Distributed Log Replay extends the <xref 
linkend="distributed.log.splitting" /> framework. It works by
-                directly replaying WAL edits to another RegionServer instead 
of creating
-                  <filename>recovered.edits</filename> files. It provides the 
following advantages
-                over distributed log splitting alone:</para>
-              <itemizedlist>
-                <listitem><para>It eliminates the overhead of writing and 
reading a large number of
-                  <filename>recovered.edits</filename> files. It is not 
unusual for thousands of
-                  <filename>recovered.edits</filename> files to be created and 
written concurrently
-                  during a RegionServer recovery. Many small random writes can 
degrade overall
-                  system performance.</para></listitem>
-                <listitem><para>It allows writes even when a region is in 
recovering state. It only takes seconds for a recovering region to accept 
writes again. 
-</para></listitem>
-              </itemizedlist>
-              <formalpara>
-                <title>Enabling Distributed Log Replay</title>
-                <para>To enable distributed log replay, set 
<varname>hbase.master.distributed.log.replay</varname> to
-                  <literal>true</literal>. This will be the default for HBase 0.99 (<link
-                    
xlink:href="https://issues.apache.org/jira/browse/HBASE-10888";>HBASE-10888</link>).</para>
-              </formalpara>
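In hbase-site.xml, enabling it looks like this (the property name is taken from the paragraph above; the fragment itself is a minimal sketch):

```xml
<property>
  <name>hbase.master.distributed.log.replay</name>
  <value>true</value>
</property>
```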
-              <para>You must also enable HFile version 3 (which is the default 
HFile format starting
-                in HBase 0.99). See <link
-                  
xlink:href="https://issues.apache.org/jira/browse/HBASE-10855";>HBASE-

<TRUNCATED>
