http://git-wip-us.apache.org/repos/asf/hbase/blob/7bf6c024/src/main/docbkx/book.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml deleted file mode 100644 index 26085a9..0000000 --- a/src/main/docbkx/book.xml +++ /dev/null @@ -1,3595 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!-- -/** - * - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ ---> -<book version="5.0" xmlns="http://docbook.org/ns/docbook" - xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:xi="http://www.w3.org/2001/XInclude" - xmlns:svg="http://www.w3.org/2000/svg" - xmlns:m="http://www.w3.org/1998/Math/MathML" - xmlns:html="http://www.w3.org/1999/xhtml" - xmlns:db="http://docbook.org/ns/docbook" xml:id="book"> - <info> - - <title><link xlink:href="http://www.hbase.org"> - The Apache HBase™ Reference Guide - </link></title> - <subtitle><link xlink:href="http://www.hbase.org"> - <inlinemediaobject> - <imageobject> - <imagedata align="middle" valign="middle" fileref="hbase_logo.png" /> - </imageobject> - </inlinemediaobject> - </link> - </subtitle> - <copyright><year>2013</year><holder>Apache Software Foundation. - All Rights Reserved. 
Apache Hadoop, Hadoop, MapReduce, HDFS, Zookeeper, HBase, and the HBase project logo are trademarks of the Apache Software Foundation. - </holder> - </copyright> - <abstract> - <para>This is the official reference guide of - <link xlink:href="http://www.hbase.org">Apache HBase™</link>, - a distributed, versioned, column-oriented database built on top of - <link xlink:href="http://hadoop.apache.org/">Apache Hadoop™</link> and - <link xlink:href="http://zookeeper.apache.org/">Apache ZooKeeper™</link>. - </para> - </abstract> - - <revhistory> - <revision> - <revnumber> - <?eval ${project.version}?> - </revnumber> - <date> - <?eval ${buildDate}?> - </date> - </revision> - </revhistory> - </info> - - <!--XInclude some chapters--> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="preface.xml" /> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="getting_started.xml" /> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="configuration.xml" /> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="upgrading.xml" /> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="shell.xml" /> - - <chapter xml:id="datamodel"> - <title>Data Model</title> - <para>In short, applications store data into an HBase table. - Tables are made of rows and columns. - All columns in HBase belong to a particular column family. - Table cells -- the intersection of row and column - coordinates -- are versioned. - A cell's content is an uninterpreted array of bytes. - </para> - <para>Table row keys are also byte arrays, so almost anything can - serve as a row key, from strings to binary representations of longs or - even serialized data structures. Rows in HBase tables - are sorted by row key. The sort is byte-ordered. All table accesses are - via the table row key -- its primary key. 
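Because the sort is byte-ordered rather than numeric, a key such as "row10" sorts before "row2". A minimal pure-JDK sketch of this comparison follows; the HBase client library provides an equivalent in its Bytes utility class, and the class and method names below are illustrative only:

```java
import java.nio.charset.StandardCharsets;

public class RowKeyOrder {
    // Unsigned lexicographic byte comparison, the same ordering HBase
    // applies to row keys. Negative result means a sorts before b.
    static int compareRowKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff; // compare bytes as unsigned
            if (x != y) return x - y;
        }
        return a.length - b.length; // on a common prefix, the shorter key sorts first
    }

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        // "row10" sorts BEFORE "row2" because the byte '1' is less than '2'.
        System.out.println(compareRowKeys(bytes("row10"), bytes("row2")) < 0);
        // Zero-padding the numeric part restores the intended order.
        System.out.println(compareRowKeys(bytes("row02"), bytes("row10")) < 0);
    }
}
```

This is why schemas that need numeric row ordering typically zero-pad or binary-encode the numeric portion of the key.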
-</para> - - <section xml:id="conceptual.view"><title>Conceptual View</title> - <para> - The following example is a slightly modified form of the one on page - 2 of the <link xlink:href="http://research.google.com/archive/bigtable.html">BigTable</link> paper. - There is a table called <varname>webtable</varname> that contains two column families named - <varname>contents</varname> and <varname>anchor</varname>. - In this example, <varname>anchor</varname> contains two - columns (<varname>anchor:cnnsi.com</varname>, <varname>anchor:my.look.ca</varname>) - and <varname>contents</varname> contains one column (<varname>contents:html</varname>). - <note> - <title>Column Names</title> - <para> - By convention, a column name is made of its column family prefix and a - <emphasis>qualifier</emphasis>. For example, the - column - <emphasis>contents:html</emphasis> is made up of the column family <varname>contents</varname> - and the <varname>html</varname> qualifier. - The colon character (<literal - moreinfo="none">:</literal>) delimits the column family from the - column family <emphasis>qualifier</emphasis>. 
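The family/qualifier convention can be illustrated by a small helper that splits a printable column name at its first colon. This is an illustrative sketch only, not part of the HBase client API; note that the split must be at the first colon, since a qualifier may itself contain arbitrary bytes while a family name cannot contain a colon:

```java
public class ColumnName {
    // Split a column name of the form "family:qualifier" at the FIRST colon.
    // Illustrative helper only; not an HBase client API.
    static String[] split(String column) {
        int i = column.indexOf(':');
        if (i < 0) throw new IllegalArgumentException("no family delimiter in: " + column);
        return new String[] { column.substring(0, i), column.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("contents:html");
        System.out.println(parts[0]); // the column family
        System.out.println(parts[1]); // the qualifier
    }
}
```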
- </para> - </note> - <table frame='all'><title>Table <varname>webtable</varname></title> - <tgroup cols='4' align='left' colsep='1' rowsep='1'> - <colspec colname='c1'/> - <colspec colname='c2'/> - <colspec colname='c3'/> - <colspec colname='c4'/> - <thead> - <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry><entry>ColumnFamily <varname>anchor</varname></entry></row> - </thead> - <tbody> - <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry></entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row> - <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry></entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row> - <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row> - <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row> - <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row> - </tbody> - </tgroup> - </table> - </para> - </section> - <section xml:id="physical.view"><title>Physical View</title> - <para> - Although at a conceptual level tables may be viewed as a sparse set of rows, - physically they are stored on a per-column-family basis. New columns - (i.e., <varname>columnfamily:column</varname>) can be added to any - column family without pre-announcing them. 
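The per-column-family, sparse layout can be sketched with a toy model that uses plain Java maps to stand in for per-family store files. This is purely illustrative (real HBase persists families as HFiles on the filesystem), but it shows why absent cells cost nothing and why a new qualifier needs no schema change:

```java
import java.util.Map;
import java.util.TreeMap;

public class SparseFamilyStore {
    // Toy model of one row: each column family holds only the qualifiers
    // actually written, so empty cells occupy no space at all.
    final Map<String, TreeMap<String, String>> families = new TreeMap<>();

    void put(String family, String qualifier, String value) {
        // A never-before-seen qualifier is simply inserted; no schema change.
        families.computeIfAbsent(family, f -> new TreeMap<>()).put(qualifier, value);
    }

    String get(String family, String qualifier) {
        TreeMap<String, String> fam = families.get(family);
        return fam == null ? null : fam.get(qualifier); // null: cell was never stored
    }

    public static void main(String[] args) {
        SparseFamilyStore row = new SparseFamilyStore();
        row.put("anchor", "cnnsi.com", "CNN");
        row.put("anchor", "my.look.ca", "CNN.com"); // new column, no pre-announcement
        System.out.println(row.get("contents", "html")); // nothing stored for this cell
    }
}
```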
- <table frame='all'><title>ColumnFamily <varname>anchor</varname></title> - <tgroup cols='3' align='left' colsep='1' rowsep='1'> - <colspec colname='c1'/> - <colspec colname='c2'/> - <colspec colname='c3'/> - <thead> - <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>anchor</varname></entry></row> - </thead> - <tbody> - <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row> - <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row> - </tbody> - </tgroup> - </table> - <table frame='all'><title>ColumnFamily <varname>contents</varname></title> - <tgroup cols='3' align='left' colsep='1' rowsep='1'> - <colspec colname='c1'/> - <colspec colname='c2'/> - <colspec colname='c3'/> - <thead> - <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry></row> - </thead> - <tbody> - <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row> - <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row> - <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row> - </tbody> - </tgroup> - </table> - It is important to note in the diagram above that the empty cells shown in the - conceptual view are not stored, since they need not be in a column-oriented - storage format. Thus a request for the value of the <varname>contents:html</varname> - column at time stamp <literal>t8</literal> would return no value. Similarly, a - request for an <varname>anchor:my.look.ca</varname> value at time stamp - <literal>t9</literal> would return no value. 
However, if no timestamp is - supplied, the most recent value for a particular column would be returned - and would also be the first one found, since timestamps are stored in - descending order. Thus a request for the values of all columns in the row - <varname>com.cnn.www</varname>, if no timestamp is specified, would be: - the value of <varname>contents:html</varname> from time stamp - <literal>t6</literal>, the value of <varname>anchor:cnnsi.com</varname> - from time stamp <literal>t9</literal>, and the value of - <varname>anchor:my.look.ca</varname> from time stamp <literal>t8</literal>. - </para> - <para>For more information about the internals of how Apache HBase stores data, see <xref linkend="regions.arch" />. - </para> - </section> - - <section xml:id="namespace"> - <title>Namespace</title> - <para> - A namespace is a logical grouping of tables analogous to a database in relational database - systems. This abstraction lays the groundwork for upcoming multi-tenancy related features: - <itemizedlist> - <listitem>Quota Management (HBASE-8410) - Restrict the amount of resources (i.e., - regions, tables) a namespace can consume.</listitem> - <listitem>Namespace Security Administration (HBASE-9206) - Provide another - level of security administration for tenants.</listitem> - <listitem>Region server groups (HBASE-6721) - A namespace/table can be - pinned onto a subset of regionservers, thus guaranteeing a coarse level of - isolation.</listitem> - </itemizedlist> - </para> - <section xml:id="namespace_creation"> - <title>Namespace management</title> - <para> - A namespace can be created, removed or altered. 
Namespace membership is determined during - table creation by specifying a fully-qualified table name of the form: - <para> - <code><table namespace>:<table qualifier></code> - </para> - <para> - Examples: - </para> -<programlisting> -#Create a namespace -create_namespace 'my_ns' - -#create my_table in my_ns namespace -create 'my_ns:my_table', 'fam' - -#delete namespace -delete_namespace 'my_ns' - -#alter namespace -alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'} -</programlisting> - </para> - </section> - <section xml:id="namespace_special"> - <title>Predefined namespaces</title> - <para> - There are two predefined special namespaces: - <itemizedlist> - <listitem>hbase - system namespace, used to contain HBase internal tables</listitem> - <listitem>default - tables with no explicitly specified namespace will automatically - fall into this namespace</listitem> - </itemizedlist> - </para> - <para> - Examples: -<programlisting> -#namespace=foo and table qualifier=bar -create 'foo:bar', 'fam' - -#namespace=default and table qualifier=bar -create 'bar', 'fam' -</programlisting> - </para> - </section> - </section> - - <section xml:id="table"> - <title>Table</title> - <para> - Tables are declared up front at schema definition time. - </para> - </section> - - <section xml:id="row"> - <title>Row</title> - <para>Row keys are uninterpreted bytes. Rows are - lexicographically sorted, with the lowest order appearing first - in a table. The empty byte array is used to denote both the - start and end of a table's namespace.</para> - </section> - - <section xml:id="columnfamily"> - <title>Column Family<indexterm><primary>Column Family</primary></indexterm></title> - <para> - Columns in Apache HBase are grouped into <emphasis>column families</emphasis>. - All column members of a column family have the same prefix. 
For example, the - columns <emphasis>courses:history</emphasis> and - <emphasis>courses:math</emphasis> are both members of the - <emphasis>courses</emphasis> column family. - The colon character (<literal - moreinfo="none">:</literal>) delimits the column family from the - <indexterm>column family <emphasis>qualifier</emphasis><primary>Column Family Qualifier</primary></indexterm>. - The column family prefix must be composed of - <emphasis>printable</emphasis> characters. The qualifying tail, the - column family <emphasis>qualifier</emphasis>, can be made of any - arbitrary bytes. Column families must be declared up front - at schema definition time, whereas columns do not need to be - defined at schema time but can be conjured on the fly while - the table is up and running.</para> - <para>Physically, all column family members are stored together on the - filesystem. Because tunings and - storage specifications are done at the column family level, it is - advised that all column family members have the same general access - pattern and size characteristics.</para> - - </section> - <section xml:id="cells"> - <title>Cells<indexterm><primary>Cells</primary></indexterm></title> - <para>A <emphasis>{row, column, version}</emphasis> tuple exactly - specifies a <literal>cell</literal> in HBase. - Cell content is uninterpreted bytes.</para> - </section> - <section xml:id="data_model_operations"> - <title>Data Model Operations</title> - <para>The four primary data model operations are Get, Put, Scan, and Delete. Operations are applied via - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link> instances. - </para> - <section xml:id="get"> - <title>Get</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> returns - attributes for a specified row. 
Gets are executed via - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#get%28org.apache.hadoop.hbase.client.Get%29"> - HTable.get</link>. - </para> - </section> - <section xml:id="put"> - <title>Put</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> either - adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put%28org.apache.hadoop.hbase.client.Put%29"> - HTable.put</link> (writeBuffer) or <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29"> - HTable.batch</link> (non-writeBuffer). - </para> - </section> - <section xml:id="scan"> - <title>Scans</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> allows - iteration over multiple rows for specified attributes. - </para> - <para>The following is an example of a Scan - on an HTable instance. Assume that a table is populated with rows with keys "row1", "row2", "row3", - and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how startRow and stopRow - can be applied to a Scan instance to return the rows beginning with "row". -<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... - -HTable htable = ... // instantiate HTable - -Scan scan = new Scan(); -scan.addColumn(CF, ATTR); -scan.setStartRow(Bytes.toBytes("row")); // start key is inclusive -scan.setStopRow(Bytes.toBytes("rox")); // stop key is exclusive -ResultScanner rs = htable.getScanner(scan); -try { - for (Result r = rs.next(); r != null; r = rs.next()) { - // process result... - } -} finally { - rs.close(); // always close the ResultScanner! 
-} -</programlisting> - </para> - <para>Note that generally the easiest way to specify a specific stop point for a scan is by using the <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html">InclusiveStopFilter</link> class. - </para> - </section> - <section xml:id="delete"> - <title>Delete</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html">Delete</link> removes - a row from a table. Deletes are executed via - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29"> - HTable.delete</link>. - </para> - <para>HBase does not modify data in place, and so deletes are handled by creating new markers called <emphasis>tombstones</emphasis>. - These tombstones, along with the dead values, are cleaned up on major compactions. - </para> - <para>See <xref linkend="version.delete"/> for more information on deleting versions of columns, and see - <xref linkend="compaction"/> for more information on compactions. - </para> - - </section> - - </section> - - - <section xml:id="versions"> - <title>Versions<indexterm><primary>Versions</primary></indexterm></title> - - <para>A <emphasis>{row, column, version} </emphasis>tuple exactly - specifies a <literal>cell</literal> in HBase. It's possible to have an - unbounded number of cells where the row and column are the same but the - cell address differs only in its version dimension.</para> - - <para>While rows and column keys are expressed as bytes, the version is - specified using a long integer. 
Typically this long contains time - instances such as those returned by - <code>java.util.Date.getTime()</code> or - <code>System.currentTimeMillis()</code>, that is: <quote>the difference, - measured in milliseconds, between the current time and midnight, January - 1, 1970 UTC</quote>.</para> - - <para>The HBase version dimension is stored in decreasing order, so that - when reading from a store file, the most recent values are found - first.</para> - - <para>There is a lot of confusion over the semantics of - <literal>cell</literal> versions in HBase. In particular, a couple of - questions that often come up are:<itemizedlist> - <listitem> - <para>If multiple writes to a cell have the same version, are all - versions maintained or just the last?<footnote> - <para>Currently, only the last written is fetchable.</para> - </footnote></para> - </listitem> - - <listitem> - <para>Is it OK to write cells in a non-increasing version - order?<footnote> - <para>Yes</para> - </footnote></para> - </listitem> - </itemizedlist></para> - - <para>Below we describe how the version dimension in HBase currently - works<footnote> - <para>See <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link> - for discussion of HBase versions. <link - xlink:href="http://outerthought.org/blog/417-ot.html">Bending time - in HBase</link> makes for a good read on the version, or time, - dimension in HBase. It has more detail on versioning than is - provided here. As of this writing, the limitation - <emphasis>Overwriting values at existing timestamps</emphasis> - mentioned in the article no longer holds in HBase. 
This section is - basically a synopsis of this article by Bruno Dumon.</para> - </footnote>.</para> - - <section xml:id="versions.ops"> - <title>Versions and HBase Operations</title> - - <para>In this section we look at the behavior of the version dimension - for each of the core HBase operations.</para> - - <section> - <title>Get/Scan</title> - - <para>Gets are implemented on top of Scans. The below discussion of - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> applies equally to <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scans</link>.</para> - - <para>By default, i.e. if you specify no explicit version, when - doing a <literal>get</literal>, the cell whose version has the - largest value is returned (which may or may not be the latest one - written, see later). The default behavior can be modified in the - following ways:</para> - - <itemizedlist> - <listitem> - <para>to return more than one version, see <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()">Get.setMaxVersions()</link></para> - </listitem> - - <listitem> - <para>to return versions other than the latest, see <link - xlink:href="???">Get.setTimeRange()</link></para> - - <para>To retrieve the latest version that is less than or equal - to a given value, thus giving the 'latest' state of the record - at a certain point in time, just use a range from 0 to the - desired version and set the max versions to 1.</para> - </listitem> - </itemizedlist> - - </section> - <section xml:id="default_get_example"> - <title>Default Get Example</title> - <para>The following Get will only retrieve the current version of the row -<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... 
-Get get = new Get(Bytes.toBytes("row1")); -Result r = htable.get(get); -byte[] b = r.getValue(CF, ATTR); // returns current version of value -</programlisting> - </para> - </section> - <section xml:id="versioned_get_example"> - <title>Versioned Get Example</title> - <para>The following Get will return the last 3 versions of the row. -<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... -Get get = new Get(Bytes.toBytes("row1")); -get.setMaxVersions(3); // will return last 3 versions of row -Result r = htable.get(get); -byte[] b = r.getValue(CF, ATTR); // returns current version of value -List<KeyValue> kv = r.getColumn(CF, ATTR); // returns all versions of this column -</programlisting> - </para> - </section> - - <section> - <title>Put</title> - - <para>Doing a put always creates a new version of a - <literal>cell</literal>, at a certain timestamp. By default the - system uses the server's <literal>currentTimeMillis</literal>, but - you can specify the version (= the long integer) yourself, on a - per-column level. This means you could assign a time in the past or - the future, or use the long value for non-time purposes.</para> - - <para>To overwrite an existing value, do a put at exactly the same - row, column, and version as that of the cell you would - overshadow.</para> - <section xml:id="implicit_version_example"> - <title>Implicit Version Example</title> - <para>The following Put will be implicitly versioned by HBase with the current time. -<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... -Put put = new Put(Bytes.toBytes(row)); -put.add(CF, ATTR, Bytes.toBytes( data)); -htable.put(put); -</programlisting> - </para> - </section> - <section xml:id="explicit_version_example"> - <title>Explicit Version Example</title> - <para>The following Put has the version timestamp explicitly set. 
-<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... -Put put = new Put(Bytes.toBytes(row)); -long explicitTimeInMs = 555; // just an example -put.add(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data)); -htable.put(put); -</programlisting> - Caution: the version timestamp is used internally by HBase for things like time-to-live calculations. - It's usually best to avoid setting this timestamp yourself. Prefer using a separate - timestamp attribute of the row, or have the timestamp as part of the rowkey, or both. - </para> - </section> - - </section> - - <section xml:id="version.delete"> - <title>Delete</title> - - <para>There are three different types of internal delete markers - <footnote><para>See Lars Hofhansl's blog for discussion of his attempt at - adding another, <link xlink:href="http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html">Scanning in HBase: Prefix Delete Marker</link></para></footnote>: - <itemizedlist> - <listitem><para>Delete: for a specific version of a column.</para> - </listitem> - <listitem><para>Delete column: for all versions of a column.</para> - </listitem> - <listitem><para>Delete family: for all columns of a particular ColumnFamily.</para> - </listitem> - </itemizedlist> - When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column). - </para> - <para>Deletes work by creating <emphasis>tombstone</emphasis> - markers. For example, let's suppose we want to delete a row. For - this you can specify a version, or else by default the - <literal>currentTimeMillis</literal> is used. What this means is - <quote>delete all cells where the version is less than or equal to - this version</quote>. HBase never modifies data in place, so for - example a delete will not immediately delete (or mark as deleted) - the entries in the storage file that correspond to the delete - condition. 
Rather, a so-called <emphasis>tombstone</emphasis> is - written, which will mask the deleted values<footnote> - <para>When HBase does a major compaction, the tombstones are - processed to actually remove the dead values, together with the - tombstones themselves.</para> - </footnote>. If the version you specified when deleting a row is - larger than the version of any value in the row, then you can - consider the complete row to be deleted.</para> - <para>For an informative discussion on how deletes and versioning interact, see - the thread <link xlink:href="http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421">Put w/ timestamp -> Deleteall -> Put w/ timestamp fails</link> - up on the user mailing list.</para> - <para>Also see <xref linkend="keyvalue"/> for more information on the internal KeyValue format. - </para> - <para>Delete markers are purged during the major compaction of the store, - unless KEEP_DELETED_CELLS is set in the column family. In some - scenarios users want to keep delete markers for a time; for this you can set the - delete TTL, hbase.hstore.time.to.purge.deletes, in the configuration. - If this delete TTL is not set, or is set to 0, all delete markers, including those - with future timestamps, are purged during the next major compaction. - Otherwise, a delete marker is kept until the first major compaction after the - marker's timestamp plus the delete TTL. - </para> - </section> - </section> - - <section> - <title>Current Limitations</title> - - <section> - <title>Deletes mask Puts</title> - - <para>Deletes mask puts, even puts that happened after the delete - was entered<footnote> - <para><link - xlink:href="https://issues.apache.org/jira/browse/HBASE-2256">HBASE-2256</link></para> - </footnote>. Remember that a delete writes a tombstone, which only - disappears after the next major compaction has run. Suppose you do - a delete of everything <= T. After this you do a new put with a - timestamp <= T. 
This put, even if it happened after the delete, - will be masked by the delete tombstone. Performing the put will not - fail, but when you do a get you will notice the put had no - effect. It will start working again after the major compaction has - run. These issues should not be a problem if you use - always-increasing versions for new puts to a row. But they can occur - even if you do not care about time: just do delete and put - immediately after each other, and there is some chance they happen - within the same millisecond.</para> - </section> - - <section> - <title>Major compactions change query results</title> - - <para><quote>...create three cell versions at t1, t2 and t3, with a - maximum-versions setting of 2. So when getting all versions, only - the values at t2 and t3 will be returned. But if you delete the - version at t2 or t3, the one at t1 will appear again. Obviously, - once a major compaction has run, such behavior will not be the case - anymore...<footnote> - <para>See <emphasis>Garbage Collection</emphasis> in <link - xlink:href="http://outerthought.org/blog/417-ot.html">Bending - time in HBase</link> </para> - </footnote></quote></para> - </section> - </section> - </section> - <section xml:id="dm.sort"> - <title>Sort Order</title> - <para>All data model operations in HBase return data in sorted order: first by row, - then by ColumnFamily, followed by column qualifier, and finally timestamp (sorted - in reverse, so newest records are returned first). - </para> - </section> - <section xml:id="dm.column.metadata"> - <title>Column Metadata</title> - <para>There is no store of column metadata outside of the internal KeyValue instances for a ColumnFamily. - Thus, while HBase can support not only a wide number of columns per row but also a heterogeneous set of columns - between rows, it is your responsibility to keep track of the column names. 
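Tracking column names therefore amounts to walking the rows and collecting the qualifiers each one carries. The aggregation can be sketched in pure Java, with rows modeled as qualifier-to-value maps rather than real Result objects (an assumption for illustration; in a scan, each map would come from the per-family view of a Result):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ColumnDiscovery {
    // Collect the union of qualifiers seen across all rows of one family.
    // Every row may contribute columns no other row has.
    static Set<String> distinctQualifiers(List<Map<String, String>> rows) {
        Set<String> qualifiers = new TreeSet<>(); // sorted, like HBase qualifiers
        for (Map<String, String> row : rows) {
            qualifiers.addAll(row.keySet());
        }
        return qualifiers;
    }

    public static void main(String[] args) {
        Set<String> cols = distinctQualifiers(List.of(
                Map.of("html", "<html>..."),
                Map.of("html", "<html>...", "pdf", "%PDF...")));
        System.out.println(cols);
    }
}
```

For a large table this is naturally a full-scan (or MapReduce) job, which is why keeping track of qualifiers in the application is usually cheaper than discovering them afterwards.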
- </para> - <para>The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows. - For more information about how HBase stores data internally, see <xref linkend="keyvalue" />. - </para> - </section> - <section xml:id="joins"><title>Joins</title> - <para>Whether HBase supports joins is a common question on the dist-list, and there is a simple answer: it doesn't, - at least not in the way that RDBMSs support them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated - in this chapter, the read data model operations in HBase are Get and Scan. - </para> - <para>However, that doesn't mean that equivalent join functionality can't be supported in your application; - you have to do it yourself. The two primary strategies are either denormalizing the data upon writing to HBase, - or having lookup tables and doing the join between HBase tables in your application or MapReduce code (and as RDBMSs - demonstrate, there are several strategies for this depending on the size of the tables, e.g., nested loops vs. - hash-joins). So which is the best approach? It depends on what you are trying to do, and as such there isn't a single - answer that works for every use case. - </para> - </section> - <section xml:id="acid"><title>ACID</title> - <para>See <link xlink:href="http://hbase.apache.org/acid-semantics.html">ACID Semantics</link>. - Lars Hofhansl has also written a note on - <link xlink:href="http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html">ACID in HBase</link>.</para> - </section> - </chapter> <!-- data model --> - - <!-- schema design --> - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="schema_design.xml" /> - - <chapter xml:id="mapreduce"> - <title>HBase and MapReduce</title> - <para>See <link xlink:href="http://hbase.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description"> - HBase and MapReduce</link> up in javadocs. - Start there. 
Below is some additional help.</para> - <para>For more information about MapReduce (i.e., the framework in general), see the Hadoop site (TODO: Need good links here -- - we used to have some but they rotted against apache hadoop).</para> - <caution> - <title>Notice to Mapreduce users of HBase 0.96.1 and above</title> - <para>Some mapreduce jobs that use HBase fail to launch. The symptom is an - exception similar to the following: - <programlisting> -Exception in thread "main" java.lang.IllegalAccessError: class - com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass - com.google.protobuf.LiteralByteString - at java.lang.ClassLoader.defineClass1(Native Method) - at java.lang.ClassLoader.defineClass(ClassLoader.java:792) - at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) - at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) - at java.net.URLClassLoader.access$100(URLClassLoader.java:71) - at java.net.URLClassLoader$1.run(URLClassLoader.java:361) - at java.net.URLClassLoader$1.run(URLClassLoader.java:355) - at java.security.AccessController.doPrivileged(Native Method) - at java.net.URLClassLoader.findClass(URLClassLoader.java:354) - at java.lang.ClassLoader.loadClass(ClassLoader.java:424) - at java.lang.ClassLoader.loadClass(ClassLoader.java:357) - at - org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818) - at - org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433) - at - org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186) - at - org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147) - at - org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270) - at - org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100) -... 
-</programlisting> - This is because of an optimization introduced in <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-9867">HBASE-9867</link> - that inadvertently introduced a classloader dependency. - </para> - <para>This affects both jobs using the <code>-libjars</code> option and - "fat jar" jobs, which package their runtime dependencies in a nested - <code>lib</code> folder.</para> - <para>In order to satisfy the new classloader requirements, - hbase-protocol.jar must be included in Hadoop's classpath. This can be - resolved system-wide by including a reference to the hbase-protocol.jar in - Hadoop's lib directory, via a symlink or by copying the jar into the new - location.</para> - <para>This can also be achieved on a per-job launch basis by including it - in the <code>HADOOP_CLASSPATH</code> environment variable at job submission - time. When launching jobs that package their dependencies, all three of the - following job launching commands satisfy this requirement:</para> -<programlisting> -$ HADOOP_CLASSPATH=/path/to/hbase-protocol.jar:/path/to/hbase/conf hadoop jar MyJob.jar MyJobMainClass -$ HADOOP_CLASSPATH=$(hbase mapredcp):/path/to/hbase/conf hadoop jar MyJob.jar MyJobMainClass -$ HADOOP_CLASSPATH=$(hbase classpath) hadoop jar MyJob.jar MyJobMainClass -</programlisting> - <para>For jars that do not package their dependencies, the following command - structure is necessary:</para> -<programlisting> -$ HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf hadoop jar MyApp.jar MyJobMainClass -libjars $(hbase mapredcp | tr ':' ',') ... 
-</programlisting> - <para>See also <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-10304">HBASE-10304</link> - for further discussion of this issue.</para> - </caution> - <section xml:id="splitter"> - <title>Map-Task Splitting</title> - <section xml:id="splitter.default"> - <title>The Default HBase MapReduce Splitter</title> - <para>When <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html">TableInputFormat</link> - is used to source an HBase table in a MapReduce job, - its splitter will make a map task for each region of the table. - Thus, if there are 100 regions in the table, there will be - 100 map-tasks for the job - regardless of how many column families are selected in the Scan.</para> - </section> - <section xml:id="splitter.custom"> - <title>Custom Splitters</title> - <para>For those interested in implementing custom splitters, see the method <code>getSplits</code> in - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html">TableInputFormatBase</link>. - That is where the logic for map-task assignment resides. - </para> - </section> - </section> - <section xml:id="mapreduce.example"> - <title>HBase MapReduce Examples</title> - <section xml:id="mapreduce.example.read"> - <title>HBase MapReduce Read Example</title> - <para>The following is an example of using HBase as a MapReduce source in a read-only manner. Specifically, - there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper. The job would be defined - as follows... - <programlisting> -Configuration config = HBaseConfiguration.create(); -Job job = new Job(config, "ExampleRead"); -job.setJarByClass(MyReadJob.class); // class that contains mapper - -Scan scan = new Scan(); -scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs -scan.setCacheBlocks(false); // don't set to true for MR jobs -// set other scan attrs -... 
- -TableMapReduceUtil.initTableMapperJob( - tableName, // input HBase table name - scan, // Scan instance to control CF and attribute selection - MyMapper.class, // mapper - null, // mapper output key - null, // mapper output value - job); -job.setOutputFormatClass(NullOutputFormat.class); // because we aren't emitting anything from mapper - -boolean b = job.waitForCompletion(true); -if (!b) { - throw new IOException("error with job!"); -} - </programlisting> - ...and the mapper instance would extend <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html">TableMapper</link>... - <programlisting> -public static class MyMapper extends TableMapper<Text, Text> { - - public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException { - // process data for the row from the Result instance. - } -} - </programlisting> - </para> - </section> - <section xml:id="mapreduce.example.readwrite"> - <title>HBase MapReduce Read/Write Example</title> - <para>The following is an example of using HBase both as a source and as a sink with MapReduce. - This example will simply copy data from one table to another. 
- <programlisting> -Configuration config = HBaseConfiguration.create(); -Job job = new Job(config,"ExampleReadWrite"); -job.setJarByClass(MyReadWriteJob.class); // class that contains mapper - -Scan scan = new Scan(); -scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs -scan.setCacheBlocks(false); // don't set to true for MR jobs -// set other scan attrs - -TableMapReduceUtil.initTableMapperJob( - sourceTable, // input table - scan, // Scan instance to control CF and attribute selection - MyMapper.class, // mapper class - null, // mapper output key - null, // mapper output value - job); -TableMapReduceUtil.initTableReducerJob( - targetTable, // output table - null, // reducer class - job); -job.setNumReduceTasks(0); - -boolean b = job.waitForCompletion(true); -if (!b) { - throw new IOException("error with job!"); -} - </programlisting> - An explanation is required of what <classname>TableMapReduceUtil</classname> is doing, especially with the reducer. - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link> is being used - as the outputFormat class, and several parameters are being set on the config (e.g., TableOutputFormat.OUTPUT_TABLE), as - well as setting the reducer output key to <classname>ImmutableBytesWritable</classname> and reducer value to <classname>Writable</classname>. - These could be set by the programmer on the job and conf, but <classname>TableMapReduceUtil</classname> tries to make things easier. - <para>The following is the example mapper, which will create a <classname>Put</classname> matching the input <classname>Result</classname> - and emit it. Note: this is what the CopyTable utility does. 
- </para> - <programlisting> -public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> { - - public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException { - // this example is just copying the data from the source table... - context.write(row, resultToPut(row,value)); - } - - private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException { - Put put = new Put(key.get()); - for (KeyValue kv : result.raw()) { - put.add(kv); - } - return put; - } -} - </programlisting> - <para>There isn't actually a reducer step, so <classname>TableOutputFormat</classname> takes care of sending the <classname>Put</classname> - to the target table. - </para> - <para>This is just an example; developers could choose not to use <classname>TableOutputFormat</classname> and connect to the - target table themselves. - </para> - </para> - </section> - <section xml:id="mapreduce.example.readwrite.multi"> - <title>HBase MapReduce Read/Write Example With Multi-Table Output</title> - <para>TODO: example for <classname>MultiTableOutputFormat</classname>. - </para> - </section> - <section xml:id="mapreduce.example.summary"> - <title>HBase MapReduce Summary to HBase Example</title> - <para>The following example uses HBase as a MapReduce source and sink with a summarization step. This example will - count the number of distinct instances of a value in a table and write those summarized counts in another table. 
- <programlisting> -Configuration config = HBaseConfiguration.create(); -Job job = new Job(config,"ExampleSummary"); -job.setJarByClass(MySummaryJob.class); // class that contains mapper and reducer - -Scan scan = new Scan(); -scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs -scan.setCacheBlocks(false); // don't set to true for MR jobs -// set other scan attrs - -TableMapReduceUtil.initTableMapperJob( - sourceTable, // input table - scan, // Scan instance to control CF and attribute selection - MyMapper.class, // mapper class - Text.class, // mapper output key - IntWritable.class, // mapper output value - job); -TableMapReduceUtil.initTableReducerJob( - targetTable, // output table - MyTableReducer.class, // reducer class - job); -job.setNumReduceTasks(1); // at least one, adjust as required - -boolean b = job.waitForCompletion(true); -if (!b) { - throw new IOException("error with job!"); -} - </programlisting> - In this example mapper, a column with a String value is chosen as the value to summarize upon. - This value is used as the key to emit from the mapper, and an <classname>IntWritable</classname> represents an instance counter. - <programlisting> -public static class MyMapper extends TableMapper<Text, IntWritable> { - public static final byte[] CF = "cf".getBytes(); - public static final byte[] ATTR1 = "attr1".getBytes(); - - private final IntWritable ONE = new IntWritable(1); - private Text text = new Text(); - - public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException { - String val = new String(value.getValue(CF, ATTR1)); - text.set(val); // we can only emit Writables... - - context.write(text, ONE); - } -} - </programlisting> - In the reducer, the "ones" are counted (just like any other MR example that does this), and then a <classname>Put</classname> is emitted. 
- <programlisting> -public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> { - public static final byte[] CF = "cf".getBytes(); - public static final byte[] COUNT = "count".getBytes(); - - public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { - int i = 0; - for (IntWritable val : values) { - i += val.get(); - } - Put put = new Put(Bytes.toBytes(key.toString())); - put.add(CF, COUNT, Bytes.toBytes(i)); - - context.write(null, put); - } -} - </programlisting> - </para> - </section> - <section xml:id="mapreduce.example.summary.file"> - <title>HBase MapReduce Summary to File Example</title> - <para>This is very similar to the summary example above, with the exception that this uses HBase as a MapReduce source - but HDFS as the sink. The differences are in the job setup and in the reducer. The mapper remains the same. - </para> - <programlisting> -Configuration config = HBaseConfiguration.create(); -Job job = new Job(config,"ExampleSummaryToFile"); -job.setJarByClass(MySummaryFileJob.class); // class that contains mapper and reducer - -Scan scan = new Scan(); -scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs -scan.setCacheBlocks(false); // don't set to true for MR jobs -// set other scan attrs - -TableMapReduceUtil.initTableMapperJob( - sourceTable, // input table - scan, // Scan instance to control CF and attribute selection - MyMapper.class, // mapper class - Text.class, // mapper output key - IntWritable.class, // mapper output value - job); -job.setReducerClass(MyReducer.class); // reducer class -job.setNumReduceTasks(1); // at least one, adjust as required -FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile")); // adjust directories as required - -boolean b = job.waitForCompletion(true); -if (!b) { - throw new IOException("error with job!"); -} - </programlisting> - As stated above, the previous Mapper can 
run unchanged with this example. - As for the Reducer, it is a "generic" Reducer instead of extending TableReducer and emitting Puts. - <programlisting> - public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> { - - public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { - int i = 0; - for (IntWritable val : values) { - i += val.get(); - } - context.write(key, new IntWritable(i)); - } -} - </programlisting> - </section> - <section xml:id="mapreduce.example.summary.noreducer"> - <title>HBase MapReduce Summary to HBase Without Reducer</title> - <para>It is also possible to perform summaries without a reducer, - by using HBase itself to aggregate via atomic increments. - </para> - <para>An HBase target table would need to exist for the job summary. The HTable method <code>incrementColumnValue</code> - would be used to atomically increment values. From a performance perspective, it might make sense to keep a Map - of keys with their values to be incremented for each map-task, and make one update per key during the <code> - cleanup</code> method of the mapper. However, your mileage may vary depending on the number of rows to be processed and the number of - unique keys. - </para> - <para>In the end, the summary results are in HBase. - </para> - </section> - <section xml:id="mapreduce.example.summary.rdbms"> - <title>HBase MapReduce Summary to RDBMS</title> - <para>Sometimes it is more appropriate to generate summaries to an RDBMS. For these cases, it is possible - to generate summaries directly to an RDBMS via a custom reducer. The <code>setup</code> method - can connect to an RDBMS (the connection information can be passed via custom parameters in the context) and the - cleanup method can close the connection. - </para> - <para>It is critical to understand that the number of reducers for the job affects the summarization implementation, and - you'll have to design this into your reducer. 
Specifically, you must decide whether it is designed to run as a singleton (one reducer) - or with multiple reducers. Neither is right or wrong; it depends on your use-case. Recognize that the more reducers that - are assigned to the job, the more simultaneous connections to the RDBMS will be created; this will scale, but only to a point. - </para> - <programlisting> - public static class MyRdbmsReducer extends Reducer<Text, IntWritable, Text, IntWritable> { - - private Connection c = null; - - public void setup(Context context) { - // create DB connection... - } - - public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { - // do summarization - // in this example the keys are Text, but this is just an example - } - - public void cleanup(Context context) { - // close db connection - } - -} - </programlisting> - <para>In the end, the summary results are written to your RDBMS table(s). - </para> - </section> - - </section> <!-- mr examples --> - <section xml:id="mapreduce.htable.access"> - <title>Accessing Other HBase Tables in a MapReduce Job</title> - <para>Although the framework currently allows one HBase table as input to a - MapReduce job, other HBase tables can - be accessed as lookup tables, etc., in a - MapReduce job via creating an HTable instance in the setup method of the Mapper. - <programlisting>public class MyMapper extends TableMapper<Text, LongWritable> { - private HTable myOtherTable; - - public void setup(Context context) { - myOtherTable = new HTable("myOtherTable"); - } - - public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException { - // process Result... - // use 'myOtherTable' for lookups - } - - </programlisting> - </para> - </section> - <section xml:id="mapreduce.specex"> - <title>Speculative Execution</title> - <para>It is generally advisable to turn off speculative execution for - MapReduce jobs that use HBase as a source. 
This can be done either on a - per-Job basis through properties, or on the entire cluster. Especially - for longer running jobs, speculative execution will create duplicate - map-tasks which will double-write your data to HBase; this is probably - not what you want. - </para> - <para>See <xref linkend="spec.ex"/> for more information. - </para> - </section> - </chapter> <!-- mapreduce --> - - <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="security.xml" /> - - <chapter xml:id="architecture"> - <title>Architecture</title> - <section xml:id="arch.overview"> - <title>Overview</title> - <section xml:id="arch.overview.nosql"> - <title>NoSQL?</title> - <para>HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS which - supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an - example of a local NoSQL database, whereas HBase is very much a distributed database. Technically speaking, - HBase is really more a "Data Store" than "Data Base" because it lacks many of the features you find in an RDBMS, - such as typed columns, secondary indexes, triggers, and advanced query languages. - </para> - <para>However, HBase has many features which support both linear and modular scaling. HBase clusters expand - by adding RegionServers that are hosted on commodity class servers. If a cluster expands from 10 to 20 - RegionServers, for example, it doubles in terms of both storage and processing capacity. - An RDBMS can scale well, but only up to a point (specifically, the size of a single database server), and for the best - performance it requires specialized hardware and storage devices. HBase features of note are: - <itemizedlist> - <listitem>Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore. This - makes it very suitable for tasks such as high-speed counter aggregation. 
</listitem> - <listitem>Automatic sharding: HBase tables are distributed on the cluster via regions, and regions are - automatically split and re-distributed as your data grows.</listitem> - <listitem>Automatic RegionServer failover</listitem> - <listitem>Hadoop/HDFS Integration: HBase supports HDFS out of the box as its distributed file system.</listitem> - <listitem>MapReduce: HBase supports massively parallelized processing via MapReduce for using HBase as both - source and sink.</listitem> - <listitem>Java Client API: HBase supports an easy to use Java API for programmatic access.</listitem> - <listitem>Thrift/REST API: HBase also supports Thrift and REST for non-Java front-ends.</listitem> - <listitem>Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high volume query optimization.</listitem> - <listitem>Operational Management: HBase provides built-in web pages for operational insight as well as JMX metrics.</listitem> - </itemizedlist> - </para> - </section> - - <section xml:id="arch.overview.when"> - <title>When Should I Use HBase?</title> - <para>HBase isn't suitable for every problem.</para> - <para>First, make sure you have enough data. If you have hundreds of millions or billions of rows, then - HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS - might be a better choice because all of your data might wind up on a single node (or two) and - the rest of the cluster may be sitting idle. - </para> - <para>Second, make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, - secondary indexes, transactions, advanced query languages, etc.) An application built against an RDBMS cannot be - "ported" to HBase by simply changing a JDBC driver, for example. Consider moving from an RDBMS to HBase as a - complete redesign as opposed to a port. - </para> - <para>Third, make sure you have enough hardware. 
Even HDFS doesn't do well with anything less than - 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode. - </para> - <para>HBase can run quite well stand-alone on a laptop - but this should be considered a development - configuration only. - </para> - </section> - <section xml:id="arch.overview.hbasehdfs"> - <title>What Is The Difference Between HBase and Hadoop/HDFS?</title> - <para><link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files. - Its documentation states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files. - HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. - This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist - on HDFS for high-speed lookups. See the <xref linkend="datamodel" /> and the rest of this chapter for more information on how HBase achieves its goals. - </para> - </section> - </section> - - <section xml:id="arch.catalog"> - <title>Catalog Tables</title> - <para>The catalog tables -ROOT- and .META. exist as HBase tables. They are filtered out - of the HBase shell's <code>list</code> command, but they are in fact tables just like any other. - </para> - <section xml:id="arch.catalog.root"> - <title>ROOT</title> - <para>-ROOT- keeps track of where the .META. table is. The -ROOT- table structure is as follows: - </para> - <para>Key: - <itemizedlist> - <listitem>.META. 
region key (<code>.META.,,1</code>)</listitem> - </itemizedlist> - </para> - <para>Values: - <itemizedlist> - <listitem><code>info:regioninfo</code> (serialized <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html">HRegionInfo</link> - instance of .META.)</listitem> - <listitem><code>info:server</code> (server:port of the RegionServer holding .META.)</listitem> - <listitem><code>info:serverstartcode</code> (start-time of the RegionServer process holding .META.)</listitem> - </itemizedlist> - </para> - </section> - <section xml:id="arch.catalog.meta"> - <title>META</title> - <para>The .META. table keeps a list of all regions in the system. The .META. table structure is as follows: - </para> - <para>Key: - <itemizedlist> - <listitem>Region key of the format (<code>[table],[region start key],[region id]</code>)</listitem> - </itemizedlist> - </para> - <para>Values: - <itemizedlist> - <listitem><code>info:regioninfo</code> (serialized <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html"> - HRegionInfo</link> instance for this region) - </listitem> - <listitem><code>info:server</code> (server:port of the RegionServer containing this region)</listitem> - <listitem><code>info:serverstartcode</code> (start-time of the RegionServer process containing this region)</listitem> - </itemizedlist> - </para> - <para>When a table is in the process of splitting, two other columns will be created, <code>info:splitA</code> and <code>info:splitB</code>, - which represent the two daughter regions. The values for these columns are also serialized HRegionInfo instances. - After the region has been split, this row will eventually be deleted. - </para> - <para>Notes on HRegionInfo: the empty key is used to denote table start and table end. A region with an empty start key - is the first region in a table. 
If a region has both an empty start and an empty end key, it is the only region in the table. - </para> - <para>In the (hopefully unlikely) event that programmatic processing of catalog metadata is required, see the - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte[]%29">Writables</link> utility. - </para> - </section> - <section xml:id="arch.catalog.startup"> - <title>Startup Sequencing</title> - <para>The META location is set in ROOT first. Then META is updated with server and startcode values. - </para> - <para>For information on region-RegionServer assignment, see <xref linkend="regions.arch.assignment"/>. - </para> - </section> - </section> <!-- catalog --> - - <section xml:id="client"> - <title>Client</title> - <para>The HBase client - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link> - is responsible for finding RegionServers that are serving the - particular row range of interest. It does this by querying - the <code>.META.</code> and <code>-ROOT-</code> catalog tables - (TODO: Explain). After locating the required - region(s), the client <emphasis>directly</emphasis> contacts - the RegionServer serving that region (i.e., it does not go - through the master) and issues the read or write request. - This information is cached in the client so that subsequent requests - need not go through the lookup process. Should a region be reassigned - either by the master load balancer or because a RegionServer has died, - the client will requery the catalog tables to determine the new - location of the user region. - </para> - <para>See <xref linkend="master.runtime"/> for more information about the impact of the Master on HBase Client - communication. 
- </para> - <para>Administrative functions are handled through <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link>. - </para> - <section xml:id="client.connections"><title>Connections</title> - <para>For connection configuration information, see <xref linkend="client_dependencies" />. - </para> - <para><emphasis><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link> - instances are not thread-safe</emphasis>. Only one thread should use an instance of HTable at any given - time. When creating HTable instances, it is advisable to use the same <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration">HBaseConfiguration</link> -instance. This will ensure sharing of ZooKeeper and socket instances to the RegionServers -which is usually what you want. For example, this is preferred: - <programlisting>HBaseConfiguration conf = HBaseConfiguration.create(); -HTable table1 = new HTable(conf, "myTable"); -HTable table2 = new HTable(conf, "myTable");</programlisting> - as opposed to this: - <programlisting>HBaseConfiguration conf1 = HBaseConfiguration.create(); -HTable table1 = new HTable(conf1, "myTable"); -HBaseConfiguration conf2 = HBaseConfiguration.create(); -HTable table2 = new HTable(conf2, "myTable");</programlisting> - For more information about how connections are handled in the HBase client, - see <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html">HConnectionManager</link>. - </para> - <section xml:id="client.connection.pooling"><title>Connection Pooling</title> - <para>For applications which require high-end multithreaded access (e.g., web-servers or application servers that may serve many application threads - in a single JVM), one solution is <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html">HTablePool</link>. 
- But as written currently, it is difficult to control client resource consumption when using HTablePool. - </para> - <para> - Another solution is to precreate an <classname>HConnection</classname> using - <programlisting>// Create a connection to the cluster. -HConnection connection = HConnectionManager.createConnection(Configuration); -HTableInterface table = connection.getTable("myTable"); -// use table as needed, the table returned is lightweight -table.close(); -// use the connection for other access to the cluster -connection.close();</programlisting> - Constructing an HTableInterface implementation is very lightweight, and resources are controlled/shared if you go this route. - </para> - </section> - </section> - <section xml:id="client.writebuffer"><title>WriteBuffer and Batch Methods</title> - <para>If <xref linkend="perf.hbase.client.autoflush" /> is turned off on - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>, - <classname>Put</classname>s are sent to RegionServers when the writebuffer - is filled. The writebuffer is 2MB by default. Before an HTable instance is - discarded, either <methodname>close()</methodname> or - <methodname>flushCommits()</methodname> should be invoked so Puts - will not be lost. - </para> - <para>Note: <code>htable.delete(Delete);</code> does not go in the writebuffer! This only applies to Puts. - </para> - <para>For additional information on write durability, review the <link xlink:href="acid-semantics.html">ACID semantics</link> page. - </para> - <para>For fine-grained control of batching of - <classname>Put</classname>s or <classname>Delete</classname>s, - see the <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29">batch</link> methods on HTable. 
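As a rough, self-contained illustration of the buffering behavior described above (this is a hypothetical sketch, not HTable's actual implementation; the class name and fields are invented): writes accumulate client-side, a flush is triggered once their total size crosses a threshold, analogous to the 2MB default writebuffer, and an explicit flush() mirrors the role of flushCommits()/close() in ensuring no buffered writes are lost.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative write-buffer sketch (not HTable): buffer puts client-side
// and flush automatically when the accumulated byte size crosses a
// threshold, with an explicit flush() for shutdown.
public class WriteBufferSketch {
    private final long flushThreshold;
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private int flushCount = 0;

    public WriteBufferSketch(long flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public void put(byte[] row) {
        buffer.add(row);
        bufferedBytes += row.length;
        if (bufferedBytes >= flushThreshold) {
            flush();
        }
    }

    // Analogous to flushCommits()/close(): push pending writes out so
    // none are lost when the instance is discarded.
    public void flush() {
        // (a real client would send the buffered edits to the servers here)
        buffer.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    public int getFlushCount() { return flushCount; }
    public int getPendingCount() { return buffer.size(); }
}
```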
- </para> - </section> - <section xml:id="client.external"><title>External Clients</title> - <para>Information on non-Java clients and custom protocols is covered in <xref linkend="external_apis" /> - </para> - </section> - </section> - - <section xml:id="client.filter"><title>Client Request Filters</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> and <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> instances can be - optionally configured with <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html">filters</link> which are applied on the RegionServer. - </para> - <para>Filters can be confusing because there are many different types, and it is best to approach them by understanding the groups - of Filter functionality. - </para> - <section xml:id="client.filter.structural"><title>Structural</title> - <para>Structural Filters contain other Filters.</para> - <section xml:id="client.filter.structural.fl"><title>FilterList</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html">FilterList</link> - represents a list of Filters with a relationship of <code>FilterList.Operator.MUST_PASS_ALL</code> or - <code>FilterList.Operator.MUST_PASS_ONE</code> between the Filters. The following example shows an 'or' between two - Filters (checking for either 'my value' or 'my other value' on the same attribute). 
-<programlisting> -FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE); -SingleColumnValueFilter filter1 = new SingleColumnValueFilter( - cf, - column, - CompareOp.EQUAL, - Bytes.toBytes("my value") - ); -list.add(filter1); -SingleColumnValueFilter filter2 = new SingleColumnValueFilter( - cf, - column, - CompareOp.EQUAL, - Bytes.toBytes("my other value") - ); -list.add(filter2); -scan.setFilter(list); -</programlisting> - </para> - </section> - </section> - <section xml:id="client.filter.cv"><title>Column Value</title> - <section xml:id="client.filter.cv.scvf"><title>SingleColumnValueFilter</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html">SingleColumnValueFilter</link> - can be used to test column values for equivalence (<code><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/CompareFilter.CompareOp.html">CompareOp.EQUAL</link> - </code>), inequality (<code>CompareOp.NOT_EQUAL</code>), or ranges - (e.g., <code>CompareOp.GREATER</code>). The following is an example of testing a column for equivalence to the String value "my value"... -<programlisting> -SingleColumnValueFilter filter = new SingleColumnValueFilter( - cf, - column, - CompareOp.EQUAL, - Bytes.toBytes("my value") - ); -scan.setFilter(filter); -</programlisting> - </para> - </section> - </section> - <section xml:id="client.filter.cvp"><title>Column Value Comparators</title> - <para>There are several Comparator classes in the Filter package that deserve special mention. - These Comparators are used in concert with other Filters, such as <xref linkend="client.filter.cv.scvf" />. - </para> - <section xml:id="client.filter.cvp.rcs"><title>RegexStringComparator</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html">RegexStringComparator</link> - supports regular expressions for value comparisons. 
-<programlisting>
-RegexStringComparator comp = new RegexStringComparator("my.");   // any value that starts with 'my'
-SingleColumnValueFilter filter = new SingleColumnValueFilter(
-	cf,
-	column,
-	CompareOp.EQUAL,
-	comp
-	);
-scan.setFilter(filter);
-</programlisting>
-            See the Oracle JavaDoc for <link xlink:href="http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html">supported RegEx patterns in Java</link>.
-            </para>
-          </section>
-          <section xml:id="client.filter.cvp.sc"><title>SubstringComparator</title>
-            <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html">SubstringComparator</link>
-            can be used to determine if a given substring exists in a value. The comparison is case-insensitive.
-            </para>
-<programlisting>
-SubstringComparator comp = new SubstringComparator("y val");   // looking for 'my value'
-SingleColumnValueFilter filter = new SingleColumnValueFilter(
-	cf,
-	column,
-	CompareOp.EQUAL,
-	comp
-	);
-scan.setFilter(filter);
-</programlisting>
-          </section>
-          <section xml:id="client.filter.cvp.bfp"><title>BinaryPrefixComparator</title>
-            <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.html">BinaryPrefixComparator</link>.</para>
-          </section>
-          <section xml:id="client.filter.cvp.bc"><title>BinaryComparator</title>
-            <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html">BinaryComparator</link>.</para>
-          </section>
-        </section>
-        <section xml:id="client.filter.kvm"><title>KeyValue Metadata</title>
-          <para>As HBase stores data internally as KeyValue pairs, KeyValue Metadata Filters evaluate the existence of keys (i.e., ColumnFamily:Column qualifiers)
-          for a row, as opposed to values in the previous section.
- </para> - <section xml:id="client.filter.kvm.ff"><title>FamilyFilter</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FamilyFilter.html">FamilyFilter</link> can be used - to filter on the ColumnFamily. It is generally a better idea to select ColumnFamilies in the Scan than to do it with a Filter.</para> - </section> - <section xml:id="client.filter.kvm.qf"><title>QualifierFilter</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html">QualifierFilter</link> can be used - to filter based on Column (aka Qualifier) name. - </para> - </section> - <section xml:id="client.filter.kvm.cpf"><title>ColumnPrefixFilter</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html">ColumnPrefixFilter</link> can be used - to filter based on the lead portion of Column (aka Qualifier) names. - </para> - <para>A ColumnPrefixFilter seeks ahead to the first column matching the prefix in each row and for each involved column family. It can be used to efficiently - get a subset of the columns in very wide rows. - </para> - <para>Note: The same column qualifier can be used in different column families. This filter returns all matching columns. 
- </para> - <para>Example: Find all columns in a row and family that start with "abc" -<programlisting> -HTableInterface t = ...; -byte[] row = ...; -byte[] family = ...; -byte[] prefix = Bytes.toBytes("abc"); -Scan scan = new Scan(row, row); // (optional) limit to one row -scan.addFamily(family); // (optional) limit to one family -Filter f = new ColumnPrefixFilter(prefix); -scan.setFilter(f); -scan.setBatch(10); // set this if there could be many columns returned -ResultScanner rs = t.getScanner(scan); -for (Result r = rs.next(); r != null; r = rs.next()) { - for (KeyValue kv : r.raw()) { - // each kv represents a column - } -} -rs.close(); -</programlisting> -</para> - </section> - <section xml:id="client.filter.kvm.mcpf"><title>MultipleColumnPrefixFilter</title> - <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.html">MultipleColumnPrefixFilter</link> behaves like ColumnPrefixFilter - but allows specifying multiple prefixes. - </para> - <para>Like ColumnPrefixFilter, MultipleColumnPrefixFilter efficiently seeks ahead to the first column matching the lowest prefix and also seeks past ranges of columns between prefixes. - It can be used to efficiently get discontinuous sets of columns from very wide rows. 
-        </para>
-        <para>Example: Find all columns in a row and family that start with "abc" or "xyz"
-<programlisting>
-HTableInterface t = ...;
-byte[] row = ...;
-byte[] family = ...;
-byte[][] prefixes = new byte[][] {Bytes.toBytes("abc"), Bytes.toBytes("xyz")};
-Scan scan = new Scan(row, row); // (optional) limit to one row
-scan.addFamily(family); // (optional) limit to one family
-Filter f = new MultipleColumnPrefixFilter(prefixes);
-scan.setFilter(f);
-scan.setBatch(10); // set this if there could be many columns returned
-ResultScanner rs = t.getScanner(scan);
-for (Result r = rs.next(); r != null; r = rs.next()) {
-  for (KeyValue kv : r.raw()) {
-    // each kv represents a column
-  }
-}
-rs.close();
-</programlisting>
-</para>
-        </section>
-        <section xml:id="client.filter.kvm.crf"><title>ColumnRangeFilter</title>
-          <para>A <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnRangeFilter.html">ColumnRangeFilter</link> allows efficient intra row scanning.
-          </para>
-          <para>A ColumnRangeFilter can seek ahead to the first matching column for each involved column family. It can be used to efficiently
-          get a 'slice' of the columns of a very wide row.
-          e.g., you have a million columns in a row but you only want to look at columns bbbb-bbdd.
-          </para>
-          <para>Note: The same column qualifier can be used in different column families. This filter returns all matching columns.
- </para> - <para>Example: Find all columns in a row and family between "bbbb" (inclusive) and "bbdd" (inclusive) -<programlisting> -HTableInterface t = ...; -byte[] row = ...; -byte[] family = ...; -byte[] startColumn = Bytes.toBytes("bbbb"); -byte[] endColumn = Bytes.toBytes("bbdd"); -Scan scan = new Scan(row, row); // (optional) limit to one row -scan.addFamily(family); // (optional) limit to one family -Filter f = new ColumnRangeFilter(startColumn, true, endColumn, true); -scan.setFilter(f); -scan.setBatch(10); // set this if there could be many columns returned -ResultScanner rs = t.getScanner(scan); -for (Result r = rs.next(); r != null; r = rs.next()) { - for (KeyValue kv : r.raw()) { - // each kv represents a column - } -} -rs.close(); -</programlisting> -</para> - <para>Note: Introduced in HBase 0.92</para> - </section> - </section> - <section xml:id="client.filter.row"><title>RowKey</title> - <section xml:id="client.filter.row.rf"><title>RowFilter</title> - <para>It is generally a better idea to use the startRow/stopRow methods on Scan for row selection, however - <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RowFilter.html">RowFilter</link> can also be used.</para> - </section> - </section> - <section xml:id="client.filter.utility"><title>Utility</title> - <section xml:id="client.filter.utility.fkof"><title>FirstKeyOnlyFilter</title> - <para>This is primarily used for rowcount jobs. - See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html">FirstKeyOnlyFilter</link>.</para> - </section> - </section> - </section> <!-- client.filter --> - - <section xml:id="master"><title>Master</title> - <para><code>HMaster</code> is the implementation of the Master Server. The Master server - is responsible for monitoring all RegionServer instances in the cluster, and is - the interface for all metadata changes. 
In a distributed cluster, the Master typically runs on the <xref linkend="arch.hdfs.nn" /><footnote>
-        <para>J Mohamed Zahoor goes into some more detail on the Master Architecture in this blog posting, <link
-        xlink:href="http://blog.zahoor.in/2012/08/hbase-hmaster-architecture/">HBase HMaster Architecture
-        </link>.</para>
-        </footnote>
-      </para>
-      <section xml:id="master.startup"><title>Startup Behavior</title>
-        <para>If run in a multi-Master environment, all Masters compete to run the cluster. If the active
-        Master loses its lease in ZooKeeper (or the Master shuts down), then the remaining Masters jostle to
-        take over the Master role.
-        </para>
-      </section>
-      <section xml:id="master.runtime"><title>Runtime Impact</title>
-        <para>A common dist-list question is what happens to an HBase cluster when the Master goes down. Because the
-        HBase client talks directly to the RegionServers, the cluster can still function in a "steady
-        state." Additionally, per <xref linkend="arch.catalog"/> ROOT and META exist as HBase tables (i.e., are
-        not resident in the Master). However, the Master controls critical functions such as RegionServer failover and
-        completing region splits. So while the cluster can still run <emphasis>for a time</emphasis> without the Master,
-        the Master should be restarted as soon as possible.
-        </para>
-      </section>
-      <section xml:id="master.api"><title>Interface</title>
-        <para>The methods exposed by <code>HMasterInterface</code> are primarily metadata-oriented methods:
-          <itemizedlist>
-            <listitem>Table (createTable, modifyTable, removeTable, enable, disable)
-            </listitem>
-            <listitem>ColumnFamily (addColumn, modifyColumn, removeColumn)
-            </listitem>
-            <listitem>Region (move, assign, unassign)
-            </listitem>
-          </itemizedlist>
-        For example, when the <code>HBaseAdmin</code> method <code>disableTable</code> is invoked, it is serviced by the Master server.
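-        As a minimal, hypothetical sketch (assuming an existing HBase <code>Configuration</code> object named <code>conf</code>), such a Master-serviced call looks like:
-<programlisting>
-HBaseAdmin admin = new HBaseAdmin(conf);  // conf is an existing HBase Configuration
-admin.disableTable("myTable");            // this request is serviced by the Master
-</programlisting>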
-        </para>
-      </section>
-      <section xml:id="master.processes"><title>Processes</title>
-        <para>The Master runs several background threads:
-        </para>
-        <section xml:id="master.processes.loadbalancer"><title>LoadBalancer</title>
-          <para>Periodically, and when there are no regions in transition,
-          a load balancer will run and move regions around to balance the cluster's load.
-          See <xref linkend="balancer_config" /> for configuring this property.</para>
-          <para>See <xref linkend="regions.arch.assignment"/> for more information on region assignment.
-          </para>
-        </section>
-        <section xml:id="master.processes.catalog"><title>CatalogJanitor</title>
-          <para>Periodically checks and cleans up the .META. table. See <xref linkend="arch.catalog.meta" /> for more information on META.</para>
-        </section>
-      </section>
-
-    </section>
-    <section xml:id="regionserver.arch"><title>RegionServer</title>
-      <para><code>HRegionServer</code> is the RegionServer implementation. It is responsible for serving and managing regions.
-      In a distributed cluster, a RegionServer runs on a <xref linkend="arch.hdfs.dn" />.
-      </para>
-      <section xml:id="regionserver.arch.api"><title>Interface</title>
-        <para>The methods exposed by <code>HRegionInterface</code> contain both data-oriented and region-maintenance methods:
-          <itemizedlist>
-            <listitem>Data (get, put, delete, next, etc.)
-            </listitem>
-            <listitem>Region (splitRegion, compactRegion, etc.)
-            </listitem>
-          </itemizedlist>
-        For example, when the <code>HBaseAdmin</code> method <code>majorCompact</code> is invoked on a table, the client is actually iterating through
-        all regions for the specified table and requesting a major compaction directly of each region.
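-        As a minimal, hypothetical sketch (again assuming an existing HBase <code>Configuration</code> object named <code>conf</code>):
-<programlisting>
-HBaseAdmin admin = new HBaseAdmin(conf);  // conf is an existing HBase Configuration
-admin.majorCompact("myTable");            // iterates the table's regions, requesting a major compaction of each
-</programlisting>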
-        </para>
-      </section>
-      <section xml:id="regionserver.arch.processes"><title>Processes</title>
-        <para>The RegionServer runs a variety of background threads:</para>
-        <section xml:id="regionserver.arch.processes.compactsplit"><title>CompactSplitThread</title>
-          <para>Checks for splits and handles minor compactions.</para>
-        </section>
-        <section xml:id="regionserver.arch.processes.majorcompact"><title>MajorCompactionChecker</title>
-          <para>Checks for major compactions.</para>
-        </section>
-        <section xml:id="regionserver.arch.processes.memstore"><title>MemStoreFlusher</title>
-          <para>Periodically flushes in-memory writes in the MemStore to StoreFiles.</para>
-        </section>
-        <section xml:id="regionserver.arch.processes.log"><title>LogRoller</title>
-          <para>Periodically checks the RegionServer's HLog.</para>
-        </section>
-      </section>
-
-      <section xml:id="coprocessors"><title>Coprocessors</title>
-        <para>Coprocessors were added in 0.92. There is a thorough <link xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction">Blog Overview of CoProcessors</link>
-        posted. Documentation will eventually move to this reference guide, but the blog is the most current information available at this time.
-        </para>
-      </section>
-
-      <section xml:id="block.cache">
-        <title>Block Cache</title>
-        <section xml:id="block.cache.design">
-          <title>Design</title>
-          <para>The Block Cache is an LRU cache that contains three levels of block priority to allow for scan-resistance and in-memory ColumnFamilies:
-          </para>
-          <itemizedlist>
-            <listitem>Single access priority: The first time a block is loaded from HDFS it normally has this priority and it will be part of the first group to be considered
-            during evictions. The advantage is that scanned blocks are more likely to get evicted than blocks that are getting more usage.
-            </listitem>
-            <listitem>Multi access priority: If a block in the previous priority group is accessed again, it upgrades to this priority.
It is thus part of the second group
-            considered during evictions.
-            </listitem>
-            <listitem>In-memory access priority: If the block's family was configured to be "in-memory", it will be part of this priority regardless of the number of times it
-            was accessed. Catalog tables are configured like this. This group is the last one considered during evictions.
-            </listitem>
-          </itemizedlist>
-          <para>
-              For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/LruBlockCache.html">LruBlockCache source</link>.
-          </para>
-        </section>
-        <section xml:id="block.cache.usage">
-          <title>Usage</title>
-          <para>Block caching is enabled by default for all the user tables, which means that any read operation will load the LRU cache. This might be good for a large number of use cases,
-          but further tunings are usually required in order to achieve better performance. An important concept is the
-          <link xlink:href="http://en.wikipedia.org/wiki/Working_set_size">working set size</link>, or WSS, which is: "the amount of memory needed to compute the answer to a problem".
-          For a website, this would be the data that's needed to answer the queries over a short amount of time.
-          </para>
-          <para>The way to calculate how much memory is available in HBase for caching is:
-          </para>
-          <programlisting>
-            number of region servers * heap size * hfile.block.cache.size * 0.85
-          </programlisting>
-          <para>The default value for the block cache is 0.25 which represents 25% of the available heap. The last value (85%) is the default acceptable loading factor in the LRU cache after
-          which eviction is started. The reason it is included in this equation is that it would be unrealistic to say that it is possible to use 100% of the available memory since this would
-          make the process block from the point where it loads new blocks.
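-          As a sketch, the formula above can be computed directly; the numbers here simply restate the single-server default case:
-<programlisting>
-int regionServers = 1;
-double heapMB = 1024;            // default 1GB heap
-double blockCacheSize = 0.25;    // default hfile.block.cache.size
-double loadFactor = 0.85;        // default acceptable loading factor
-double availableMB = regionServers * heapMB * blockCacheSize * loadFactor;  // ~217.6MB
-</programlisting>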
Here are some examples:
-          </para>
-          <itemizedlist>
-            <listitem>One region server with the default heap size (1GB) and the default block cache size will have 217MB of block cache available.
-            </listitem>
-            <listitem>20 region servers with the heap size set to 8GB and a default block cache size will have 34GB of block cache.
-            </listitem>
-            <listitem>100 region servers with the heap size set to 24GB and a block cache size of 0.5 will have about 1TB of block cache.
-            </listitem>
-          </itemizedlist>
-          <para>Your data isn't the only resident of the block cache; here are others that you may have to take into account:
-          </para>
-          <itemizedlist>
-            <listitem>Catalog tables: The -ROOT- and .META. tables are forced into the block cache and have the in-memory priority which means that they are harder to evict. The former never uses
-            more than a few hundred bytes while the latter can occupy a few MBs (depending on the number of regions).
-            </listitem>
-            <listitem>HFiles indexes: HFile is the file format that HBase uses to store data in HDFS and it contains a multi-layered index in order to seek to the data without having to read the whole file.
-            The size of those indexes is a factor of the block size (64KB by default), the size of your keys and the amoun
<TRUNCATED>
