http://git-wip-us.apache.org/repos/asf/hbase/blob/e80b3092/src/main/docbkx/cp.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/cp.xml b/src/main/docbkx/cp.xml deleted file mode 100644 index 8624309..0000000 --- a/src/main/docbkx/cp.xml +++ /dev/null @@ -1,431 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<chapter - version="5.0" - xml:id="cp" - xmlns="http://docbook.org/ns/docbook" - xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:xi="http://www.w3.org/2001/XInclude" - xmlns:svg="http://www.w3.org/2000/svg" - xmlns:m="http://www.w3.org/1998/Math/MathML" - xmlns:html="http://www.w3.org/1999/xhtml" - xmlns:db="http://docbook.org/ns/docbook"> - <!-- -/** - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ ---> - <title>Apache HBase Coprocessors</title> - <para> HBase coprocessors are modeled after the coprocessors which are part of Google's BigTable - (<link xlink:href="http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009"/>, pages - 66-67.). Coprocessors function in a similar way to Linux kernel modules. They provide a way to - run server-level code against locally-stored data. 
The functionality they provide is very - powerful, but also carries great risk and can have adverse effects on the system, up to the level - of the operating system. The information in this chapter is primarily sourced and heavily reused - from Mingjie Lai's blog post at <link - xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction"/>. </para> - - <para> Coprocessors are not designed to be used by end users of HBase, but by HBase developers who - need to add specialized functionality to HBase. One example of the use of coprocessors is - pluggable compaction and scan policies, which are provided as coprocessors in <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-6427">HBASE-6427</link>. </para> - - <section> - <title>Coprocessor Framework</title> - <para>The implementation of HBase coprocessors diverges from the BigTable implementation. The - HBase framework provides a library and runtime environment for executing user code within the - HBase region server and master processes. </para> - <para> The framework API is provided in the <link - xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html">coprocessor</link> - package.</para> - <para>Two different types of coprocessors are provided by the framework, based on their - scope.</para> - <variablelist> - <title>Types of Coprocessors</title> - <varlistentry> - <term>System Coprocessors</term> - <listitem> - <para>System coprocessors are loaded globally on all tables and regions hosted by a region - server.</para> - </listitem> - </varlistentry> - <varlistentry> - <term>Table Coprocessors</term> - <listitem> - <para>You can specify which coprocessors should be loaded on all regions for a table on a - per-table basis.</para> - </listitem> - </varlistentry> - </variablelist> - - <para>The framework provides two different aspects of extensions as well: - <firstterm>observers</firstterm> and <firstterm>endpoints</firstterm>.</para> - <variablelist> - <varlistentry> - 
<term>Observers</term> - <listitem> - <para>Observers are analogous to triggers in conventional databases. They allow you to - insert user code by overriding upcall methods provided by the coprocessor framework. - Callback functions are executed from core HBase code when events occur. Callbacks are - handled by the framework, and the coprocessor itself only needs to insert the extended - or alternate functionality.</para> - <variablelist> - <title>Provided Observer Interfaces</title> - <varlistentry> - <term><link - xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html">RegionObserver</link></term> - <listitem> - <para>A RegionObserver provides hooks for data manipulation events, such as Get, - Put, Delete, and Scan. An instance of a RegionObserver coprocessor exists for each - table region. The scope of the observations a RegionObserver can make is - constrained to that region. </para> - </listitem> - </varlistentry> - <varlistentry> - <term><link - xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html">RegionServerObserver</link></term> - <listitem> - <para>A RegionServerObserver provides hooks for operations related to the RegionServer, - such as stopping the RegionServer and performing operations before or after - merges, commits, or rollbacks.</para> - </listitem> - </varlistentry> - <varlistentry> - <term><link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html">WALObserver</link></term> - <listitem> - <para>A WALObserver provides hooks for operations related to the write-ahead log - (WAL). You can observe or intercept WAL writing and reconstruction events. A - WALObserver runs in the context of WAL processing. 
A single WALObserver exists on - a single region server.</para> - </listitem> - </varlistentry> - <varlistentry> - <term><link - xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html">MasterObserver</link></term> - <listitem> - <para>A MasterObserver provides hooks for DDL-type operations, such as create, - delete, and modify table. The MasterObserver runs within the context of the HBase - master. </para> - </listitem> - </varlistentry> - </variablelist> - <para>More than one observer of a given type can be loaded at once. Multiple observers are - chained to execute sequentially by order of assigned priority. Nothing prevents a - coprocessor implementor from communicating internally among its installed - observers.</para> - <para>An observer of a higher priority can preempt lower-priority observers by throwing an - IOException or a subclass of IOException.</para> - </listitem> - </varlistentry> - <varlistentry> - <term>Endpoints (HBase 0.96.x and later)</term> - <listitem> - <para>The implementation for endpoints changed significantly in HBase 0.96.x due to the - introduction of protocol buffers (protobufs) (<link - xlink:href="https://issues.apache.org/jira/browse/HBASE-5448">HBASE-5448</link>). If - you created endpoints before 0.96.x, you will need to rewrite them. Endpoints are now - defined and callable as protobuf services, rather than endpoint invocations passed - through as Writable blobs.</para> - <para>Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any - time from the client. When it is invoked, it is executed remotely at the target region - or regions, and results of the executions are returned to the client.</para> - <para>The endpoint implementation is installed on the server and is invoked using HBase - RPC. The client library provides convenience methods for invoking these dynamic - interfaces. 
</para> - <para>An endpoint, like an observer, can communicate with any installed observers. This - allows you to plug new features into HBase without modifying or recompiling HBase - itself.</para> - <itemizedlist> - <title>Steps to Implement an Endpoint</title> - <listitem><para>Define the coprocessor service and related messages in a <filename>.proto</filename> file.</para></listitem> - <listitem><para>Run the <command>protoc</command> command to generate the code.</para></listitem> - <listitem><para>Write code to implement the following:</para> - <itemizedlist> - <listitem><para>the generated protobuf Service interface</para></listitem> - <listitem> - <para>the new <link - xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorService(byte[])">org.apache.hadoop.hbase.coprocessor.CoprocessorService</link> - interface (required for the <link - xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.html">RegionCoprocessorHost</link> - to register the exposed service)</para></listitem> - </itemizedlist> - </listitem> - <listitem><para>The client calls the new HTable.coprocessorService() methods to perform the endpoint RPCs.</para></listitem> - </itemizedlist> - - <para>For more information and examples, refer to the API documentation for the <link - xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/coprocessor/package-summary.html">coprocessor</link> - package, as well as the included RowCount example in the - <filename>/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/</filename> - directory of the HBase source.</para> - </listitem> - </varlistentry> - <varlistentry> - <term>Endpoints (HBase 0.94.x and earlier)</term> - <listitem> - <para>Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any - time from the client. 
When it is invoked, it is executed remotely at the target region - or regions, and results of the executions are returned to the client.</para> - <para>The endpoint implementation is installed on the server and is invoked using HBase - RPC. The client library provides convenience methods for invoking these dynamic - interfaces. </para> - <para>An endpoint, like an observer, can communicate with any installed observers. This - allows you to plug new features into HBase without modifying or recompiling HBase - itself.</para> - <itemizedlist> - <title>Steps to Implement an Endpoint</title> - <listitem> - <bridgehead>Server-Side</bridgehead> - <itemizedlist> - <listitem> - <para>Create a new protocol interface which extends CoprocessorProtocol.</para> - </listitem> - <listitem> - <para>Implement the Endpoint interface. The implementation will be loaded into and - executed from the region context.</para> - </listitem> - <listitem> - <para>Extend the abstract class BaseEndpointCoprocessor. This convenience class - hides some internal details that the implementer does not need to be concerned - about, such as coprocessor framework class loading.</para> - </listitem> - </itemizedlist> - </listitem> - <listitem> - <bridgehead>Client-Side</bridgehead> - <para>An endpoint can be invoked by two new HBase client APIs:</para> - <itemizedlist> - <listitem> - <para><code>HTableInterface.coprocessorProxy(Class<T> protocol, byte[] - row)</code> for executing against a single region</para> - </listitem> - <listitem> - <para><code>HTableInterface.coprocessorExec(Class<T> protocol, byte[] - startKey, byte[] endKey, Batch.Call<T,R> callable)</code> for executing - over a range of regions</para> - </listitem> - </itemizedlist> - </listitem> - </itemizedlist> - </listitem> - </varlistentry> - </variablelist> - </section> - - <section> - <title>Examples</title> - <para>An example of an observer is included in - 
<filename>hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestZooKeeperScanPolicyObserver.java</filename>. - Several endpoint examples are included in the same directory.</para> - </section> - - - - <section> - <title>Building A Coprocessor</title> - - <para>Before a coprocessor can be used, it must be developed, compiled, and packaged in a JAR - file. The next step is to configure the coprocessor framework to use your coprocessor. You can - load the coprocessor from your HBase configuration, so that the coprocessor starts with HBase, - or you can configure the coprocessor from the HBase shell, as a table attribute, so that it is - loaded dynamically when the table is opened or reopened.</para> - <section> - <title>Load from Configuration</title> - <para> To configure a coprocessor to be loaded when HBase starts, modify the RegionServer's - <filename>hbase-site.xml</filename> and configure one of the following properties, based - on the type of observer you are configuring: </para> - <itemizedlist> - <listitem> - <para><code>hbase.coprocessor.region.classes</code> for RegionObservers and - Endpoints</para> - </listitem> - <listitem> - <para><code>hbase.coprocessor.wal.classes</code> for WALObservers</para> - </listitem> - <listitem> - <para><code>hbase.coprocessor.master.classes</code> for MasterObservers</para> - </listitem> - </itemizedlist> - <example> - <title>Example RegionObserver Configuration</title> - <para>In this example, one RegionObserver is configured for all the HBase tables.</para> - <screen language="xml"><![CDATA[ -<property> - <name>hbase.coprocessor.region.classes</name> - <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value> - </property> ]]> - </screen> - </example> - - <para> If multiple classes are specified for loading, the class names must be comma-separated. - The framework attempts to load all the configured classes using the default class loader. 
- Therefore, the jar file must reside on the server-side HBase classpath.</para> - - <para>Coprocessors which are loaded in this way will be active on all regions of - all tables. These are the system coprocessors introduced earlier. The first listed - coprocessor will be assigned the priority <literal>Coprocessor.Priority.SYSTEM</literal>. - Each subsequent coprocessor in the list will have its priority value incremented by one - (which reduces its priority, because priorities have the natural sort order of Integers). </para> - <para>When calling out to registered observers, the framework executes their callback methods - in the sorted order of their priority. Ties are broken arbitrarily.</para> - </section> - - <section> - <title>Load from the HBase Shell</title> - <para> You can load a coprocessor on a specific table via a table attribute. The following - example will load the <systemitem>FooRegionObserver</systemitem> observer when table - <systemitem>t1</systemitem> is read or re-read. </para> - <example> - <title>Load a Coprocessor On a Table Using HBase Shell</title> - <screen> -hbase(main):005:0> <userinput>alter 't1', METHOD => 'table_att', - 'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'</userinput> -<computeroutput>Updating all regions with the new schema... -1/1 regions updated. -Done. 
-0 row(s) in 1.0730 seconds</computeroutput> - -hbase(main):006:0> <userinput>describe 't1'</userinput> -<computeroutput>DESCRIPTION ENABLED - {NAME => 't1', coprocessor$1 => 'hdfs:///foo.jar|com.foo.FooRegio false - nObserver|1001|arg1=1,arg2=2', FAMILIES => [{NAME => 'c1', DATA_B - LOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE - => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => - '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZ - E => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLO - CKCACHE => 'true'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', - BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3' - , COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647' - , KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY - => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]} -1 row(s) in 0.0190 seconds</computeroutput> - </screen> - </example> - - <para>The coprocessor framework will try to read the class information from the coprocessor - table attribute value. The value contains four pieces of information which are separated by - the <literal>|</literal> character.</para> - - <itemizedlist> - <listitem> - <para>File path: The jar file containing the coprocessor implementation must be in a - location where all region servers can read it. You could copy the file onto the local - disk on each region server, but it is recommended to store it in HDFS.</para> - </listitem> - <listitem> - <para>Class name: The full class name of the coprocessor.</para> - </listitem> - <listitem> - <para>Priority: An integer. The framework will determine the execution sequence of all - configured observers registered at the same hook using priorities. This field can be - left blank. 
In that case the framework will assign a default priority value.</para> - </listitem> - <listitem> - <para>Arguments: This field is passed to the coprocessor implementation.</para> - </listitem> - </itemizedlist> - <example> - <title>Unload a Coprocessor From a Table Using HBase Shell</title> - <screen> -hbase(main):007:0> <userinput>alter 't1', METHOD => 'table_att_unset',</userinput> -hbase(main):008:0* <userinput>NAME => 'coprocessor$1'</userinput> -<computeroutput>Updating all regions with the new schema... -1/1 regions updated. -Done. -0 row(s) in 1.1130 seconds</computeroutput> - -hbase(main):009:0> <userinput>describe 't1'</userinput> -<computeroutput>DESCRIPTION ENABLED - {NAME => 't1', FAMILIES => [{NAME => 'c1', DATA_BLOCK_ENCODING => false - 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSION - S => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '214 - 7483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN - _MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true - '}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => - 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => - 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_C - ELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCO - DE_ON_DISK => 'true', BLOCKCACHE => 'true'}]} -1 row(s) in 0.0180 seconds </computeroutput> - </screen> - </example> - <warning> - <para>There is no guarantee that the framework will load a given coprocessor successfully. - For example, the shell command neither guarantees a jar file exists at a particular - location nor verifies whether the given class is actually contained in the jar file. 
- </para> - </warning> - </section> - </section> - <section> - <title>Check the Status of a Coprocessor</title> - <para>To check the status of a coprocessor after it has been configured, use the - <command>status</command> HBase Shell command.</para> - <screen> -hbase(main):020:0> <userinput>status 'detailed'</userinput> -<computeroutput>version 0.92-tm-6 -0 regionsInTransition -master coprocessors: [] -1 live servers - localhost:52761 1328082515520 - requestsPerSecond=3, numberOfOnlineRegions=3, usedHeapMB=32, maxHeapMB=995 - -ROOT-,,0 - numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, -storefileIndexSizeMB=0, readRequestsCount=54, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, -totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[] - .META.,,1 - numberOfStores=1, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, -storefileIndexSizeMB=0, readRequestsCount=97, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, -totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[] - t1,,1328082575190.c0491168a27620ffe653ec6c04c9b4d1. - numberOfStores=2, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, -storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, -totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, -coprocessors=[AggregateImplementation] -0 dead servers </computeroutput> - </screen> - </section> - <section> - <title>Monitor Time Spent in Coprocessors</title> - <para>HBase 0.98.5 introduced the ability to monitor some statistics relating to the amount of - time spent executing a given coprocessor. 
You can see these statistics via the HBase Metrics - framework (see <xref linkend="hbase_metrics"/>) or the Web UI for a given Region Server, via - the <guilabel>Coprocessor Metrics</guilabel> tab. These statistics are valuable for debugging - and benchmarking the performance impact of a given coprocessor on your cluster. Tracked - statistics include min, max, average, and 90th, 95th, and 99th percentiles. All times are shown - in milliseconds. The statistics are calculated over coprocessor - execution samples recorded during the reporting interval, which is 10 seconds by default. The - metrics sampling rate is described in <xref linkend="hbase_metrics" />.</para> - <figure> - <title>Coprocessor Metrics UI</title> - <mediaobject> - <imageobject> - <imagedata fileref="coprocessor_stats.png" width="100%"/> - </imageobject> - <caption> - <para>The Coprocessor Metrics UI shows statistics about time spent executing a given - coprocessor, including min, max, average, and 90th, 95th, and 99th percentiles.</para> - </caption> - </mediaobject> - </figure> - </section> - <section> - <title>Status of Coprocessors in HBase</title> - <para> Coprocessors and the coprocessor framework are evolving rapidly and work is ongoing on - several different JIRAs. </para> - </section> -</chapter>
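The coprocessor table attribute described in the chapter above packs four pipe-separated fields (file path, class name, priority, arguments) into one string. The following sketch illustrates that layout by parsing it; CoprocessorAttr is a hypothetical illustration class, not part of HBase.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the coprocessor attribute value format
// path|class|priority|arguments, as in:
//   hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2
public class CoprocessorAttr {
    public final String jarPath;    // jar location readable by all region servers
    public final String className;  // fully-qualified coprocessor class
    public final Integer priority;  // null when blank: the framework assigns a default
    public final Map<String, String> args = new HashMap<>();

    public CoprocessorAttr(String attrValue) {
        // Split on '|', keeping trailing empty fields (limit -1).
        String[] fields = attrValue.split("\\|", -1);
        jarPath = fields[0];
        className = fields[1];
        priority = fields[2].isEmpty() ? null : Integer.valueOf(fields[2]);
        if (fields.length > 3 && !fields[3].isEmpty()) {
            // Arguments are comma-separated key=value pairs.
            for (String kv : fields[3].split(",")) {
                String[] pair = kv.split("=", 2);
                args.put(pair[0], pair[1]);
            }
        }
    }
}
```

This mirrors how the framework reads the attribute conceptually; the actual HBase parsing code is more defensive about malformed values.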
http://git-wip-us.apache.org/repos/asf/hbase/blob/e80b3092/src/main/docbkx/customization-pdf.xsl ---------------------------------------------------------------------- diff --git a/src/main/docbkx/customization-pdf.xsl b/src/main/docbkx/customization-pdf.xsl deleted file mode 100644 index b21236f..0000000 --- a/src/main/docbkx/customization-pdf.xsl +++ /dev/null @@ -1,129 +0,0 @@ -<?xml version="1.0"?> -<xsl:stylesheet - xmlns:xsl="http://www.w3.org/1999/XSL/Transform" - version="1.0"> -<!-- -/** - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ ---> - <xsl:import href="urn:docbkx:stylesheet/docbook.xsl"/> - <xsl:import href="urn:docbkx:stylesheet/highlight.xsl"/> - - - <!--################################################### - Paper & Page Size - ################################################### --> - - <!-- Paper type, no headers on blank pages, no double sided printing --> - <xsl:param name="paper.type" select="'USletter'"/> - <xsl:param name="double.sided">0</xsl:param> - <xsl:param name="headers.on.blank.pages">0</xsl:param> - <xsl:param name="footers.on.blank.pages">0</xsl:param> - - <!-- Space between paper border and content (chaotic stuff, don't touch) --> - <xsl:param name="page.margin.top">5mm</xsl:param> - <xsl:param name="region.before.extent">10mm</xsl:param> - <xsl:param name="body.margin.top">10mm</xsl:param> - - <xsl:param name="body.margin.bottom">15mm</xsl:param> - <xsl:param name="region.after.extent">10mm</xsl:param> - <xsl:param name="page.margin.bottom">0mm</xsl:param> - - <xsl:param name="page.margin.outer">18mm</xsl:param> - <xsl:param name="page.margin.inner">18mm</xsl:param> - - <!-- No indentation of Titles --> - <xsl:param name="title.margin.left">0pc</xsl:param> - - <!--################################################### - Fonts & Styles - ################################################### --> - - <!-- Justified text with hyphenation --> - <xsl:param name="alignment">justify</xsl:param> - <xsl:param name="hyphenate">true</xsl:param> - - <!-- Default Font size --> - <xsl:param name="body.font.master">11</xsl:param> - <xsl:param name="body.font.small">8</xsl:param> - - <!-- Line height in body text --> - <xsl:param name="line-height">1.4</xsl:param> - - <!-- Force line break in long URLs --> - <xsl:param name="ulink.hyphenate.chars">/&?</xsl:param> - <xsl:param name="ulink.hyphenate">​</xsl:param> - - <!-- Monospaced fonts are smaller than regular text --> - <xsl:attribute-set name="monospace.properties"> - <xsl:attribute name="font-family"> - <xsl:value-of 
select="$monospace.font.family"/> - </xsl:attribute> - <xsl:attribute name="font-size">0.8em</xsl:attribute> - <xsl:attribute name="wrap-option">wrap</xsl:attribute> - <xsl:attribute name="hyphenate">true</xsl:attribute> - </xsl:attribute-set> - - - <!-- add page break after abstract block --> - <xsl:attribute-set name="abstract.properties"> - <xsl:attribute name="break-after">page</xsl:attribute> - </xsl:attribute-set> - - <!-- add page break after toc --> - <xsl:attribute-set name="toc.margin.properties"> - <xsl:attribute name="break-after">page</xsl:attribute> - </xsl:attribute-set> - - <!-- add page break after first level sections --> - <xsl:attribute-set name="section.level1.properties"> - <xsl:attribute name="break-after">page</xsl:attribute> - </xsl:attribute-set> - - <!-- Show only Sections up to level 2 in the TOCs --> - <xsl:param name="toc.section.depth">2</xsl:param> - - <!-- Dot and Whitespace as separator in TOC between Label and Title--> - <xsl:param name="autotoc.label.separator" select="'. 
'"/> - - <!-- program listings / examples formatting --> - <xsl:attribute-set name="monospace.verbatim.properties"> - <xsl:attribute name="font-family">Courier</xsl:attribute> - <xsl:attribute name="font-size">8pt</xsl:attribute> - <xsl:attribute name="keep-together.within-column">always</xsl:attribute> - </xsl:attribute-set> - - <xsl:param name="shade.verbatim" select="1" /> - - <xsl:attribute-set name="shade.verbatim.style"> - <xsl:attribute name="background-color">#E8E8E8</xsl:attribute> - <xsl:attribute name="border-width">0.5pt</xsl:attribute> - <xsl:attribute name="border-style">solid</xsl:attribute> - <xsl:attribute name="border-color">#575757</xsl:attribute> - <xsl:attribute name="padding">3pt</xsl:attribute> - </xsl:attribute-set> - - <!-- callouts customization --> - <xsl:param name="callout.unicode" select="1" /> - <xsl:param name="callout.graphics" select="0" /> - <xsl:param name="callout.defaultcolumn">90</xsl:param> - - <!-- Syntax Highlighting --> - - -</xsl:stylesheet> http://git-wip-us.apache.org/repos/asf/hbase/blob/e80b3092/src/main/docbkx/customization.xsl ---------------------------------------------------------------------- diff --git a/src/main/docbkx/customization.xsl b/src/main/docbkx/customization.xsl deleted file mode 100644 index 5d0ec2c..0000000 --- a/src/main/docbkx/customization.xsl +++ /dev/null @@ -1,49 +0,0 @@ -<?xml version="1.0"?> -<xsl:stylesheet - xmlns:xsl="http://www.w3.org/1999/XSL/Transform" - version="1.0"> -<!-- -/** - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ ---> - <xsl:import href="urn:docbkx:stylesheet"/> - <xsl:import href="urn:docbkx:stylesheet/highlight.xsl"/> - <xsl:output method="html" encoding="UTF-8" indent="no"/> - - <xsl:template name="user.header.content"> - <script type="text/javascript"> - var disqus_shortname = 'hbase'; // required: replace example with your forum shortname - var disqus_url = 'http://hbase.apache.org/book/<xsl:value-of select="@xml:id" />.html'; - <!--var disqus_identifier = '<xsl:value-of select="@xml:id" />';--></script> - </xsl:template> - - <xsl:template name="user.footer.content"> -<div id="disqus_thread"></div> -<script type="text/javascript"> - /* * * DON'T EDIT BELOW THIS LINE * * */ - (function() { - var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; - dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js'; - (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); - })(); -</script> -<noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript> -<a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a> - </xsl:template> - -</xsl:stylesheet> http://git-wip-us.apache.org/repos/asf/hbase/blob/e80b3092/src/main/docbkx/datamodel.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/datamodel.xml b/src/main/docbkx/datamodel.xml deleted file mode 100644 index bdf697d..0000000 --- 
a/src/main/docbkx/datamodel.xml +++ /dev/null @@ -1,865 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<chapter - xml:id="datamodel" - version="5.0" - xmlns="http://docbook.org/ns/docbook" - xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:xi="http://www.w3.org/2001/XInclude" - xmlns:svg="http://www.w3.org/2000/svg" - xmlns:m="http://www.w3.org/1998/Math/MathML" - xmlns:html="http://www.w3.org/1999/xhtml" - xmlns:db="http://docbook.org/ns/docbook"> - <!--/** - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ ---> - - <title>Data Model</title> - <para>In HBase, data is stored in tables, which have rows and columns. Although this terminology - overlaps with that of relational databases (RDBMSs), the analogy is not a useful one. Instead, it can - be helpful to think of an HBase table as a multi-dimensional map.</para> - <variablelist> - <title>HBase Data Model Terminology</title> - <varlistentry> - <term>Table</term> - <listitem> - <para>An HBase table consists of multiple rows.</para> - </listitem> - </varlistentry> - <varlistentry> - <term>Row</term> - <listitem> - <para>A row in HBase consists of a row key and one or more columns with values associated - with them. 
Rows are sorted lexicographically by the row key as they are stored. For this - reason, the design of the row key is very important. The goal is to store data in such a - way that related rows are near each other. A common row key pattern is a website domain. - If your row keys are domains, you should probably store them in reverse (org.apache.www, - org.apache.mail, org.apache.jira). This way, all of the Apache domains are near each - other in the table, rather than being spread out based on the first letter of the - subdomain.</para> - </listitem> - </varlistentry> - <varlistentry> - <term>Column</term> - <listitem> - <para>A column in HBase consists of a column family and a column qualifier, which are - delimited by a <literal>:</literal> (colon) character.</para> - </listitem> - </varlistentry> - <varlistentry> - <term>Column Family</term> - <listitem> - <para>Column families physically colocate a set of columns and their values, often for - performance reasons. Each column family has a set of storage properties, such as whether - its values should be cached in memory, how its data is compressed or its row keys are - encoded, and others. Each row in a table has the same column - families, though a given row might not store anything in a given column family.</para> - <para>Column families are specified when you create your table, and influence the way your - data is stored in the underlying filesystem. Therefore, the column families should be - considered carefully during schema design.</para> - </listitem> - </varlistentry> - <varlistentry> - <term>Column Qualifier</term> - <listitem> - <para>A column qualifier is added to a column family to provide the index for a given - piece of data. Given a column family <literal>content</literal>, a column qualifier - might be <literal>content:html</literal>, and another might be - <literal>content:pdf</literal>. 
Though column families are fixed at table creation, - column qualifiers are mutable and may differ greatly between rows.</para> - </listitem> - </varlistentry> - <varlistentry> - <term>Cell</term> - <listitem> - <para>A cell is a combination of row, column family, and column qualifier, and contains a - value and a timestamp, which represents the value's version.</para> - <para>A cell's value is an uninterpreted array of bytes.</para> - </listitem> - </varlistentry> - <varlistentry> - <term>Timestamp</term> - <listitem> - <para>A timestamp is written alongside each value, and is the identifier for a given - version of a value. By default, the timestamp represents the time on the RegionServer - when the data was written, but you can specify a different timestamp value when you put - data into the cell.</para> - <caution> - <para>Direct manipulation of timestamps is an advanced feature which is only exposed for - special cases that are deeply integrated with HBase, and is discouraged in general. - Encoding a timestamp at the application level is the preferred pattern.</para> - </caution> - <para>You can specify the maximum number of versions of a value that HBase retains, per column - family. When the maximum number of versions is reached, the oldest versions are - eventually deleted. By default, only the newest version is kept.</para> - </listitem> - </varlistentry> - </variablelist> - - <section - xml:id="conceptual.view"> - <title>Conceptual View</title> - <para>You can read a very understandable explanation of the HBase data model in the blog post <link - xlink:href="http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable">Understanding - HBase and BigTable</link> by Jim R. Wilson. Another good explanation is available in the - PDF <link - xlink:href="http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf">Introduction - to Basic Schema Design</link> by Amandeep Khurana. 
It may help to read different
- perspectives to get a solid understanding of HBase schema design. The linked articles cover
- the same ground as the information in this section.</para>
- <para> The following example is a slightly modified form of the one on page 2 of the <link
- xlink:href="http://research.google.com/archive/bigtable.html">BigTable</link> paper. There
- is a table called <varname>webtable</varname> that contains two rows
- (<literal>com.cnn.www</literal>
- and <literal>com.example.www</literal>) and three column families named
- <varname>contents</varname>, <varname>anchor</varname>, and <varname>people</varname>. In
- this example, for the first row (<literal>com.cnn.www</literal>),
- <varname>anchor</varname> contains two columns (<varname>anchor:cnnsi.com</varname>,
- <varname>anchor:my.look.ca</varname>) and <varname>contents</varname> contains one column
- (<varname>contents:html</varname>). This example contains 5 versions of the row with the
- row key <literal>com.cnn.www</literal>, and one version of the row with the row key
- <literal>com.example.www</literal>. The <varname>contents:html</varname> column qualifier contains the entire
- HTML of a given website. Qualifiers of the <varname>anchor</varname> column family each
- contain the external site which links to the site represented by the row, along with the
- text it used in the anchor of its link. The <varname>people</varname> column family represents
- people associated with the site.
- </para>
- <note>
- <title>Column Names</title>
- <para> By convention, a column name is made of its column family prefix and a
- <emphasis>qualifier</emphasis>. For example, the column
- <emphasis>contents:html</emphasis> is made up of the column family
- <varname>contents</varname> and the <varname>html</varname> qualifier. The colon
- character (<literal>:</literal>) delimits the column family from the column family
- <emphasis>qualifier</emphasis>. 
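The convention can be sketched in a few lines of plain Java (a hypothetical helper, not HBase's own parsing code): the family is everything before the first colon, and the qualifier is everything after it.

```java
public class ColumnName {
    // Split "family:qualifier" at the first colon. Only the first colon
    // delimits; the qualifier itself may contain further colons
    // (e.g. anchor:my.look.ca).
    static String[] split(String column) {
        int i = column.indexOf(':');
        return new String[] { column.substring(0, i), column.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("contents:html");
        System.out.println(parts[0] + " / " + parts[1]); // contents / html
    }
}
```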
</para> - </note> - <table - frame="all"> - <title>Table <varname>webtable</varname></title> - <tgroup - cols="5" - align="left" - colsep="1" - rowsep="1"> - <colspec - colname="c1" /> - <colspec - colname="c2" /> - <colspec - colname="c3" /> - <colspec - colname="c4" /> - <colspec - colname="c5" /> - <thead> - <row> - <entry>Row Key</entry> - <entry>Time Stamp</entry> - <entry>ColumnFamily <varname>contents</varname></entry> - <entry>ColumnFamily <varname>anchor</varname></entry> - <entry>ColumnFamily <varname>people</varname></entry> - </row> - </thead> - <tbody> - <row> - <entry>"com.cnn.www"</entry> - <entry>t9</entry> - <entry /> - <entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry> - <entry /> - </row> - <row> - <entry>"com.cnn.www"</entry> - <entry>t8</entry> - <entry /> - <entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry> - <entry /> - </row> - <row> - <entry>"com.cnn.www"</entry> - <entry>t6</entry> - <entry><varname>contents:html</varname> = "<html>..."</entry> - <entry /> - <entry /> - </row> - <row> - <entry>"com.cnn.www"</entry> - <entry>t5</entry> - <entry><varname>contents:html</varname> = "<html>..."</entry> - <entry /> - <entry /> - </row> - <row> - <entry>"com.cnn.www"</entry> - <entry>t3</entry> - <entry><varname>contents:html</varname> = "<html>..."</entry> - <entry /> - <entry /> - </row> - <row> - <entry>"com.example.www"</entry> - <entry>t5</entry> - <entry><varname>contents:html</varname> = "<html>..."</entry> - <entry></entry> - <entry>people:author = "John Doe"</entry> - </row> - </tbody> - </tgroup> - </table> - <para>Cells in this table that appear to be empty do not take space, or in fact exist, in - HBase. This is what makes HBase "sparse." A tabular view is not the only possible way to - look at data in HBase, or even the most accurate. The following represents the same - information as a multi-dimensional map. 
This is only a mock-up for illustrative - purposes and may not be strictly accurate.</para> - <programlisting><![CDATA[ -{ - "com.cnn.www": { - contents: { - t6: contents:html: "<html>..." - t5: contents:html: "<html>..." - t3: contents:html: "<html>..." - } - anchor: { - t9: anchor:cnnsi.com = "CNN" - t8: anchor:my.look.ca = "CNN.com" - } - people: {} - } - "com.example.www": { - contents: { - t5: contents:html: "<html>..." - } - anchor: {} - people: { - t5: people:author: "John Doe" - } - } -} - ]]></programlisting> - - </section> - <section - xml:id="physical.view"> - <title>Physical View</title> - <para> Although at a conceptual level tables may be viewed as a sparse set of rows, they are - physically stored by column family. A new column qualifier (column_family:column_qualifier) - can be added to an existing column family at any time.</para> - <table - frame="all"> - <title>ColumnFamily <varname>anchor</varname></title> - <tgroup - cols="3" - align="left" - colsep="1" - rowsep="1"> - <colspec - colname="c1" /> - <colspec - colname="c2" /> - <colspec - colname="c3" /> - <thead> - <row> - <entry>Row Key</entry> - <entry>Time Stamp</entry> - <entry>Column Family <varname>anchor</varname></entry> - </row> - </thead> - <tbody> - <row> - <entry>"com.cnn.www"</entry> - <entry>t9</entry> - <entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry> - </row> - <row> - <entry>"com.cnn.www"</entry> - <entry>t8</entry> - <entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry> - </row> - </tbody> - </tgroup> - </table> - <table - frame="all"> - <title>ColumnFamily <varname>contents</varname></title> - <tgroup - cols="3" - align="left" - colsep="1" - rowsep="1"> - <colspec - colname="c1" /> - <colspec - colname="c2" /> - <colspec - colname="c3" /> - <thead> - <row> - <entry>Row Key</entry> - <entry>Time Stamp</entry> - <entry>ColumnFamily "contents:"</entry> - </row> - </thead> - <tbody> - <row> - <entry>"com.cnn.www"</entry> - <entry>t6</entry> - 
<entry><varname>contents:html</varname> = "<html>..."</entry>
- </row>
- <row>
- <entry>"com.cnn.www"</entry>
- <entry>t5</entry>
- <entry><varname>contents:html</varname> = "<html>..."</entry>
- </row>
- <row>
- <entry>"com.cnn.www"</entry>
- <entry>t3</entry>
- <entry><varname>contents:html</varname> = "<html>..."</entry>
- </row>
- </tbody>
- </tgroup>
- </table>
- <para>The empty cells shown in the
- conceptual view are not stored at all.
- Thus a request for the value of the <varname>contents:html</varname> column at time stamp
- <literal>t8</literal> would return no value. Similarly, a request for an
- <varname>anchor:my.look.ca</varname> value at time stamp <literal>t9</literal> would
- return no value. However, if no timestamp is supplied, the most recent value for a
- particular column would be returned. Given multiple versions, the most recent is also the
- first one found, since timestamps
- are stored in descending order. Thus, if no timestamp is specified, a request for the
- values of all columns in the row <varname>com.cnn.www</varname> would return: the value of
- <varname>contents:html</varname> from timestamp <literal>t6</literal>, the value of
- <varname>anchor:cnnsi.com</varname> from timestamp <literal>t9</literal>, and the value of
- <varname>anchor:my.look.ca</varname> from timestamp <literal>t8</literal>. </para>
- <para>For more information about the internals of how Apache HBase stores data, see <xref
- linkend="regions.arch" />. </para>
- </section>
- 
- <section
- xml:id="namespace">
- <title>Namespace</title>
- <para> A namespace is a logical grouping of tables analogous to a database in relational
- database systems. 
This abstraction lays the groundwork for upcoming multi-tenancy-related
- features: <itemizedlist>
- <listitem>
- <para>Quota Management (HBASE-8410) - Restrict the amount of resources (i.e., regions,
- tables) a namespace can consume.</para>
- </listitem>
- <listitem>
- <para>Namespace Security Administration (HBASE-9206) - Provide another level of security
- administration for tenants.</para>
- </listitem>
- <listitem>
- <para>Region server groups (HBASE-6721) - A namespace/table can be pinned onto a subset
- of regionservers, thus guaranteeing a coarse level of isolation.</para>
- </listitem>
- </itemizedlist>
- </para>
- <section
- xml:id="namespace_creation">
- <title>Namespace management</title>
- <para> A namespace can be created, removed, or altered. Namespace membership is determined
- during table creation by specifying a fully-qualified table name of the form:</para>
- 
- <programlisting language="xml"><![CDATA[<table namespace>:<table qualifier>]]></programlisting>
- 
- 
- <example>
- <title>Examples</title>
- 
- <programlisting language="bourne">
-#Create a namespace
-create_namespace 'my_ns'
- </programlisting>
- <programlisting language="bourne">
-#create my_table in my_ns namespace
-create 'my_ns:my_table', 'fam'
- </programlisting>
- <programlisting language="bourne">
-#drop namespace
-drop_namespace 'my_ns'
- </programlisting>
- <programlisting language="bourne">
-#alter namespace
-alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
- </programlisting>
- </example>
- </section>
- <section
- xml:id="namespace_special">
- <title>Predefined namespaces</title>
- <para> There are two predefined special namespaces: </para>
- <itemizedlist>
- <listitem>
- <para>hbase - the system namespace, which contains HBase internal tables</para>
- </listitem>
- <listitem>
- <para>default - tables with no explicitly specified namespace automatically fall into
- this namespace.</para>
- </listitem>
- </itemizedlist>
- <example>
- 
<title>Examples</title>
- 
- <programlisting language="bourne">
-#namespace=foo and table qualifier=bar
-create 'foo:bar', 'fam'
-
-#namespace=default and table qualifier=bar
-create 'bar', 'fam'
-</programlisting>
- </example>
- </section>
- </section>
- 
- <section
- xml:id="table">
- <title>Table</title>
- <para> Tables are declared up front at schema definition time. </para>
- </section>
- 
- <section
- xml:id="row">
- <title>Row</title>
- <para>Row keys are uninterpreted bytes. Rows are lexicographically sorted, with the lowest
- order appearing first in a table. The empty byte array is used to denote both the start and
- end of a table's namespace.</para>
- </section>
- 
- <section
- xml:id="columnfamily">
- <title>Column Family<indexterm><primary>Column Family</primary></indexterm></title>
- <para> Columns in Apache HBase are grouped into <emphasis>column families</emphasis>. All
- column members of a column family have the same prefix. For example, the columns
- <emphasis>courses:history</emphasis> and <emphasis>courses:math</emphasis> are both
- members of the <emphasis>courses</emphasis> column family. The colon character
- (<literal>:</literal>) delimits the column family from the <indexterm><primary>column
- family qualifier</primary><secondary>Column Family Qualifier</secondary></indexterm>.
- The column family prefix must be composed of <emphasis>printable</emphasis> characters. The
- qualifying tail, the column family <emphasis>qualifier</emphasis>, can be made of any
- arbitrary bytes. Column families must be declared up front at schema definition time, whereas
- columns do not need to be defined at schema time but can be conjured on the fly while the
- table is up and running.</para>
- <para>Physically, all column family members are stored together on the filesystem. 
Because
- tunings and storage specifications are done at the column family level, it is advised that
- all column family members have the same general access pattern and size
- characteristics.</para>
- 
- </section>
- <section
- xml:id="cells">
- <title>Cells<indexterm><primary>Cells</primary></indexterm></title>
- <para>A <emphasis>{row, column, version}</emphasis> tuple exactly specifies a
- <literal>cell</literal> in HBase. Cell content is uninterpreted bytes.</para>
- </section>
- <section
- xml:id="data_model_operations">
- <title>Data Model Operations</title>
- <para>The four primary data model operations are Get, Put, Scan, and Delete. Operations are
- applied via <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html">Table</link>
- instances.
- </para>
- <section
- xml:id="get">
- <title>Get</title>
- <para><link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link>
- returns attributes for a specified row. Gets are executed via <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get(org.apache.hadoop.hbase.client.Get)">
- Table.get</link>. </para>
- </section>
- <section
- xml:id="put">
- <title>Put</title>
- <para><link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link>
- either adds new rows to a table (if the key is new) or updates existing rows (if the
- key already exists). Puts are executed via <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)">
- Table.put</link> (writeBuffer) or <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List, java.lang.Object[])">
- Table.batch</link> (non-writeBuffer). 
</para>
- </section>
- <section
- xml:id="scan">
- <title>Scans</title>
- <para><link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link>
- allows iteration over multiple rows for specified attributes. </para>
- <para>The following is an example of a Scan on a Table instance. Assume that a table is
- populated with rows with keys "row1", "row2", "row3", and then another set of rows with
- the keys "abc1", "abc2", and "abc3". The following example shows how to set a Scan
- instance to return the rows beginning with "row".</para>
-<programlisting language="java">
-public static final byte[] CF = "cf".getBytes();
-public static final byte[] ATTR = "attr".getBytes();
-...
-
-Table table = ... // instantiate a Table instance
-
-Scan scan = new Scan();
-scan.addColumn(CF, ATTR);
-scan.setRowPrefixFilter(Bytes.toBytes("row"));
-ResultScanner rs = table.getScanner(scan);
-try {
-  for (Result r = rs.next(); r != null; r = rs.next()) {
-    // process result...
-  }
-} finally {
-  rs.close(); // always close the ResultScanner!
-}
-</programlisting>
- <para>Note that generally the easiest way to specify a specific stop point for a scan is by
- using the <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html">InclusiveStopFilter</link>
- class. </para>
- </section>
- <section
- xml:id="delete">
- <title>Delete</title>
- <para><link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html">Delete</link>
- removes a row from a table. Deletes are executed via <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete(org.apache.hadoop.hbase.client.Delete)">
- Table.delete</link>. </para>
- <para>HBase does not modify data in place, and so deletes are handled by creating new
- markers called <emphasis>tombstones</emphasis>. These tombstones, along with the dead
- values, are cleaned up on major compactions. 
</para> - <para>See <xref - linkend="version.delete" /> for more information on deleting versions of columns, and - see <xref - linkend="compaction" /> for more information on compactions. </para> - - </section> - - </section> - - - <section - xml:id="versions"> - <title>Versions<indexterm><primary>Versions</primary></indexterm></title> - - <para>A <emphasis>{row, column, version} </emphasis>tuple exactly specifies a - <literal>cell</literal> in HBase. It's possible to have an unbounded number of cells where - the row and column are the same but the cell address differs only in its version - dimension.</para> - - <para>While rows and column keys are expressed as bytes, the version is specified using a long - integer. Typically this long contains time instances such as those returned by - <code>java.util.Date.getTime()</code> or <code>System.currentTimeMillis()</code>, that is: - <quote>the difference, measured in milliseconds, between the current time and midnight, - January 1, 1970 UTC</quote>.</para> - - <para>The HBase version dimension is stored in decreasing order, so that when reading from a - store file, the most recent values are found first.</para> - - <para>There is a lot of confusion over the semantics of <literal>cell</literal> versions, in - HBase. In particular:</para> - <itemizedlist> - <listitem> - <para>If multiple writes to a cell have the same version, only the last written is - fetchable.</para> - </listitem> - - <listitem> - <para>It is OK to write cells in a non-increasing version order.</para> - </listitem> - </itemizedlist> - - <para>Below we describe how the version dimension in HBase currently works. See <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link> for - discussion of HBase versions. <link - xlink:href="http://outerthought.org/blog/417-ot.html">Bending time in HBase</link> - makes for a good read on the version, or time, dimension in HBase. It has more detail on - versioning than is provided here. 
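The two rules above (newest version found first; a rewrite at an existing version replacing the older value) can be mimicked with a reverse-ordered map in plain Java. This is a sketch of the semantics, not HBase's storage code:

```java
import java.util.Comparator;
import java.util.TreeMap;

public class VersionOrder {
    public static void main(String[] args) {
        // Versions (long timestamps) kept in decreasing order, so the
        // first entry is always the newest -- as in HBase store files.
        TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());
        versions.put(3L, "value-at-t3");
        versions.put(6L, "value-at-t6");
        versions.put(5L, "value-at-t5");
        System.out.println(versions.firstEntry().getValue()); // value-at-t6 (newest)

        // Writing again at an existing version: only the last write is fetchable.
        versions.put(6L, "value-at-t6-rewrite");
        System.out.println(versions.firstEntry().getValue()); // value-at-t6-rewrite
    }
}
```

Note that nothing stops the writes from arriving in non-increasing version order (t3, t6, t5 above); the sorted map, like HBase, orders them on read regardless.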
As of this writing, the limitation
- <emphasis>Overwriting values at existing timestamps</emphasis> mentioned in the
- article no longer holds in HBase. This section is basically a synopsis of this article
- by Bruno Dumon.</para>
- 
- <section xml:id="specify.number.of.versions">
- <title>Specifying the Number of Versions to Store</title>
- <para>The maximum number of versions to store for a given column is part of the column
- schema and is specified at table creation, or via an <command>alter</command> command, via
- <code>HColumnDescriptor.DEFAULT_VERSIONS</code>. Prior to HBase 0.96, the default number
- of versions kept was <literal>3</literal>, but in 0.96 and newer it has been changed to
- <literal>1</literal>.</para>
- <example>
- <title>Modify the Maximum Number of Versions for a Column</title>
- <para>This example uses HBase Shell to keep a maximum of 5 versions of column
- <code>f1</code>. You could also use <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html"
- >HColumnDescriptor</link>.</para>
- <screen><![CDATA[hbase> alter 't1', NAME => 'f1', VERSIONS => 5]]></screen>
- </example>
- <example>
- <title>Modify the Minimum Number of Versions for a Column</title>
- <para>You can also specify the minimum number of versions to store. By default, this is
- set to 0, which means the feature is disabled. The following example sets the minimum
- number of versions on field <code>f1</code> to <literal>2</literal>, via HBase Shell.
- You could also use <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html"
- >HColumnDescriptor</link>.</para>
- <screen><![CDATA[hbase> alter 't1', NAME => 'f1', MIN_VERSIONS => 2]]></screen>
- </example>
- <para>Starting with HBase 0.98.2, you can specify a global default for the maximum number of
- versions kept for all newly-created columns, by setting
- <option>hbase.column.max.version</option> in <filename>hbase-site.xml</filename>. 
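The retention rule can be sketched in plain Java (an illustration of the behavior, not HBase's compaction code): once a column holds more than the configured maximum number of versions, the oldest versions are the ones that get dropped.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MaxVersions {
    static final int MAX_VERSIONS = 5; // e.g. VERSIONS => 5, as in the shell example

    // Newest-first map of version -> value for a single column.
    static void put(NavigableMap<Long, String> cell, long ts, String value) {
        cell.put(ts, value);
        while (cell.size() > MAX_VERSIONS) {
            cell.pollLastEntry(); // last == smallest timestamp == oldest version
        }
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> cell = new TreeMap<>(Comparator.reverseOrder());
        for (long ts = 1; ts <= 8; ts++) {
            put(cell, ts, "v" + ts);
        }
        System.out.println(cell.size());    // 5
        System.out.println(cell.lastKey()); // 4 -- oldest version still retained
    }
}
```

In HBase the trimming happens lazily, during compactions, rather than on every write as in this toy model.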
See - <xref linkend="hbase.column.max.version"/>.</para> - </section> - - <section - xml:id="versions.ops"> - <title>Versions and HBase Operations</title> - - <para>In this section we look at the behavior of the version dimension for each of the core - HBase operations.</para> - - <section> - <title>Get/Scan</title> - - <para>Gets are implemented on top of Scans. The below discussion of <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> - applies equally to <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scans</link>.</para> - - <para>By default, i.e. if you specify no explicit version, when doing a - <literal>get</literal>, the cell whose version has the largest value is returned - (which may or may not be the latest one written, see later). The default behavior can be - modified in the following ways:</para> - - <itemizedlist> - <listitem> - <para>to return more than one version, see <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()">Get.setMaxVersions()</link></para> - </listitem> - - <listitem> - <para>to return versions other than the latest, see <link - xlink:href="???">Get.setTimeRange()</link></para> - - <para>To retrieve the latest version that is less than or equal to a given value, thus - giving the 'latest' state of the record at a certain point in time, just use a range - from 0 to the desired version and set the max versions to 1.</para> - </listitem> - </itemizedlist> - - </section> - <section - xml:id="default_get_example"> - <title>Default Get Example</title> - <para>The following Get will only retrieve the current version of the row</para> - <programlisting language="java"> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... 
-Get get = new Get(Bytes.toBytes("row1")); -Result r = table.get(get); -byte[] b = r.getValue(CF, ATTR); // returns current version of value -</programlisting> - </section> - <section - xml:id="versioned_get_example"> - <title>Versioned Get Example</title> - <para>The following Get will return the last 3 versions of the row.</para> - <programlisting language="java"> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... -Get get = new Get(Bytes.toBytes("row1")); -get.setMaxVersions(3); // will return last 3 versions of row -Result r = table.get(get); -byte[] b = r.getValue(CF, ATTR); // returns current version of value -List<KeyValue> kv = r.getColumn(CF, ATTR); // returns all versions of this column -</programlisting> - </section> - - <section> - <title>Put</title> - - <para>Doing a put always creates a new version of a <literal>cell</literal>, at a certain - timestamp. By default the system uses the server's <literal>currentTimeMillis</literal>, - but you can specify the version (= the long integer) yourself, on a per-column level. - This means you could assign a time in the past or the future, or use the long value for - non-time purposes.</para> - - <para>To overwrite an existing value, do a put at exactly the same row, column, and - version as that of the cell you would overshadow.</para> - <section - xml:id="implicit_version_example"> - <title>Implicit Version Example</title> - <para>The following Put will be implicitly versioned by HBase with the current - time.</para> - <programlisting language="java"> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... 
-Put put = new Put(Bytes.toBytes(row));
-put.add(CF, ATTR, Bytes.toBytes(data));
-table.put(put);
-</programlisting>
- </section>
- <section
- xml:id="explicit_version_example">
- <title>Explicit Version Example</title>
- <para>The following Put has the version timestamp explicitly set.</para>
- <programlisting language="java">
-public static final byte[] CF = "cf".getBytes();
-public static final byte[] ATTR = "attr".getBytes();
-...
-Put put = new Put(Bytes.toBytes(row));
-long explicitTimeInMs = 555; // just an example
-put.add(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data));
-table.put(put);
-</programlisting>
- <para>Caution: the version timestamp is used internally by HBase for things like time-to-live
- calculations. It is usually best to avoid setting this timestamp yourself. Prefer using
- a separate timestamp attribute of the row, or having the timestamp as part of the rowkey,
- or both. </para>
- </section>
- 
- </section>
- 
- <section
- xml:id="version.delete">
- <title>Delete</title>
- 
- <para>There are three different types of internal delete markers. See Lars Hofhansl's blog
- for discussion of his attempt at adding another, <link
- xlink:href="http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html">Scanning
- in HBase: Prefix Delete Marker</link>. </para>
- <itemizedlist>
- <listitem>
- <para>Delete: for a specific version of a column.</para>
- </listitem>
- <listitem>
- <para>Delete column: for all versions of a column.</para>
- </listitem>
- <listitem>
- <para>Delete family: for all columns of a particular ColumnFamily.</para>
- </listitem>
- </itemizedlist>
- <para>When deleting an entire row, HBase will internally create a tombstone for each
- ColumnFamily (i.e., not each individual column). </para>
- <para>Deletes work by creating <emphasis>tombstone</emphasis> markers. For example, let's
- suppose we want to delete a row. For this you can specify a version, or else by default
- the <literal>currentTimeMillis</literal> is used. 
What this means is <quote>delete all - cells where the version is less than or equal to this version</quote>. HBase never - modifies data in place, so for example a delete will not immediately delete (or mark as - deleted) the entries in the storage file that correspond to the delete condition. - Rather, a so-called <emphasis>tombstone</emphasis> is written, which will mask the - deleted values. When HBase does a major compaction, the tombstones are processed to - actually remove the dead values, together with the tombstones themselves. If the version - you specified when deleting a row is larger than the version of any value in the row, - then you can consider the complete row to be deleted.</para> - <para>For an informative discussion on how deletes and versioning interact, see the thread <link - xlink:href="http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421">Put w/ - timestamp -> Deleteall -> Put w/ timestamp fails</link> up on the user mailing - list.</para> - <para>Also see <xref - linkend="keyvalue" /> for more information on the internal KeyValue format. </para> - <para>Delete markers are purged during the next major compaction of the store, unless the - <option>KEEP_DELETED_CELLS</option> option is set in the column family. To keep the - deletes for a configurable amount of time, you can set the delete TTL via the - <option>hbase.hstore.time.to.purge.deletes</option> property in - <filename>hbase-site.xml</filename>. If - <option>hbase.hstore.time.to.purge.deletes</option> is not set, or set to 0, all - delete markers, including those with timestamps in the future, are purged during the - next major compaction. Otherwise, a delete marker with a timestamp in the future is kept - until the major compaction which occurs after the time represented by the marker's - timestamp plus the value of <option>hbase.hstore.time.to.purge.deletes</option>, in - milliseconds. 
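The masking behavior can be sketched with a toy model in plain Java (not HBase code): a tombstone at version T hides every cell whose version is less than or equal to T, until a "major compaction" removes the dead cells together with the tombstone itself.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class TombstoneSketch {
    public static void main(String[] args) {
        // Newest-first versions of one cell.
        TreeMap<Long, String> cells = new TreeMap<>(Comparator.reverseOrder());
        cells.put(10L, "ten");
        cells.put(20L, "twenty");
        long tombstone = 25L; // delete marker: masks every version <= 25

        // Read path: newest cell NOT masked by the tombstone.
        String visible = cells.entrySet().stream()
                .filter(e -> e.getKey() > tombstone)
                .map(e -> e.getValue())
                .findFirst().orElse(null);
        System.out.println(visible); // null -- both versions are masked

        // A later put at version <= 25 is stored but stays masked too;
        // only a put at a version > 25 becomes visible again.
        cells.put(30L, "thirty");
        String after = cells.entrySet().stream()
                .filter(e -> e.getKey() > tombstone)
                .map(e -> e.getValue())
                .findFirst().orElse(null);
        System.out.println(after); // thirty
    }
}
```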
</para>
- <note>
- <para>This behavior represents a fix for an unexpected change that was introduced in
- HBase 0.94, and was fixed in <link
- xlink:href="https://issues.apache.org/jira/browse/HBASE-10118">HBASE-10118</link>.
- The change has been backported to HBase 0.94 and newer branches.</para>
- </note>
- </section>
- </section>
- 
- <section>
- <title>Current Limitations</title>
- 
- <section>
- <title>Deletes mask Puts</title>
- 
- <para>Deletes mask puts, even puts that happened after the delete
- was entered. See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-2256"
- >HBASE-2256</link>. Remember that a delete writes a tombstone, which only
- disappears after the next major compaction has run. Suppose you do
- a delete of everything <= T. After this you do a new put with a
- timestamp <= T. This put, even if it happened after the delete,
- will be masked by the delete tombstone. Performing the put will not
- fail, but when you do a get you will notice the put had no
- effect. It will start working again after the major compaction has
- run. These issues should not be a problem if you use
- always-increasing versions for new puts to a row. But they can occur
- even if you do not care about time: just issue a delete and a put
- immediately after each other, and there is some chance they happen
- within the same millisecond.</para>
- </section>
- 
- <section
- xml:id="major.compactions.change.query.results">
- <title>Major compactions change query results</title>
- 
- <para><quote>...create three cell versions at t1, t2 and t3, with a maximum-versions
- setting of 2. So when getting all versions, only the values at t2 and t3 will be
- returned. But if you delete the version at t2 or t3, the one at t1 will appear again. 
- Obviously, once a major compaction has run, such behavior will not be the case
- anymore...</quote> (See <emphasis>Garbage Collection</emphasis> in <link
- xlink:href="http://outerthought.org/blog/417-ot.html">Bending time in
- HBase</link>.)</para>
- </section>
- </section>
- </section>
- <section xml:id="dm.sort">
- <title>Sort Order</title>
- <para>All data model operations in HBase return data in sorted order: first by row,
- then by ColumnFamily, followed by column qualifier, and finally timestamp (sorted
- in reverse, so newest records are returned first).
- </para>
- </section>
- <section xml:id="dm.column.metadata">
- <title>Column Metadata</title>
- <para>There is no store of column metadata outside of the internal KeyValue instances for a ColumnFamily.
- Thus, while HBase can support not only a large number of columns per row, but a heterogeneous set of columns
- between rows as well, it is your responsibility to keep track of the column names.
- </para>
- <para>The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows.
- For more information about how HBase stores data internally, see <xref linkend="keyvalue" />.
- </para>
- </section>
- <section xml:id="joins"><title>Joins</title>
- <para>Whether HBase supports joins is a common question on the dist-list, and there is a simple answer: it doesn't,
- at least not in the way that RDBMSs support them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated
- in this chapter, the read data model operations in HBase are Get and Scan.
- </para>
- <para>However, that doesn't mean that equivalent join functionality can't be supported in your application;
- you just have to do it yourself. 
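A minimal sketch of the do-it-yourself, lookup-table approach (hypothetical table names and keys; plain Java maps stand in for HBase tables): read a row from one table, then use one of its values as the row key for a get against a second table, which amounts to a client-side nested-loop join.

```java
import java.util.HashMap;
import java.util.Map;

public class ClientSideJoin {
    public static void main(String[] args) {
        // Hypothetical "orders" and "customers" tables, modeled as maps
        // of row key -> value.
        Map<String, String> orders = new HashMap<>();    // orderId -> customerId
        Map<String, String> customers = new HashMap<>(); // customerId -> name
        orders.put("order-1", "cust-42");
        customers.put("cust-42", "Ada");

        // The "join" happens in the client: the second get is keyed by
        // the value fetched in the first.
        String customerId = orders.get("order-1");
        String name = customers.get(customerId);
        System.out.println(name); // Ada
    }
}
```

The denormalization alternative would instead store the customer name redundantly inside each order row at write time, trading storage and update cost for a single-get read.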
The two primary strategies are either to denormalize the data as it is written to HBase,
- or to maintain lookup tables and perform the join between HBase tables in your application or MapReduce code (and as RDBMSs
- demonstrate, there are several strategies for this depending on the size of the tables, e.g., nested loops vs.
- hash-joins). So which is the best approach? It depends on what you are trying to do, and as such there isn't a single
- answer that works for every use case.
- </para>
- </section>
- <section xml:id="acid"><title>ACID</title>
- <para>See <link xlink:href="http://hbase.apache.org/acid-semantics.html">ACID Semantics</link>.
- Lars Hofhansl has also written a note on
- <link xlink:href="http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html">ACID in HBase</link>.</para>
- </section>
- </chapter>
