http://git-wip-us.apache.org/repos/asf/hbase/blob/7bf6c024/src/main/docbkx/performance.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
deleted file mode 100644
index 50de8aa..0000000
--- a/src/main/docbkx/performance.xml
+++ /dev/null
@@ -1,775 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<chapter version="5.0" xml:id="performance"
-         xmlns="http://docbook.org/ns/docbook";
-         xmlns:xlink="http://www.w3.org/1999/xlink";
-         xmlns:xi="http://www.w3.org/2001/XInclude";
-         xmlns:svg="http://www.w3.org/2000/svg";
-         xmlns:m="http://www.w3.org/1998/Math/MathML";
-         xmlns:html="http://www.w3.org/1999/xhtml";
-         xmlns:db="http://docbook.org/ns/docbook";>
-<!--
-/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--->
-  <title>Apache HBase Performance Tuning</title>
-
-  <section xml:id="perf.os">
-    <title>Operating System</title>
-        <section xml:id="perf.os.ram">
-          <title>Memory</title>
-          <para>RAM, RAM, RAM.  Don't starve HBase.</para>
-        </section>
-        <section xml:id="perf.os.64">
-          <title>64-bit</title>
-          <para>Use a 64-bit platform (and 64-bit JVM).</para>
-        </section>
-        <section xml:id="perf.os.swap">
-          <title>Swapping</title>
-          <para>Watch out for swapping.  Set swappiness to 0.</para>
-        </section>
-  </section>
-  <section xml:id="perf.network">
-    <title>Network</title>
-    <para>
-    Perhaps the most important factor in avoiding network issues degrading 
Hadoop and HBbase performance is the switching hardware
-    that is used, decisions made early in the scope of the project can cause 
major problems when you double or triple the size of your cluster (or more).
-    </para>
-    <para>
-    Important items to consider:
-        <itemizedlist>
-          <listitem>Switching capacity of the device</listitem>
-          <listitem>Number of systems connected</listitem>
-          <listitem>Uplink capacity</listitem>
-        </itemizedlist>
-    </para>
-    <section xml:id="perf.network.1switch">
-      <title>Single Switch</title>
-      <para>The single most important factor in this configuration is that the 
switching capacity of the hardware is capable of
-      handling the traffic which can be generated by all systems connected to 
the switch. Some lower priced commodity hardware
-      can have a slower switching capacity than could be utilized by a full 
switch.
-      </para>
-    </section>
-    <section xml:id="perf.network.2switch">
-      <title>Multiple Switches</title>
-      <para>Multiple switches are a potential pitfall in the architecture.   
The most common configuration of lower priced hardware is a
-      simple 1Gbps uplink from one switch to another. This often overlooked 
pinch point can easily become a bottleneck for cluster communication.
-      Especially with MapReduce jobs that are both reading and writing a lot 
of data the communication across this uplink could be saturated.
-      </para>
-      <para>Mitigation of this issue is fairly simple and can be accomplished 
in multiple ways:
-      <itemizedlist>
-        <listitem>Use appropriate hardware for the scale of the cluster which 
you're attempting to build.</listitem>
-        <listitem>Use larger single switch configurations i.e. single 48 port 
as opposed to 2x 24 port</listitem>
-        <listitem>Configure port trunking for uplinks to utilize multiple 
interfaces to increase cross switch bandwidth.</listitem>
-      </itemizedlist>
-      </para>
-    </section>
-    <section xml:id="perf.network.multirack">
-      <title>Multiple Racks</title>
-      <para>Multiple rack configurations carry the same potential issues as 
multiple switches, and can suffer performance degradation from two main areas:
-         <itemizedlist>
-           <listitem>Poor switch capacity performance</listitem>
-           <listitem>Insufficient uplink to another rack</listitem>
-         </itemizedlist>
-      If the the switches in your rack have appropriate switching capacity to 
handle all the hosts at full speed, the next most likely issue will be caused 
by homing
-      more of your cluster across racks.  The easiest way to avoid issues when 
spanning multiple racks is to use port trunking to create a bonded uplink to 
other racks.
-      The downside of this method however, is in the overhead of ports that 
could potentially be used. An example of this is, creating an 8Gbps port 
channel from rack
-      A to rack B, using 8 of your 24 ports to communicate between racks gives 
you a poor ROI, using too few however can mean you're not getting the most out 
of your cluster.
-      </para>
-      <para>Using 10Gbe links between racks will greatly increase performance, 
and assuming your switches support a 10Gbe uplink or allow for an expansion 
card will allow you to
-      save your ports for machines as opposed to uplinks.
-      </para>
-    </section>
-    <section xml:id="perf.network.ints">
-      <title>Network Interfaces</title>
-      <para>Are all the network interfaces functioning correctly?  Are you 
sure?  See the Troubleshooting Case Study in <xref 
linkend="casestudies.slownode"/>.
-      </para>
-    </section>
-  </section>  <!-- network -->
-
-  <section xml:id="jvm">
-    <title>Java</title>
-
-    <section xml:id="gc">
-      <title>The Garbage Collector and Apache HBase</title>
-
-      <section xml:id="gcpause">
-        <title>Long GC pauses</title>
-
-        <para xml:id="mslab">In his presentation, <link
-        
xlink:href="http://www.slideshare.net/cloudera/hbase-hug-presentation";>Avoiding
-        Full GCs with MemStore-Local Allocation Buffers</link>, Todd Lipcon
-        describes two cases of stop-the-world garbage collections common in
-        HBase, especially during loading; CMS failure modes and old generation
-        heap fragmentation brought. To address the first, start the CMS
-        earlier than default by adding
-        <code>-XX:CMSInitiatingOccupancyFraction</code> and setting it down
-        from defaults. Start at 60 or 70 percent (The lower you bring down the
-        threshold, the more GCing is done, the more CPU used). To address the
-        second fragmentation issue, Todd added an experimental facility,
-        <indexterm><primary>MSLAB</primary></indexterm>, that
-        must be explicitly enabled in Apache HBase 0.90.x (Its defaulted to be 
on in
-        Apache 0.92.x HBase). See 
<code>hbase.hregion.memstore.mslab.enabled</code>
-        to true in your <classname>Configuration</classname>. See the cited
-        slides for background and detail<footnote><para>The latest jvms do 
better
-        regards fragmentation so make sure you are running a recent release.
-        Read down in the message,
-        <link 
xlink:href="http://osdir.com/ml/hotspot-gc-use/2011-11/msg00002.html";>Identifying
 concurrent mode failures caused by fragmentation</link>.</para></footnote>.
-        Be aware that when enabled, each MemStore instance will occupy at least
-        an MSLAB instance of memory.  If you have thousands of regions or lots
-        of regions each with many column families, this allocation of MSLAB
-        may be responsible for a good portion of your heap allocation and in
-        an extreme case cause you to OOME.  Disable MSLAB in this case, or
-        lower the amount of memory it uses or float less regions per server.
-        </para>
-        <para>If you have a write-heavy workload, check out
-            <link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-8163";>HBASE-8163 
MemStoreChunkPool: An improvement for JAVA GC when using MSLAB</link>.
-            It describes configurations to lower the amount of young GC during 
write-heavy loadings.  If you do not have HBASE-8163 installed, and you are
-            trying to improve your young GC times, one trick to consider -- 
courtesy of our Liang Xie -- is to set the GC config 
<varname>-XX:PretenureSizeThreshold</varname> in 
<filename>hbase-env.sh</filename>
-            to be just smaller than the size of 
<varname>hbase.hregion.memstore.mslab.chunksize</varname> so MSLAB allocations 
happen in the
-            tenured space directly rather than first in the young gen.  You'd 
do this because these MSLAB allocations are going to likely make it
-            to the old gen anyways and rather than pay the price of a copies 
between s0 and s1 in eden space followed by the copy up from
-            young to old gen after the MSLABs have achieved sufficient tenure, 
save a bit of YGC churn and allocate in the old gen directly.
-            </para>
-        <para>For more information about GC logs, see <xref 
linkend="trouble.log.gc" />.
-        </para>
-      </section>
-    </section>
-  </section>
-
-  <section xml:id="perf.configurations">
-    <title>HBase Configurations</title>
-
-    <para>See <xref linkend="recommended_configurations" />.</para>
-
-
-    <section xml:id="perf.number.of.regions">
-      <title>Number of Regions</title>
-
-      <para>The number of regions for an HBase table is driven by the <xref
-              linkend="bigger.regions" />. Also, see the architecture
-          section on <xref linkend="arch.regions.size" /></para>
-    </section>
-
-    <section xml:id="perf.compactions.and.splits">
-      <title>Managing Compactions</title>
-
-      <para>For larger systems, managing <link
-      linkend="disable.splitting">compactions and splits</link> may be
-      something you want to consider.</para>
-    </section>
-
-    <section xml:id="perf.handlers">
-        <title><varname>hbase.regionserver.handler.count</varname></title>
-        <para>See <xref linkend="hbase.regionserver.handler.count"/>.
-           </para>
-    </section>
-    <section xml:id="perf.hfile.block.cache.size">
-        <title><varname>hfile.block.cache.size</varname></title>
-        <para>See <xref linkend="hfile.block.cache.size"/>.
-        A memory setting for the RegionServer process.
-        </para>
-    </section>
-    <section xml:id="perf.rs.memstore.upperlimit">
-        
<title><varname>hbase.regionserver.global.memstore.upperLimit</varname></title>
-        <para>See <xref 
linkend="hbase.regionserver.global.memstore.upperLimit"/>.
-        This memory setting is often adjusted for the RegionServer process 
depending on needs.
-        </para>
-    </section>
-    <section xml:id="perf.rs.memstore.lowerlimit">
-        
<title><varname>hbase.regionserver.global.memstore.lowerLimit</varname></title>
-        <para>See <xref 
linkend="hbase.regionserver.global.memstore.lowerLimit"/>.
-        This memory setting is often adjusted for the RegionServer process 
depending on needs.
-        </para>
-    </section>
-    <section xml:id="perf.hstore.blockingstorefiles">
-        <title><varname>hbase.hstore.blockingStoreFiles</varname></title>
-        <para>See <xref linkend="hbase.hstore.blockingStoreFiles"/>.
-        If there is blocking in the RegionServer logs, increasing this can 
help.
-        </para>
-    </section>
-    <section xml:id="perf.hregion.memstore.block.multiplier">
-        
<title><varname>hbase.hregion.memstore.block.multiplier</varname></title>
-        <para>See <xref linkend="hbase.hregion.memstore.block.multiplier"/>.
-        If there is enough RAM, increasing this can help.
-        </para>
-    </section>
-    <section xml:id="hbase.regionserver.checksum.verify">
-        <title><varname>hbase.regionserver.checksum.verify</varname></title>
-        <para>Have HBase write the checksum into the datablock and save
-        having to do the checksum seek whenever you read.</para>
-
-        <para>See <xref linkend="hbase.regionserver.checksum.verify"/>,
-        <xref linkend="hbase.hstore.bytes.per.checksum"/> and <xref 
linkend="hbase.hstore.checksum.algorithm"/>
-        For more information see the
-        release note on <link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-5074";>HBASE-5074 
support checksums in HBase block cache</link>.
-        </para>
-    </section>
-
-  </section>
-
-
-
-
-  <section xml:id="perf.zookeeper">
-    <title>ZooKeeper</title>
-    <para>See <xref linkend="zookeeper"/> for information on configuring 
ZooKeeper, and see the part
-    about having a dedicated disk.
-    </para>
-  </section>
-  <section xml:id="perf.schema">
-      <title>Schema Design</title>
-
-    <section xml:id="perf.number.of.cfs">
-      <title>Number of Column Families</title>
-      <para>See <xref linkend="number.of.cfs" />.</para>
-    </section>
-    <section xml:id="perf.schema.keys">
-      <title>Key and Attribute Lengths</title>
-      <para>See <xref linkend="keysize" />.  See also <xref 
linkend="perf.compression.however" /> for
-      compression caveats.</para>
-    </section>
-    <section xml:id="schema.regionsize"><title>Table RegionSize</title>
-    <para>The regionsize can be set on a per-table basis via 
<code>setFileSize</code> on
-    <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html";>HTableDescriptor</link>
 in the
-    event where certain tables require different regionsizes than the 
configured default regionsize.
-    </para>
-    <para>See <xref linkend="perf.number.of.regions"/> for more information.
-    </para>
-    </section>
-    <section xml:id="schema.bloom">
-    <title>Bloom Filters</title>
-    <para>Bloom Filters can be enabled per-ColumnFamily.
-        Use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW |
-        ROWCOL)</code> to enable blooms per Column Family. Default =
-        <varname>NONE</varname> for no bloom filters. If
-        <varname>ROW</varname>, the hash of the row will be added to the bloom
-        on each insert. If <varname>ROWCOL</varname>, the hash of the row +
-        column family + column family qualifier will be added to the bloom on
-        each key insert.</para>
-    <para>See <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html";>HColumnDescriptor</link>
 and
-    <xref linkend="blooms"/> for more information or this answer up in quora,
-<link 
xlink:href="http://www.quora.com/How-are-bloom-filters-used-in-HBase";>How are 
bloom filters used in HBase?</link>.
-    </para>
-    </section>
-    <section xml:id="schema.cf.blocksize"><title>ColumnFamily BlockSize</title>
-    <para>The blocksize can be configured for each ColumnFamily in a table, 
and this defaults to 64k.  Larger cell values require larger blocksizes.
-    There is an inverse relationship between blocksize and the resulting 
StoreFile indexes (i.e., if the blocksize is doubled then the resulting
-    indexes should be roughly halved).
-    </para>
-    <para>See <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html";>HColumnDescriptor</link>
-    and <xref linkend="store"/>for more information.
-    </para>
-    </section>
-    <section xml:id="cf.in.memory">
-    <title>In-Memory ColumnFamilies</title>
-    <para>ColumnFamilies can optionally be defined as in-memory.  Data is 
still persisted to disk, just like any other ColumnFamily.
-    In-memory blocks have the highest priority in the <xref 
linkend="block.cache" />, but it is not a guarantee that the entire table
-    will be in memory.
-    </para>
-    <para>See <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html";>HColumnDescriptor</link>
 for more information.
-    </para>
-    </section>
-    <section xml:id="perf.compression">
-      <title>Compression</title>
-      <para>Production systems should use compression with their ColumnFamily 
definitions.  See <xref linkend="compression" /> for more information.
-      </para>
-      <section xml:id="perf.compression.however"><title>However...</title>
-         <para>Compression deflates data <emphasis>on disk</emphasis>.  When 
it's in-memory (e.g., in the
-         MemStore) or on the wire (e.g., transferring between RegionServer and 
Client) it's inflated.
-         So while using ColumnFamily compression is a best practice, but it's 
not going to completely eliminate
-         the impact of over-sized Keys, over-sized ColumnFamily names, or 
over-sized Column names.
-         </para>
-         <para>See <xref linkend="keysize" /> on for schema design tips, and 
<xref linkend="keyvalue"/> for more information on HBase stores data internally.
-         </para>
-      </section>
-    </section>
-  </section>  <!--  perf schema -->
-
-  <section xml:id="perf.general">
-    <title>HBase General Patterns</title>
-    <section xml:id="perf.general.constants">
-      <title>Constants</title>
-      <para>When people get started with HBase they have a tendency to write 
code that looks like this:
-<programlisting>
-Get get = new Get(rowkey);
-Result r = htable.get(get);
-byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns 
current version of value
-</programlisting>
-               But especially when inside loops (and MapReduce jobs), 
converting the columnFamily and column-names
-               to byte-arrays repeatedly is surprisingly expensive.
-               It's better to use constants for the byte-arrays, like this:
-<programlisting>
-public static final byte[] CF = "cf".getBytes();
-public static final byte[] ATTR = "attr".getBytes();
-...
-Get get = new Get(rowkey);
-Result r = htable.get(get);
-byte[] b = r.getValue(CF, ATTR);  // returns current version of value
-</programlisting>
-      </para>
-    </section>
-
-  </section>
-  <section xml:id="perf.writing">
-    <title>Writing to HBase</title>
-
-    <section xml:id="perf.batch.loading">
-      <title>Batch Loading</title>
-      <para>Use the bulk load tool if you can.  See
-        <xref linkend="arch.bulk.load"/>.
-        Otherwise, pay attention to the below.
-      </para>
-    </section>  <!-- batch loading -->
-
-    <section xml:id="precreate.regions">
-    <title>
-    Table Creation: Pre-Creating Regions
-    </title>
-<para>
-Tables in HBase are initially created with one region by default.  For bulk 
imports, this means that all clients will write to the same region
-until it is large enough to split and become distributed across the cluster.  
A useful pattern to speed up the bulk import process is to pre-create empty 
regions.
- Be somewhat conservative in this, because too-many regions can actually 
degrade performance.
-</para>
-       <para>There are two different approaches to pre-creating splits.  The 
first approach is to rely on the default <code>HBaseAdmin</code> strategy
-       (which is implemented in <code>Bytes.split</code>)...
-       </para>
-<programlisting>
-byte[] startKey = ...;         // your lowest keuy
-byte[] endKey = ...;                   // your highest key
-int numberOfRegions = ...;     // # of regions to create
-admin.createTable(table, startKey, endKey, numberOfRegions);
-</programlisting>
-       <para>And the other approach is to define the splits yourself...
-       </para>
-<programlisting>
-byte[][] splits = ...;   // create your own splits
-admin.createTable(table, splits);
-</programlisting>
-<para>
-   See <xref linkend="rowkey.regionsplits"/> for issues related to 
understanding your keyspace and pre-creating regions.
-  </para>
-  </section>
-    <section xml:id="def.log.flush">
-    <title>
-    Table Creation: Deferred Log Flush
-    </title>
-<para>
-The default behavior for Puts using the Write Ahead Log (WAL) is that 
<classname>HLog</classname> edits will be written immediately.  If deferred log 
flush is used,
-WAL edits are kept in memory until the flush period.  The benefit is 
aggregated and asynchronous <classname>HLog</classname>- writes, but the 
potential downside is that if
- the RegionServer goes down the yet-to-be-flushed edits are lost.  This is 
safer, however, than not using WAL at all with Puts.
-</para>
-<para>
-Deferred log flush can be configured on tables via <link
-      
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html";>HTableDescriptor</link>.
  The default value of 
<varname>hbase.regionserver.optionallogflushinterval</varname> is 1000ms.
-</para>
-    </section>
-
-    <section xml:id="perf.hbase.client.autoflush">
-      <title>HBase Client:  AutoFlush</title>
-
-      <para>When performing a lot of Puts, make sure that setAutoFlush is set
-      to false on your <link
-      
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html";>HTable</link>
-      instance. Otherwise, the Puts will be sent one at a time to the
-      RegionServer. Puts added via <code> htable.add(Put)</code> and <code> 
htable.add( &lt;List&gt; Put)</code>
-      wind up in the same write buffer. If <code>autoFlush = false</code>,
-      these messages are not sent until the write-buffer is filled. To
-      explicitly flush the messages, call 
<methodname>flushCommits</methodname>.
-      Calling <methodname>close</methodname> on the 
<classname>HTable</classname>
-      instance will invoke <methodname>flushCommits</methodname>.</para>
-    </section>
-    <section xml:id="perf.hbase.client.putwal">
-      <title>HBase Client:  Turn off WAL on Puts</title>
-      <para>A frequently discussed option for increasing throughput on 
<classname>Put</classname>s is to call <code>writeToWAL(false)</code>.  Turning 
this off means
-          that the RegionServer will <emphasis>not</emphasis> write the 
<classname>Put</classname> to the Write Ahead Log,
-          only into the memstore, HOWEVER the consequence is that if there
-          is a RegionServer failure <emphasis>there will be data 
loss</emphasis>.
-          If <code>writeToWAL(false)</code> is used, do so with extreme 
caution.  You may find in actuality that
-          it makes little difference if your load is well distributed across 
the cluster.
-      </para>
-      <para>In general, it is best to use WAL for Puts, and where loading 
throughput
-          is a concern to use <link linkend="perf.batch.loading">bulk 
loading</link> techniques instead.
-      </para>
-    </section>
-    <section xml:id="perf.hbase.client.regiongroup">
-      <title>HBase Client: Group Puts by RegionServer</title>
-      <para>In addition to using the writeBuffer, grouping 
<classname>Put</classname>s by RegionServer can reduce the number of client RPC 
calls per writeBuffer flush.
-      There is a utility <classname>HTableUtil</classname> currently on TRUNK 
that does this, but you can either copy that or implement your own verison for
-      those still on 0.90.x or earlier.
-      </para>
-    </section>
-    <section xml:id="perf.hbase.write.mr.reducer">
-      <title>MapReduce:  Skip The Reducer</title>
-      <para>When writing a lot of data to an HBase table from a MR job (e.g., 
with <link
-      
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html";>TableOutputFormat</link>),
 and specifically where Puts are being emitted
-      from the Mapper, skip the Reducer step.  When a Reducer step is used, 
all of the output (Puts) from the Mapper will get spooled to disk, then 
sorted/shuffled to other
-      Reducers that will most likely be off-node.  It's far more efficient to 
just write directly to HBase.
-      </para>
-      <para>For summary jobs where HBase is used as a source and a sink, then 
writes will be coming from the Reducer step (e.g., summarize values then write 
out result).
-      This is a different processing problem than from the the above case.
-      </para>
-    </section>
-
-  <section xml:id="perf.one.region">
-    <title>Anti-Pattern:  One Hot Region</title>
-    <para>If all your data is being written to one region at a time, then 
re-read the
-    section on processing <link linkend="timeseries">timeseries</link> 
data.</para>
-    <para>Also, if you are pre-splitting regions and all your data is 
<emphasis>still</emphasis> winding up in a single region even though
-    your keys aren't monotonically increasing, confirm that your keyspace 
actually works with the split strategy.  There are a
-    variety of reasons that regions may appear "well split" but won't work 
with your data.   As
-    the HBase client communicates directly with the RegionServers, this can be 
obtained via
-    <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29";>HTable.getRegionLocation</link>.
-    </para>
-    <para>See <xref linkend="precreate.regions"/>, as well as <xref 
linkend="perf.configurations"/> </para>
-  </section>
-
-  </section>  <!--  writing -->
-
-  <section xml:id="perf.reading">
-    <title>Reading from HBase</title>
-
-    <section xml:id="perf.hbase.client.caching">
-      <title>Scan Caching</title>
-
-      <para>If HBase is used as an input source for a MapReduce job, for
-      example, make sure that the input <link
-      
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html";>Scan</link>
-      instance to the MapReduce job has <methodname>setCaching</methodname> 
set to something greater
-      than the default (which is 1). Using the default value means that the
-      map-task will make call back to the region-server for every record
-      processed. Setting this value to 500, for example, will transfer 500
-      rows at a time to the client to be processed. There is a cost/benefit to
-      have the cache value be large because it costs more in memory for both
-      client and RegionServer, so bigger isn't always better.</para>
-      <section xml:id="perf.hbase.client.caching.mr">
-        <title>Scan Caching in MapReduce Jobs</title>
-        <para>Scan settings in MapReduce jobs deserve special attention.  
Timeouts can result (e.g., UnknownScannerException)
-        in Map tasks if it takes longer to process a batch of records before 
the client goes back to the RegionServer for the
-        next set of data.  This problem can occur because there is non-trivial 
processing occuring per row.  If you process
-        rows quickly, set caching higher.  If you process rows more slowly 
(e.g., lots of transformations per row, writes),
-        then set caching lower.
-        </para>
-        <para>Timeouts can also happen in a non-MapReduce use case (i.e., 
single threaded HBase client doing a Scan), but the
-        processing that is often performed in MapReduce jobs tends to 
exacerbate this issue.
-        </para>
-      </section>
-    </section>
-    <section xml:id="perf.hbase.client.selection">
-      <title>Scan Attribute Selection</title>
-
-      <para>Whenever a Scan is used to process large numbers of rows (and 
especially when used
-      as a MapReduce source), be aware of which attributes are selected.   If 
<code>scan.addFamily</code> is called
-      then <emphasis>all</emphasis> of the attributes in the specified 
ColumnFamily will be returned to the client.
-      If only a small number of the available attributes are to be processed, 
then only those attributes should be specified
-      in the input scan because attribute over-selection is a non-trivial 
performance penalty over large datasets.
-      </para>
-    </section>
-    <section xml:id="perf.hbase.client.seek">
-      <title>Avoid scan seeks</title>
-      <para>When columns are selected explicitly with 
<code>scan.addColumn</code>, HBase will schedule seek operations to seek 
between the
-      selected columns. When rows have few columns and each column has only a 
few versions this can be inefficient. A seek operation is generally
-      slower if does not seek at least past 5-10 columns/versions or 512-1024 
bytes.</para>
-      <para>In order to opportunistically look ahead a few columns/versions to 
see if the next column/version can be found that
-      way before a seek operation is scheduled, a new attribute 
<code>Scan.HINT_LOOKAHEAD</code> can be set the on Scan object. The following 
code instructs the
-      RegionServer to attempt two iterations of next before a seek is 
scheduled:<programlisting>
-Scan scan = new Scan();
-scan.addColumn(...);
-scan.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
-table.getScanner(scan);
-</programlisting></para>
-       </section>
-    <section xml:id="perf.hbase.mr.input">
-        <title>MapReduce - Input Splits</title>
-        <para>For MapReduce jobs that use HBase tables as a source, if there a 
pattern where the "slow" map tasks seem to
-        have the same Input Split (i.e., the RegionServer serving the data), 
see the
-        Troubleshooting Case Study in <xref linkend="casestudies.slownode"/>.
-        </para>
-    </section>
-
-    <section xml:id="perf.hbase.client.scannerclose">
-      <title>Close ResultScanners</title>
-
-      <para>This isn't so much about improving performance but rather
-      <emphasis>avoiding</emphasis> performance problems. If you forget to
-      close <link
-      
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html";>ResultScanners</link>
-      you can cause problems on the RegionServers. Always have ResultScanner
-      processing enclosed in try/catch blocks... <programlisting>
-Scan scan = new Scan();
-// set attrs...
-ResultScanner rs = htable.getScanner(scan);
-try {
-  for (Result r = rs.next(); r != null; r = rs.next()) {
-  // process result...
-} finally {
-  rs.close();  // always close the ResultScanner!
-}
-htable.close();</programlisting></para>
-    </section>
-
-    <section xml:id="perf.hbase.client.blockcache">
-      <title>Block Cache</title>
-
-      <para><link
-      
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html";>Scan</link>
-      instances can be set to use the block cache in the RegionServer via the
-      <methodname>setCacheBlocks</methodname> method. For input Scans to 
MapReduce jobs, this should be
-      <varname>false</varname>. For frequently accessed rows, it is advisable 
to use the block
-      cache.</para>
-    </section>
-    <section xml:id="perf.hbase.client.rowkeyonly">
-      <title>Optimal Loading of Row Keys</title>
-      <para>When performing a table <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html";>scan</link>
-            where only the row keys are needed (no families, qualifiers, 
values or timestamps), add a FilterList with a
-            <varname>MUST_PASS_ALL</varname> operator to the scanner using 
<methodname>setFilter</methodname>. The filter list
-            should include both a <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html";>FirstKeyOnlyFilter</link>
-            and a <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html";>KeyOnlyFilter</link>.
-            Using this filter combination will result in a worst case scenario 
of a RegionServer reading a single value from disk
-            and minimal network traffic to the client for a single row.
-      </para>
-    </section>
-   <section xml:id="perf.hbase.read.dist">
-      <title>Concurrency:  Monitor Data Spread</title>
-      <para>When performing a high number of concurrent reads, monitor the 
data spread of the target tables.  If the target table(s) have
-      too few regions then the reads could likely be served from too few 
nodes.  </para>
-      <para>See <xref linkend="precreate.regions"/>, as well as <xref 
linkend="perf.configurations"/> </para>
-   </section>
-     <section xml:id="blooms">
-     <title>Bloom Filters</title>
-         <para>Enabling Bloom Filters can save your having to go to disk and
-         can help improve read latencys.</para>
-         <para><link 
xlink:href="http://en.wikipedia.org/wiki/Bloom_filter";>Bloom filters</link> 
were developed over in <link
-    xlink:href="https://issues.apache.org/jira/browse/HBASE-1200";>HBase-1200
-    Add bloomfilters</link>.<footnote>
-        <para>For description of the development process -- why static blooms
-        rather than dynamic -- and for an overview of the unique properties
-        that pertain to blooms in HBase, as well as possible future
-        directions, see the <emphasis>Development Process</emphasis> section
-        of the document <link
-        
xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf";>BloomFilters
-        in HBase</link> attached to <link
-        
xlink:href="https://issues.apache.org/jira/browse/HBASE-1200";>HBase-1200</link>.</para>
-      </footnote><footnote>
-        <para>The bloom filters described here are actually version two of
-        blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
-        option based on work done by the <link
-        xlink:href="http://www.one-lab.org";>European Commission One-Lab
-        Project 034819</link>. The core of the HBase bloom work was later
-        pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
-        Version 1 of HBase blooms never worked that well. Version 2 is a
-        rewrite from scratch though again it starts with the one-lab
-        work.</para>
-      </footnote></para>
-        <para>See also <xref linkend="schema.bloom" />.
-        </para>
-
-     <section xml:id="bloom_footprint">
-      <title>Bloom StoreFile footprint</title>
-
-      <para>Bloom filters add an entry to the <classname>StoreFile</classname>
-      general <classname>FileInfo</classname> data structure and then two
-      extra entries to the <classname>StoreFile</classname> metadata
-      section.</para>
-
-      <section>
-        <title>BloomFilter in the <classname>StoreFile</classname>
-        <classname>FileInfo</classname> data structure</title>
-
-          <para><classname>FileInfo</classname> has a
-          <varname>BLOOM_FILTER_TYPE</varname> entry which is set to
-          <varname>NONE</varname>, <varname>ROW</varname> or
-          <varname>ROWCOL.</varname></para>
-      </section>
-
-      <section>
-        <title>BloomFilter entries in <classname>StoreFile</classname>
-        metadata</title>
-
-          <para><varname>BLOOM_FILTER_META</varname> holds Bloom Size, Hash
-          Function used, etc. Its small in size and is cached on
-          <classname>StoreFile.Reader</classname> load</para>
-          <para><varname>BLOOM_FILTER_DATA</varname> is the actual bloomfilter
-          data. Obtained on-demand. Stored in the LRU cache, if it is enabled
-          (Its enabled by default).</para>
-      </section>
-     </section>
-         <section xml:id="config.bloom">
-           <title>Bloom Filter Configuration</title>
-        <section>
-        <title><varname>io.hfile.bloom.enabled</varname> global kill
-        switch</title>
-
-        <para><code>io.hfile.bloom.enabled</code> in
-        <classname>Configuration</classname> serves as the kill switch in case
-        something goes wrong. Default = <varname>true</varname>.</para>
-        </section>
-
-        <section>
-        <title><varname>io.hfile.bloom.error.rate</varname></title>
-
-        <para><varname>io.hfile.bloom.error.rate</varname> = average false
-        positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1
-        bit per bloom entry.</para>
-        </section>
-
-        <section>
-        <title><varname>io.hfile.bloom.max.fold</varname></title>
-
-        <para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
-        fold rate. Most people should leave this alone. Default = 7, or can
-        collapse to at least 1/128th of original size. See the
-        <emphasis>Development Process</emphasis> section of the document <link
-        
xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf";>BloomFilters
-        in HBase</link> for more on what this option means.</para>
-        </section>
-        </section>
-     </section>   <!--  bloom  -->
-
-  </section>  <!--  reading -->
-
-  <section xml:id="perf.deleting">
-    <title>Deleting from HBase</title>
-     <section xml:id="perf.deleting.queue">
-       <title>Using HBase Tables as Queues</title>
-       <para>HBase tables are sometimes used as queues.  In this case, special 
care must be taken to regularly perform major compactions on tables used in
-       this manner.  As is documented in <xref linkend="datamodel" />, marking 
rows as deleted creates additional StoreFiles which then need to be processed
-       on reads.  Tombstones only get cleaned up with major compactions.
-       </para>
-       <para>See also <xref linkend="compaction" /> and <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29";>HBaseAdmin.majorCompact</link>.
-       </para>
-     </section>
-     <section xml:id="perf.deleting.rpc">
-       <title>Delete RPC Behavior</title>
-       <para>Be aware that <code>htable.delete(Delete)</code> doesn't use the 
writeBuffer.  It will execute an RegionServer RPC with each invocation.
-       For a large number of deletes, consider 
<code>htable.delete(List)</code>.
-       </para>
-       <para>See <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29";></link>
-       </para>
-     </section>
-  </section>  <!--  deleting -->
-
-  <section xml:id="perf.hdfs"><title>HDFS</title>
-   <para>Because HBase runs on <xref linkend="arch.hdfs" /> it is important to 
understand how it works and how it affects
-   HBase.
-   </para>
-    <section xml:id="perf.hdfs.curr"><title>Current Issues With Low-Latency 
Reads</title>
-      <para>The original use-case for HDFS was batch processing.  As such, 
there low-latency reads were historically not a priority.
-      With the increased adoption of Apache HBase this is changing, and 
several improvements are already in development.
-      See the
-      <link 
xlink:href="https://issues.apache.org/jira/browse/HDFS-1599";>Umbrella Jira 
Ticket for HDFS Improvements for HBase</link>.
-      </para>
-    </section>
-    <section xml:id="perf.hdfs.configs.localread">
-    <title>Leveraging local data</title>
-<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
-<link 
xlink:href="https://issues.apache.org/jira/browse/HDFS-2246";>HDFS-2246</link>,
-it is possible for the DFSClient to take a "short circuit" and
-read directly from disk instead of going through the DataNode when the
-data is local. What this means for HBase is that the RegionServers can
-read directly off their machine's disks instead of having to open a
-socket to talk to the DataNode, the former being generally much
-faster<footnote><para>See JD's <link 
xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf";>Performance 
Talk</link></para></footnote>.
-Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1";>HBase, mail 
# dev - read short circuit</link> thread for
-more discussion around short circuit reads.
-</para>
-<para>To enable "short circuit" reads, it will depend on your version of 
Hadoop.
-    The original shortcircuit read patch was much improved upon in Hadoop 2 in
-    <link 
xlink:href="https://issues.apache.org/jira/browse/HDFS-347";>HDFS-347</link>.
-    See <link 
xlink:href="http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/";></link>
 for details
-    on the difference between the old and new implementations.  See
-    <link 
xlink:href="http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html";>Hadoop
 shortcircuit reads configuration page</link>
-    for how to enable the later version of shortcircuit.
-</para>
-<para>If you are running on an old Hadoop, one that is without
-    <link 
xlink:href="https://issues.apache.org/jira/browse/HDFS-347";>HDFS-347</link> but 
that
-    has
-<link 
xlink:href="https://issues.apache.org/jira/browse/HDFS-2246";>HDFS-2246</link>,
-you must set two configurations.
-First, the hdfs-site.xml needs to be amended. Set
-the property  <varname>dfs.block.local-path-access.user</varname>
-to be the <emphasis>only</emphasis> user that can use the shortcut.
-This has to be the user that started HBase.  Then in hbase-site.xml,
-set <varname>dfs.client.read.shortcircuit</varname> to be 
<varname>true</varname>
-</para>
-<para>
-    For optimal performance when short-circuit reads are enabled, it is 
recommended that HDFS checksums are disabled.
-    To maintain data integrity with HDFS checksums disabled, HBase can be 
configured to write its own checksums into
-    its datablocks and verify against these. See <xref 
linkend="hbase.regionserver.checksum.verify" />. When both
-    local short-circuit reads and hbase level checksums are enabled, you 
SHOULD NOT disable configuration parameter
-    "dfs.client.read.shortcircuit.skip.checksum", which will cause skipping 
checksum on non-hfile reads. HBase already
-    manages that setting under the covers.
-</para>
-<para>
-The DataNodes need to be restarted in order to pick up the new
-configuration. Be aware that if a process started under another
-username than the one configured here also has the shortcircuit
-enabled, it will get an Exception regarding an unauthorized access but
-the data will still be read.
-</para>
-<note xml:id="dfs.client.read.shortcircuit.buffer.size">
-    <title>dfs.client.read.shortcircuit.buffer.size</title>
-    <para>The default for this value is too high when running on a highly 
trafficed HBase.  Set it down from its
-        1M default down to 128k or so.  Put this configuration in the HBase 
configs (its a HDFS client-side configuration).
-        The Hadoop DFSClient in HBase will allocate a direct byte buffer of 
this size for <emphasis>each</emphasis>
-    block it has open; given HBase keeps its HDFS files open all the time, 
this can add up quickly.</para>
-</note>
-  </section>
-
-    <section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase 
vs. HDFS</title>
-     <para>A fairly common question on the dist-list is why HBase isn't as 
performant as HDFS files in a batch context (e.g., as
-     a MapReduce source or sink).  The short answer is that HBase is doing a 
lot more than HDFS (e.g., reading the KeyValues,
-     returning the most current row or specified timestamps, etc.), and as 
such HBase is 4-5 times slower than HDFS in this
-     processing context.  There is room for improvement and this gap will, 
over time, be reduced, but HDFS
-      will always be faster in this use-case.
-     </para>
-    </section>
-  </section>
-
-  <section xml:id="perf.ec2"><title>Amazon EC2</title>
-   <para>Performance questions are common on Amazon EC2 environments because 
it is a shared environment.  You will
-   not see the same throughput as a dedicated server.  In terms of running 
tests on EC2, run them several times for the same
-   reason (i.e., it's a shared environment and you don't know what else is 
happening on the server).
-   </para>
-   <para>If you are running on EC2 and post performance questions on the 
dist-list, please state this fact up-front that
-    because EC2 issues are practically a separate class of performance issues.
-   </para>
-  </section>
-
-  <section xml:id="perf.hbase.mr.cluster"><title>Collocating HBase and 
MapReduce</title>
-    <para>It is often recommended to have different clusters for HBase and 
MapReduce. A better qualification of this is:
-          don't collocate a HBase that serves live requests with a heavy MR 
workload. OLTP and OLAP-optimized systems have
-          conflicting requirements and one will lose to the other, usually the 
former. For example, short latency-sensitive
-          disk reads will have to wait in line behind longer reads that are 
trying to squeeze out as much throughput as
-          possible. MR jobs that write to HBase will also generate flushes and 
compactions, which will in turn invalidate
-          blocks in the <xref linkend="block.cache"/>.
-    </para>
-    <para>If you need to process the data from your live HBase cluster in MR, 
you can ship the deltas with <xref linkend="copy.table"/>
-          or use replication to get the new data in real time on the OLAP 
cluster. In the worst case, if you really need to
-          collocate both, set MR to use less Map and Reduce slots than you'd 
normally configure, possibly just one.
-    </para>
-    <para>When HBase is used for OLAP operations, it's preferable to set it up 
in a hardened way like configuring the ZooKeeper session
-          timeout higher and giving more memory to the MemStores (the argument 
being that the Block Cache won't be used much
-          since the workloads are usually long scans).
-    </para>
-  </section>
-
-  <section xml:id="perf.casestudy"><title>Case Studies</title>
-      <para>For Performance and Troubleshooting Case Studies, see <xref 
linkend="casestudies"/>.
-      </para>
-  </section>
-</chapter>

http://git-wip-us.apache.org/repos/asf/hbase/blob/7bf6c024/src/main/docbkx/preface.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/preface.xml b/src/main/docbkx/preface.xml
deleted file mode 100644
index 5308037..0000000
--- a/src/main/docbkx/preface.xml
+++ /dev/null
@@ -1,70 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<preface version="5.0" xml:id="preface" xmlns="http://docbook.org/ns/docbook";
-         xmlns:xlink="http://www.w3.org/1999/xlink";
-         xmlns:xi="http://www.w3.org/2001/XInclude";
-         xmlns:svg="http://www.w3.org/2000/svg";
-         xmlns:m="http://www.w3.org/1998/Math/MathML";
-         xmlns:html="http://www.w3.org/1999/xhtml";
-         xmlns:db="http://docbook.org/ns/docbook";>
-<!--
-/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--->
-  <title>Preface</title>
-
-  <para>This is the official reference guide for the <link
-  xlink:href="http://hbase.apache.org/";>HBase</link> version it ships with.
-  Herein you will find either the definitive documentation on an HBase topic
-  as of its standing when the referenced HBase version shipped, or it
-  will point to the location in <link
-  xlink:href="http://hbase.apache.org/apidocs/index.html";>javadoc</link>,
-  <link xlink:href="https://issues.apache.org/jira/browse/HBASE";>JIRA</link>
-  or <link xlink:href="http://wiki.apache.org/hadoop/Hbase";>wiki</link> where
-  the pertinent information can be found.</para>
-
-  <para>This reference guide is a work in progress.  The source for this guide 
can
-      be found at <filename>src/main/docbkx</filename> in a checkout of the 
hbase
-      project.  This reference guide is marked up using
-      <link xlink:href="http://www.docbook.com/";>DocBook</link> from which the
-      the finished guide is generated as part of the 'site' build target.  Run
-      <programlisting>mvn site</programlisting> to generate this documentation.
-      Amendments and improvements to the documentation are welcomed.  Add a
-      patch to an issue up in the HBase <link
-  xlink:href="https://issues.apache.org/jira/browse/HBASE";>JIRA</link>.</para>
-
-  <note xml:id="headsup">
-      <title>Heads-up</title>
-      <para>
-          If this is your first foray into the wonderful world of
-          Distributed Computing, then you are in for
-          some interesting times.  First off, distributed systems are
-          hard; making a distributed system hum requires a disparate
-          skillset that spans systems (hardware and software) and
-          networking.  Your cluster' operation can hiccup because of any
-          of a myriad set of reasons from bugs in HBase itself through 
misconfigurations
-          -- misconfiguration of HBase but also operating system 
misconfigurations --
-          through to hardware problems whether it be a bug in your network card
-          drivers or an underprovisioned RAM bus (to mention two recent
-          examples of hardware issues that manifested as "HBase is slow").
-          You will also need to do a recalibration if up to this your
-          computing has been bound to a single box.  Here is one good
-          starting point:
-          <link 
xlink:href="http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing";>Fallacies
 of Distributed Computing</link>.
-      </para>
-  </note>
-</preface>

http://git-wip-us.apache.org/repos/asf/hbase/blob/7bf6c024/src/main/docbkx/rpc.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/rpc.xml b/src/main/docbkx/rpc.xml
deleted file mode 100644
index cbc59b7..0000000
--- a/src/main/docbkx/rpc.xml
+++ /dev/null
@@ -1,236 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<appendix xml:id="hbase.rpc"
-      version="5.0" xmlns="http://docbook.org/ns/docbook";
-      xmlns:xlink="http://www.w3.org/1999/xlink";
-      xmlns:xi="http://www.w3.org/2001/XInclude";
-      xmlns:svg="http://www.w3.org/2000/svg";
-      xmlns:m="http://www.w3.org/1998/Math/MathML";
-      xmlns:html="http://www.w3.org/1999/xhtml";
-      xmlns:db="http://docbook.org/ns/docbook";>
-  <!--/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--->
-
-  <title>0.95 RPC Specification</title>
-  <para>In 0.95, all client/server communication is done with
-      <link 
xlink:href="https://code.google.com/p/protobuf/";>protobuf’ed</link> Messages 
rather than with
-      <link 
xlink:href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html";>Hadoop
 Writables</link>.
-      Our RPC wire format therefore changes.
-      This document describes the client/server request/response protocol and 
our new RPC wire-format.</para>
-  <para/>
-  <para>For what RPC is like in 0.94 and previous,
-      see Benoît/Tsuna’s <link 
xlink:href="https://github.com/OpenTSDB/asynchbase/blob/master/src/HBaseRpc.java#L164";>Unofficial
 Hadoop / HBase RPC protocol documentation</link>.
-      For more background on how we arrived at this spec., see
-      <link 
xlink:href="https://docs.google.com/document/d/1WCKwgaLDqBw2vpux0jPsAu2WPTRISob7HGCO8YhfDTA/edit#";>HBase
 RPC: WIP</link></para>
-  <para/>
-  <section><title>Goals</title>
-      <para>
-      <orderedlist>
-          <listitem>
-              <para>A wire-format we can evolve</para>
-          </listitem>
-          <listitem>
-              <para>A format that does not require our rewriting server core or
-                  radically changing its current architecture (for 
later).</para>
-          </listitem>
-      </orderedlist>
-  </para>
-  </section>
-  <section><title>TODO</title>
-      <para>
-      <orderedlist>
-          <listitem>
-              <para>List of problems with currently specified format and where
-                  we would like to go in a version2, etc. For example, what 
would we
-                  have to change if anything to move server async or to support
-                  streaming/chunking?</para>
-          </listitem>
-          <listitem>
-              <para>Diagram on how it works</para>
-          </listitem>
-          <listitem>
-              <para>A grammar that succinctly describes the wire-format. 
Currently
-                  we have these words and the content of the rpc protobuf idl 
but
-                  a grammar for the back and forth would help with groking 
rpc.  Also,
-                  a little state machine on client/server interactions would 
help
-              with understanding (and ensuring correct implementation).</para>
-          </listitem>
-      </orderedlist>
-  </para>
-  </section>
-  <section><title>RPC</title>
-  <para>The client will send setup information on connection establish.
-      Thereafter, the client invokes methods against the remote server sending 
a protobuf Message and receiving a protobuf Message in response.
-      Communication is synchronous.  All back and forth is preceded by an int 
that has the total length of the request/response.
-      Optionally, Cells(KeyValues) can be passed outside of protobufs in 
follow-behind Cell blocks (because
-      <link 
xlink:href="https://docs.google.com/document/d/1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw/edit#";>we
 can’t protobuf megabytes of KeyValues</link> or Cells).
-      These CellBlocks are encoded and optionally compressed.</para>
-  <para/>
-  <para>For more detail on the protobufs involved, see the
-      <link 
xlink:href="http://svn.apache.org/viewvc/hbase/trunk/hbase-protocol/src/main/protobuf/RPC.proto?view=markup";>RPC.proto</link>
 file in trunk.</para>
-
-  <section>
-      <title>Connection Setup</title>
-      <para>Client initiates connection.</para>
-  <section><title>Client</title>
-      <para>On connection setup, client sends a preamble followed by a 
connection header.
-      </para>
-
-  <section>
-  <title>&lt;preamble&gt;</title>
-  <para><programlisting>&lt;MAGIC 4 byte integer&gt; &lt;1 byte RPC Format 
Version&gt; &lt;1 byte auth type&gt;<footnote><para> We need the auth method 
spec. here so the connection header is encoded if auth 
enabled.</para></footnote></programlisting></para>
-  <para>E.g.: HBas0x000x50 -- 4 bytes of MAGIC -- ‘HBas’ -- plus one-byte 
of version, 0 in this case, and one byte, 0x50 (SIMPLE). of an auth type.</para>
-  </section>
-
-  <section>
-      <title>&lt;Protobuf ConnectionHeader Message&gt;</title>
-      <para>Has user info, and “protocol”, as well as the encoders and 
compression the client will use sending CellBlocks.
-          CellBlock encoders and compressors are for the life of the 
connection.
-          CellBlock encoders implement org.apache.hadoop.hbase.codec.Codec.
-          CellBlocks may then also be compressed.
-          Compressors implement org.apache.hadoop.io.compress.CompressionCodec.
-          This protobuf is written using writeDelimited so is prefaced by a pb 
varint
-          with its serialized length</para>
-  </section>
-  </section><!--Client-->
-
-  <section><title>Server</title>
-      <para>After client sends preamble and connection header,
-          server does NOT respond if successful connection setup.
-          No response means server is READY to accept requests and to give out 
response.
-      If the version or authentication in the preamble is not agreeable or the 
server has trouble parsing the preamble,
-      it will throw a org.apache.hadoop.hbase.ipc.FatalConnectionException 
explaining the error and will then disconnect.
-      If the client in the connection header -- i.e. the protobuf’d Message 
that comes after the connection preamble -- asks for for a
-      Service the server does not support or a codec the server does not have, 
again we throw a FatalConnectionException with explanation.</para>
-  </section>
-  </section>
-
-  <section><title>Request</title>
-      <para>After a Connection has been set up, client makes requests.  Server 
responds.</para>
-      <para>A request is made up of a protobuf RequestHeader followed by a 
protobuf Message parameter.
-          The header includes the method name and optionally, metadata on the 
optional CellBlock that may be following.
-          The parameter type suits the method being invoked: i.e. if we are 
doing a getRegionInfo request,
-          the protobuf Message param will be an instance of 
GetRegionInfoRequest.
-          The response will be a GetRegionInfoResponse.
-          The CellBlock is optionally used ferrying the bulk of the RPC data: 
i.e Cells/KeyValues.</para>
-  <para/>
-  <section><title>Request Parts</title>
-      <section><title>&lt;Total Length&gt;</title>
-          <para>The request is prefaced by an int that holds the total length 
of what follows.</para>
-      </section>
-      <section><title>&lt;Protobuf RequestHeader Message&gt;</title>
-          <para>Will have call.id, trace.id, and method name, etc. including 
optional Metadata on the Cell block IFF one is following.
-              Data is protobuf’d inline in this pb Message or optionally 
comes in the following CellBlock</para>
-      </section>
-      <section><title>&lt;Protobuf Param Message&gt;</title>
-          <para>If the method being invoked is getRegionInfo, if you study the 
Service descriptor for the client to regionserver protocol,
-              you will find that the request sends a GetRegionInfoRequest 
protobuf Message param in this position.</para>
-      </section>
-      <section><title>&lt;CellBlock&gt;</title>
-  <para>An encoded and optionally compressed Cell block.</para>
-      </section>
-  </section><!--Request parts-->
-  </section><!--Request-->
-
-  <section><title>Response</title>
-      <para>Same as Request, it is a protobuf ResponseHeader followed by a 
protobuf Message response where the Message response type suits the method 
invoked.
-          Bulk of the data may come in a following CellBlock.</para>
-  <section><title>Response Parts</title>
-      <section><title>&lt;Total Length&gt;</title>
-          <para>The response is prefaced by an int that holds the total length 
of what follows.</para>
-      </section>
-          <section><title>&lt;Protobuf ResponseHeader Message&gt;</title>
-  <para>Will have call.id, etc. Will include exception if failed processing.  
Optionally includes metadata on optional, IFF there is a CellBlock 
following.</para>
-  </section>
-
-  <section><title>&lt;Protobuf Response Message&gt;</title>
-      <para>Return or may be nothing if exception. If the method being invoked 
is getRegionInfo, if you study the Service descriptor for the client to 
regionserver protocol,
-          you will find that the response sends a GetRegionInfoResponse 
protobuf Message param in this position.</para>
-  </section>
-  <section><title>&lt;CellBlock&gt;</title>
-  <para>An encoded and optionally compressed Cell block.</para>
-  </section>
-  </section><!--Parts-->
-  </section><!--Response-->
-
-  <section><title>Exceptions</title>
-      <para>There are two distinct types.
-          There is the request failed which is encapsulated inside the 
response header for the response.
-          The connection stays open to receive new requests.
-          The second type, the FatalConnectionException, kills the 
connection.</para>
-      <para>Exceptions can carry extra information.
-          See the ExceptionResponse protobuf type.
-          It has a flag to indicate do-no-retry as well as other miscellaneous 
payload to help improve client responsiveness.</para>
-  </section>
-  <section><title>CellBlocks</title>
-      <para>These are not versioned.
-          Server can do the codec or it cannot.
-          If new version of a codec with say, tighter encoding, then give it a 
new class name.
-          Codecs will live on the server for all time so old clients can 
connect.</para>
-  </section>
-  </section>
-
-
-  <section><title>Notes</title>
-  <section><title>Constraints</title>
-  <para>In some part, current wire-format -- i.e. all requests and responses 
preceeded by a length -- has been dictated by current server non-async 
architecture.</para>
-  </section>
-  <section><title>One fat pb request or header+param</title>
-      <para>We went with pb header followed by pb param making a request and a 
pb header followed by pb response for now.
-          Doing header+param rather than a single protobuf Message with both 
header and param content:</para>
-      <para>
-  <orderedlist>
-    <listitem>
-      <para>Is closer to what we currently have</para>
-    </listitem>
-    <listitem>
-      <para>Having a single fat pb requires extra copying putting the already 
pb’d param into the body of the fat request pb (and same making result)</para>
-    </listitem>
-    <listitem>
-      <para>We can decide whether to accept the request or not before we read 
the param; for example, the request might be low priority.  As is, we read 
header+param in one go as server is currently implemented so this is a 
TODO.</para>
-    </listitem>
-  </orderedlist>
-  </para>
-  <para>The advantages are minor.  If later, fat request has clear advantage, 
can roll out a v2 later.</para>
-  </section>
-  <section xml:id="rpc.configs"><title>RPC Configurations</title>
-  <section><title>CellBlock Codecs</title>
-      <para>To enable a codec other than the default 
<classname>KeyValueCodec</classname>,
-          set <varname>hbase.client.rpc.codec</varname>
-          to the name of the Codec class to use.  Codec must implement hbase's 
<classname>Codec</classname> Interface.  After connection setup,
-          all passed cellblocks will be sent with this codec.  The server will 
return cellblocks using this same codec as long
-          as the codec is on the servers' CLASSPATH (else you will get 
<classname>UnsupportedCellCodecException</classname>).</para>
-      <para>To change the default codec, set 
<varname>hbase.client.default.rpc.codec</varname>.
-      </para>
-      <para>To disable cellblocks completely and to go pure protobuf, set the 
default to the
-          empty String and do not specify a codec in your Configuration.  So, 
set <varname>hbase.client.default.rpc.codec</varname>
-          to the empty string and do not set 
<varname>hbase.client.rpc.codec</varname>.
-          This will cause the client to connect to the server with no codec 
specified.
-          If a server sees no codec, it will return all responses in pure 
protobuf.
-          Running pure protobuf all the time will be slower than running with 
cellblocks.
-      </para>
-  </section>
-  <section><title>Compression</title>
-      <para>Uses hadoops compression codecs.  To enable compressing of passed 
CellBlocks, set <varname>hbase.client.rpc.compressor</varname>
-          to the name of the Compressor to use.  Compressor must implement 
Hadoops' CompressionCodec Interface.  After connection setup,
-          all passed cellblocks will be sent compressed.  The server will 
return cellblocks compressed using this same compressor as long
-          as the compressor is on its CLASSPATH (else you will get 
<classname>UnsupportedCompressionCodecException</classname>).</para>
-  </section>
-  </section>
-  </section>
-</appendix>

Reply via email to