Repository: hbase
Updated Branches:
  refs/heads/master 3557a3235 -> a3b65c45a


HBASE-11692 Document how and why to do a manual region split

Incorporated Stack's feedback


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/a3b65c45
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/a3b65c45
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/a3b65c45

Branch: refs/heads/master
Commit: a3b65c45ad3c55fc5ce12e6d69701a2bcd84f055
Parents: 3557a32
Author: Misty Stanley-Jones <[email protected]>
Authored: Thu Oct 2 09:21:57 2014 +1000
Committer: Misty Stanley-Jones <[email protected]>
Committed: Tue Oct 7 16:46:31 2014 +1000

----------------------------------------------------------------------
 src/main/docbkx/book.xml          | 86 ++++++++++++++++++++++++++++++++++
 src/main/docbkx/configuration.xml |  4 +-
 src/main/docbkx/ops_mgt.xml       |  4 +-
 src/main/docbkx/performance.xml   |  6 +--
 4 files changed, 94 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/book.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index b2b4c78..eea00d6 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -3298,6 +3298,92 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
         </section>
       </section>
 
+      <section xml:id="manual_region_splitting_decisions">
+        <title>Manual Region Splitting</title>
+        <para>It is possible to manually split your table, either at table creation (pre-splitting),
+          or at a later time as an administrative action. You might choose to split your region for
+          one or more of the following reasons. There may be other valid reasons, but the need to
+          manually split your table might also point to problems with your schema design.</para>
+        <itemizedlist>
+          <title>Reasons to Manually Split Your Table</title>
+          <listitem>
+            <para>Your data is sorted by timeseries or another similar algorithm that sorts new data
+              at the end of the table. This means that the Region Server holding the last region is
+              always under load, and the other Region Servers are idle, or mostly idle. See also
+                <xref linkend="timeseries"/>.</para>
+          </listitem>
+          <listitem>
+            <para>You have developed an unexpected hotspot in one region of your table. For
+              instance, an application which tracks web searches might be inundated by a lot of
+              searches for a celebrity in the event of news about that celebrity. See <xref
+                linkend="perf.one.region"/> for more discussion about this particular
+              scenario.</para>
+          </listitem>
+          <listitem>
+            <para>After a big increase to the number of Region Servers in your cluster, to get the
+              load spread out quickly.</para>
+          </listitem>
+          <listitem>
+            <para>Before a bulk-load which is likely to cause unusual and uneven load across
+              regions.</para>
+          </listitem>
+        </itemizedlist>
+        <para>See <xref linkend="disable.splitting"/> for a discussion about the dangers and
+          possible benefits of managing splitting completely manually.</para>
+        <section>
+          <title>Determining Split Points</title>
+          <para>The goal of splitting your table manually is to improve the chances of balancing the
+            load across the cluster in situations where good rowkey design alone won't get you
+            there. Keeping that in mind, the way you split your regions is very dependent upon the
+            characteristics of your data. It may be that you already know the best way to split your
+            table. If not, the way you split your table depends on what your keys are like.</para>
+          <variablelist>
+            <varlistentry>
+              <term>Alphanumeric Rowkeys</term>
+              <listitem>
+                <para>If your rowkeys start with a letter or number, you can split your table at
+                  letter or number boundaries. For instance, the following command creates a table
+                  with regions that split at each vowel, so the first region holds rowkeys that
+                  sort before 'a', and the next five regions hold a-d, e-h, i-n, o-t, and u
+                  onward.</para>
+                  <screen>hbase> create 'test_table', 'f1', SPLITS=> ['a', 'e', 'i', 'o', 'u']</screen>
+                <para>The following command splits an existing table at split point '2'.</para>
+                <screen>hbase> split 'test_table', '2'</screen>
+                <para>You can also split a specific region by referring to its ID. You can find the
+                  region ID by looking at either the table or region in the Web UI. It will be a
+                  long string such as
+                    <literal>t2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.</literal>. The
+                  format is <replaceable>table_name,start_key,region_id</replaceable>. To split that
+                  region into two, as close to equally as possible (at the nearest row boundary),
+                  issue the following command.</para>
+                <screen>hbase> split 't2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.'</screen>
+                <para>The split key is optional. If it is omitted, the table or region is split in
+                  half.</para>
+                <para>The following example shows how to use the RegionSplitter to create 10
+                  regions, split at hexadecimal values.</para>
+                <screen>hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1</screen>
+              </listitem>
+            </varlistentry>
+            <varlistentry>
+              <term>Using a Custom Algorithm</term>
+              <listitem>
+                <para>The RegionSplitter tool is provided with HBase, and uses a <firstterm><link
+                      xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.SplitAlgorithm.html"
+                      >SplitAlgorithm</link></firstterm> to determine split points for you. As
+                  parameters, you give it the algorithm, desired number of regions, and column
+                  families. It includes two split algorithms. The first is the <code><link
+                      xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html"
+                      >HexStringSplit</link></code> algorithm, which assumes the row keys are
+                  hexadecimal strings. The second, <link
+                    xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.UniformSplit.html"
+                    >UniformSplit</link>, assumes the row keys are random byte arrays. You will
+                  probably need to develop your own SplitAlgorithm, using the provided ones as
+                  models. </para>
+              </listitem>
+            </varlistentry>
+          </variablelist>
+        </section>
+      </section>
        <section>
         <title>Online Region Merges</title>
 

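Note on the vowel example in the book.xml hunk above: a region covers the half-open rowkey range [start key, end key), so five split points yield six regions, and every rowkey that sorts before 'a' (including uppercase ASCII letters) lands in the first one. The following is a minimal Python sketch of that lookup, not HBase code; the function name region_index is hypothetical, and the sketch assumes plain byte-wise ordering of rowkeys.

```python
from bisect import bisect_right
from typing import Sequence


def region_index(row_key: bytes, split_points: Sequence[bytes]) -> int:
    """Return the 0-based index of the region that would hold row_key.

    Assumes split_points is sorted and that each split point is the
    first key of the region to its right (half-open ranges).
    """
    # bisect_right counts the split points that are <= row_key, which is
    # exactly the index of the region containing the key.
    return bisect_right(split_points, row_key)


# Split points from the shell example: SPLITS => ['a', 'e', 'i', 'o', 'u']
splits = [b"a", b"e", b"i", b"o", b"u"]

for key in (b"Zebra", b"apple", b"dog", b"echo", b"tiger", b"zulu"):
    print(key.decode(), "-> region", region_index(key, splits))
```

Under these assumptions, "Zebra" maps to region 0 because uppercase letters sort before 'a', while "tiger" maps to region 4 (the o-t range).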
http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/configuration.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/configuration.xml b/src/main/docbkx/configuration.xml
index 0af2b3c..aec8a00 100644
--- a/src/main/docbkx/configuration.xml
+++ b/src/main/docbkx/configuration.xml
@@ -1355,7 +1355,9 @@ export HBASE_HEAPSIZE=4096
             <varname>hbase.hregion.max.filesize</varname>,
             <varname>hbase.regionserver.regionSplitLimit</varname>. A simplistic view of splitting
           is that when a region grows to <varname>hbase.hregion.max.filesize</varname>, it is split.
-          For most use patterns, most of the time, you should use automatic splitting.</para>
+          For most use patterns, most of the time, you should use automatic splitting. See <xref
+            linkend="manual_region_splitting_decisions"/> for more information about manual region
+          splitting.</para>
         <para>Instead of allowing HBase to split your regions automatically, you can choose to
           manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing
           splits works if you know your keyspace well, otherwise let HBase figure where to split for you.

http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/ops_mgt.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml
index 1f83a15..ea7883b 100644
--- a/src/main/docbkx/ops_mgt.xml
+++ b/src/main/docbkx/ops_mgt.xml
@@ -2227,8 +2227,8 @@ hbase> restore_snapshot 'myTableSnapshot-122112'
           pre-split 1 region per RS at most), especially if you don't know how much each table will
           grow. If you split too much, you may end up with too many regions, with some tables having
           too many small regions.</para>
-        <para>For pre-splitting howto, see <xref
-            linkend="precreate.regions" />.</para>
+        <para>For pre-splitting howto, see <xref linkend="manual_region_splitting_decisions"/> and
+            <xref linkend="precreate.regions"/>.</para>
       </section>
       <!-- ops.capacity.config.presplit -->
     </section>

http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/performance.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
index e7c0fc7..59287ee 100644
--- a/src/main/docbkx/performance.xml
+++ b/src/main/docbkx/performance.xml
@@ -682,9 +682,9 @@ admin.createTable(table, startKey, endKey, numberOfRegions);
 byte[][] splits = ...;   // create your own splits
 admin.createTable(table, splits);
 </programlisting>
-      <para> See <xref
-          linkend="rowkey.regionsplits" /> for issues related to understanding your keyspace and
-        pre-creating regions. </para>
+      <para> See <xref linkend="rowkey.regionsplits"/> for issues related to understanding your
+        keyspace and pre-creating regions. See <xref linkend="manual_region_splitting_decisions"/>
+        for discussion on manually pre-splitting regions.</para>
     </section>
     <section
       xml:id="def.log.flush">
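
Note on the "create your own splits" placeholder in the performance.xml context above: one simple way to produce split points for hexadecimal rowkeys is to space them evenly across the keyspace. The sketch below is illustrative Python under stated assumptions, not the RegionSplitter or HexStringSplit implementation. The function name hex_split_points and the fixed-width, zero-padded, lowercase key format are assumptions, and the values it produces may differ from those HBase computes.

```python
from typing import List


def hex_split_points(num_regions: int, width: int = 8) -> List[str]:
    """Return num_regions - 1 evenly spaced, zero-padded hex split points.

    Assumes rowkeys are fixed-width lowercase hex strings, so the
    keyspace runs from '0' * width to 'f' * width.
    """
    if num_regions < 2:
        raise ValueError("need at least 2 regions to have a split point")
    keyspace = 16 ** width
    # Integer arithmetic keeps the points exact and strictly increasing
    # as long as num_regions does not exceed the size of the keyspace.
    return [
        format(keyspace * i // num_regions, f"0{width}x")
        for i in range(1, num_regions)
    ]


# Ten regions need nine split points.
for point in hex_split_points(10):
    print(point)
```

For a small case, hex_split_points(4, width=2) returns ['40', '80', 'c0']. The resulting strings would still have to be converted to byte arrays before being passed to a table-creation call.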
