Repository: hbase
Updated Branches:
  refs/heads/master 3557a3235 -> a3b65c45a

HBASE-11692 Document how and why to do a manual region split

Incorporated Stack's feedback

Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/a3b65c45
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/a3b65c45
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/a3b65c45

Branch: refs/heads/master
Commit: a3b65c45ad3c55fc5ce12e6d69701a2bcd84f055
Parents: 3557a32
Author: Misty Stanley-Jones <[email protected]>
Authored: Thu Oct 2 09:21:57 2014 +1000
Committer: Misty Stanley-Jones <[email protected]>
Committed: Tue Oct 7 16:46:31 2014 +1000

----------------------------------------------------------------------
 src/main/docbkx/book.xml          | 86 ++++++++++++++++++++++++++++++++++
 src/main/docbkx/configuration.xml |  4 +-
 src/main/docbkx/ops_mgt.xml       |  4 +-
 src/main/docbkx/performance.xml   |  6 +--
 4 files changed, 94 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/book.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index b2b4c78..eea00d6 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -3298,6 +3298,92 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
 </section>
 </section>
+ <section xml:id="manual_region_splitting_decisions">
+ <title>Manual Region Splitting</title>
+ <para>It is possible to manually split your table, either at table creation (pre-splitting),
+ or at a later time as an administrative action. You might choose to split your region for
+ one or more of the following reasons. There may be other valid reasons, but the need to
+ manually split your table might also point to problems with your schema design.</para>
+ <itemizedlist>
+ <title>Reasons to Manually Split Your Table</title>
+ <listitem>
+ <para>Your data is sorted by timeseries or another similar algorithm that sorts new data
+ at the end of the table. This means that the Region Server holding the last region is
+ always under load, and the other Region Servers are idle, or mostly idle. See also
+ <xref linkend="timeseries"/>.</para>
+ </listitem>
+ <listitem>
+ <para>You have developed an unexpected hotspot in one region of your table. For
+ instance, an application which tracks web searches might be inundated by a lot of
+ searches for a celebrity in the event of news about that celebrity. See <xref
+ linkend="perf.one.region"/> for more discussion about this particular
+ scenario.</para>
+ </listitem>
+ <listitem>
+ <para>After a big increase to the number of Region Servers in your cluster, to get the
+ load spread out quickly.</para>
+ </listitem>
+ <listitem>
+ <para>Before a bulk-load which is likely to cause unusual and uneven load across
+ regions.</para>
+ </listitem>
+ </itemizedlist>
+ <para>See <xref linkend="disable.splitting"/> for a discussion about the dangers and
+ possible benefits of managing splitting completely manually.</para>
+ <section>
+ <title>Determining Split Points</title>
+ <para>The goal of splitting your table manually is to improve the chances of balancing the
+ load across the cluster in situations where good rowkey design alone won't get you
+ there. Keeping that in mind, the way you split your regions is very dependent upon the
+ characteristics of your data. It may be that you already know the best way to split your
+ table. If not, the way you split your table depends on what your keys are like.</para>
+ <variablelist>
+ <varlistentry>
+ <term>Alphanumeric Rowkeys</term>
+ <listitem>
+ <para>If your rowkeys start with a letter or number, you can split your table at
+ letter or number boundaries. For instance, the following command creates a table
+ that splits at each vowel, giving six regions: one for keys that sort before
+ <literal>a</literal>, then regions for a-d, e-h, i-n, and o-t, and a final region for
+ keys from <literal>u</literal> onward.</para>
+ <screen>hbase> create 'test_table', 'f1', SPLITS=> ['a', 'e', 'i', 'o', 'u']</screen>
+ <para>The following command splits an existing table at split point '2'.</para>
+ <screen>hbase> split 'test_table', '2'</screen>
+ <para>You can also split a specific region by referring to its ID. You can find the
+ region ID by looking at either the table or region in the Web UI. It will be a
+ long string such as
+ <literal>t2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.</literal>. The
+ format is <replaceable>table_name,start_key,region_id</replaceable>. To split that
+ region into two, as close to equally as possible (at the nearest row boundary),
+ issue the following command.</para>
+ <screen>hbase> split 't2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.'</screen>
+ <para>The split key is optional. If it is omitted, the table or region is split in
+ half.</para>
+ <para>The following example shows how to use the RegionSplitter to create 10
+ regions, split at hexadecimal values.</para>
+ <screen>hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1</screen>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Using a Custom Algorithm</term>
+ <listitem>
+ <para>The RegionSplitter tool is provided with HBase, and uses a <firstterm><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.SplitAlgorithm.html"
+ >SplitAlgorithm</link></firstterm> to determine split points for you. As
+ parameters, you give it the algorithm, desired number of regions, and column
+ families. It includes two split algorithms. The first is the <code><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html"
+ >HexStringSplit</link></code> algorithm, which assumes the row keys are
+ hexadecimal strings. The second, <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.UniformSplit.html"
+ >UniformSplit</link>, assumes the row keys are random byte arrays. You will
+ probably need to develop your own SplitAlgorithm, using the provided ones as
+ models. </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </section>
+ </section>
 <section>
 <title>Online Region Merges</title>

http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/configuration.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/configuration.xml b/src/main/docbkx/configuration.xml
index 0af2b3c..aec8a00 100644
--- a/src/main/docbkx/configuration.xml
+++ b/src/main/docbkx/configuration.xml
@@ -1355,7 +1355,9 @@ export HBASE_HEAPSIZE=4096
 <varname>hbase.hregion.max.filesize</varname>,
 <varname>hbase.regionserver.regionSplitLimit</varname>. A simplistic view of splitting
 is that when a region grows to <varname>hbase.hregion.max.filesize</varname>, it is split.
- For most use patterns, most of the time, you should use automatic splitting.</para>
+ For most use patterns, most of the time, you should use automatic splitting. See <xref
+ linkend="manual_region_splitting_decisions"/> for more information about manual region
+ splitting.</para>
 <para>Instead of allowing HBase to split your regions automatically, you can choose to
 manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing
 splits works if you know your keyspace well, otherwise let HBase figure where to split for you.

http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/ops_mgt.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml
index 1f83a15..ea7883b 100644
--- a/src/main/docbkx/ops_mgt.xml
+++ b/src/main/docbkx/ops_mgt.xml
@@ -2227,8 +2227,8 @@ hbase> restore_snapshot 'myTableSnapshot-122112'
 pre-split 1 region per RS at most), especially if you don't know how much each
 table will grow. If you split too much, you may end up with too many regions, with
 some tables having too many small regions.</para>
- <para>For pre-splitting howto, see <xref
- linkend="precreate.regions" />.</para>
+ <para>For pre-splitting howto, see <xref linkend="manual_region_splitting_decisions"/> and
+ <xref linkend="precreate.regions"/>.</para>
 </section>
 <!-- ops.capacity.config.presplit -->
 </section>

http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/performance.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
index e7c0fc7..59287ee 100644
--- a/src/main/docbkx/performance.xml
+++ b/src/main/docbkx/performance.xml
@@ -682,9 +682,9 @@ admin.createTable(table, startKey, endKey, numberOfRegions);
 byte[][] splits = ...; // create your own splits
 admin.createTable(table, splits);
 </programlisting>
- <para> See <xref
- linkend="rowkey.regionsplits" /> for issues related to understanding your keyspace and
- pre-creating regions. </para>
+ <para> See <xref linkend="rowkey.regionsplits"/> for issues related to understanding your
+ keyspace and pre-creating regions. See <xref linkend="manual_region_splitting_decisions"/>
+ for discussion on manually pre-splitting regions.</para>
 </section>
 <section xml:id="def.log.flush">
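
Note on the RegionSplitter example in the book.xml change: that command asks for 10 regions split at hexadecimal values. As a rough illustration of what evenly spaced hexadecimal split points look like, the following self-contained Java sketch divides an assumed 8-digit hex keyspace (00000000 to ffffffff) into equal parts. The class name HexSplitSketch and the method splitPoints are hypothetical and are not part of HBase. This is not the HexStringSplit implementation, whose exact boundary arithmetic may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of HBase. */
public class HexSplitSketch {

    /**
     * Returns numRegions - 1 split points that divide the assumed keyspace
     * 00000000..ffffffff into numRegions roughly equal ranges.
     */
    public static List<String> splitPoints(int numRegions) {
        List<String> points = new ArrayList<>();
        if (numRegions < 2) {
            return points; // a single region needs no split point
        }
        long keyspace = 1L << 32; // assumption: row keys are 8 hex digits
        for (int i = 1; i < numRegions; i++) {
            long boundary = (i * keyspace) / numRegions;
            points.add(String.format("%08x", boundary));
        }
        return points;
    }

    public static void main(String[] args) {
        // Ten regions, as in the RegionSplitter example, need nine split points.
        System.out.println(splitPoints(10));
    }
}
```

Strings like these could be converted to byte arrays and supplied as the splits argument of the admin.createTable(table, splits) call that appears in the performance.xml context, assuming the row keys really are uniformly distributed hexadecimal strings.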
