Author: stack
Date: Mon Sep 27 20:58:26 2010
New Revision: 1001907
URL: http://svn.apache.org/viewvc?rev=1001907&view=rev
Log:
Added note on hlog tool: it can be used to look at the files in a
recovered.edits directory
Modified:
hbase/trunk/src/docbkx/book.xml
Modified: hbase/trunk/src/docbkx/book.xml
URL:
http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1001907&r1=1001906&r2=1001907&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Mon Sep 27 20:58:26 2010
@@ -66,54 +66,48 @@
<para>TODO: Review all of the below to ensure it matches what was
committed -- St.Ack 20100901</para>
</note>
+
<section>
- <title>
- Region Size
- </title>
-<para>Region size is one of those tricky things, there are a few factors to
consider:
-</para>
- <itemizedlist>
- <listitem>
- <para>
-Regions are the basic element of availability and distribution.
- </para>
- </listitem>
- <listitem>
- <para>
-HBase scales by having regions across many servers. Thus if you
-have 2 regions for 16GB data, on a 20 node machine you are a net loss
-there.
- </para>
- </listitem>
- <listitem>
- <para>
-High region count has been known to make things slow, this is
-getting better, but it is probably better to have 700 regions than
-3000 for the same amount of data.
- </para>
- </listitem>
- <listitem>
- <para>
-Low region count prevents parallel scalability as per point #2.
-This really cant be stressed enough, since a common problem is loading
-200MB data into HBase then wondering why your awesome 10 node cluster
-is mostly idle.
- </para>
- </listitem>
- <listitem>
- <para>
-There is not much memory footprint difference between 1 region and
-10 in terms of indexes, etc, held by the regionserver.
- </para>
- </listitem>
- </itemizedlist>
+ <title>Region Size</title>
-<para>Its probably best to stick to the default,
-perhaps going smaller for hot tables (or manually split hot regions
-to spread the load over the cluster), or go with a 1GB region size
-if your cell sizes tend to be largish (100k and up).
-</para>
+ <para>Region size is one of those tricky things; there are a few factors
+ to consider:</para>
+ <itemizedlist>
+ <listitem>
+ <para>Regions are the basic element of availability and
+ distribution.</para>
+ </listitem>
+
+ <listitem>
+ <para>HBase scales by having regions across many servers. Thus if
+ you have 2 regions for 16GB of data on a 20-node cluster, you are
+ wasting most of the cluster's capacity.</para>
+ </listitem>
+
+ <listitem>
+ <para>A high region count has been known to make things slow; this is
+ getting better, but it is probably better to have 700 regions than
+ 3000 for the same amount of data.</para>
+ </listitem>
+
+ <listitem>
+ <para>A low region count prevents the parallel scalability described
+ in point #2. This really cannot be stressed enough: a common problem
+ is loading 200MB of data into HBase and then wondering why your
+ awesome 10-node cluster is mostly idle.</para>
+ </listitem>
+
+ <listitem>
+ <para>There is not much difference in the memory footprint (indexes,
+ etc.) held by the regionserver between 1 region and 10.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>It is probably best to stick to the default, perhaps going
+ smaller for hot tables (or manually splitting hot regions to spread
+ the load over the cluster), or going with a 1GB region size if your
+ cell sizes tend to be largish (100k and up).</para>
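
To make the sizing advice concrete, a minimal sketch: region size is
governed by the hbase.hregion.max.filesize threshold in hbase-site.xml
(the 1GB value matches the largish-cell suggestion above), and a hot
region can be split by hand from the HBase shell, assuming a shell
recent enough to have the split command; the table name 'usertable' is
hypothetical:

    <!-- hbase-site.xml: store files grow to this size before a region splits -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>1073741824</value>  <!-- 1GB, for the largish-cell case -->
    </property>

    # HBase shell: manually split a hot table's regions to spread load
    hbase> split 'usertable'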
</section>
<section>
@@ -739,10 +733,11 @@ if your cell sizes tend to be largish (1
<title>WAL Tools</title>
<section>
- <title><classname>HLog</classname> main</title>
+ <title><classname>HLog</classname> tool</title>
<para>The main method on <classname>HLog</classname> offers manual
- split and dump facilities.</para>
+ split and dump facilities. Pass it WALs or the product of a split: the
+ content of a <filename>recovered.edits</filename> directory.</para>
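
The split facility is invoked the same way; a minimal sketch, assuming
this era's HLog main accepts a --split option pointing at a directory of
WALs (the path reuses the example log directory from the dump invocation
below):

    $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:9000/hbase/.logs/example.org,60020,1283516293161/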
          <para>You can get a textual dump of a WAL file content by doing the
          following:<programlisting><code>$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:9000/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012</code></programlisting>The