http://git-wip-us.apache.org/repos/asf/hbase/blob/7bf6c024/src/main/docbkx/troubleshooting.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/troubleshooting.xml 
b/src/main/docbkx/troubleshooting.xml
deleted file mode 100644
index 0a13f49..0000000
--- a/src/main/docbkx/troubleshooting.xml
+++ /dev/null
@@ -1,1154 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<chapter version="5.0" xml:id="trouble"
-         xmlns="http://docbook.org/ns/docbook";
-         xmlns:xlink="http://www.w3.org/1999/xlink";
-         xmlns:xi="http://www.w3.org/2001/XInclude";
-         xmlns:svg="http://www.w3.org/2000/svg";
-         xmlns:m="http://www.w3.org/1998/Math/MathML";
-         xmlns:html="http://www.w3.org/1999/xhtml";
-         xmlns:db="http://docbook.org/ns/docbook";>
-<!--
-/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--->
-  <title>Troubleshooting and Debugging Apache HBase</title>
-    <section xml:id="trouble.general">
-      <title>General Guidelines</title>
-      <para>
-          Always start with the master log (TODO: Which lines?).
-          Normally it’s just printing the same lines over and over again.
-          If not, then there’s an issue.
-          Google or <link 
xlink:href="http://search-hadoop.com";>search-hadoop.com</link>
-          should return some hits for those exceptions you’re seeing.
-      </para>
-      <para>
-          An error rarely comes alone in Apache HBase, usually when something 
gets screwed up what will
-          follow may be hundreds of exceptions and stack traces coming from 
all over the place.
-          The best way to approach this type of problem is to walk the log up 
to where it all
-          began, for example one trick with RegionServers is that they will 
print some
-          metrics when aborting so grepping for <emphasis>Dump</emphasis>
-          should get you around the start of the problem.
-      </para>
-      <para>
-          RegionServer suicides are “normal”, as this is what they do when 
something goes wrong.
-          For example, if ulimit and xcievers (the two most important initial 
settings, see <xref linkend="ulimit" />)
-          aren’t changed, it will make it impossible at some point for 
DataNodes to create new threads
-          that from the HBase point of view is seen as if HDFS was gone. Think 
about what would happen if your
-          MySQL database was suddenly unable to access files on your local 
file system, well it’s the same with
-          HBase and HDFS. Another very common reason to see RegionServers 
committing seppuku is when they enter
-          prolonged garbage collection pauses that last longer than the 
default ZooKeeper session timeout.
-          For more information on GC pauses, see the
-          <link 
xlink:href="http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/";>3
 part blog post</link>  by Todd Lipcon
-          and <xref linkend="gcpause" /> above.
-      </para>
-    </section>
-    <section xml:id="trouble.log">
-      <title>Logs</title>
-      <para>
-      The key process logs are as follows...   (replace &lt;user&gt; with the 
user that started the service, and &lt;hostname&gt; for the machine name)
-      </para>
-      <para>
-      NameNode:  
<filename>$HADOOP_HOME/logs/hadoop-&lt;user&gt;-namenode-&lt;hostname&gt;.log</filename>
-      </para>
-      <para>
-      DataNode:  
<filename>$HADOOP_HOME/logs/hadoop-&lt;user&gt;-datanode-&lt;hostname&gt;.log</filename>
-      </para>
-      <para>
-      JobTracker:  
<filename>$HADOOP_HOME/logs/hadoop-&lt;user&gt;-jobtracker-&lt;hostname&gt;.log</filename>
-      </para>
-      <para>
-      TaskTracker:  
<filename>$HADOOP_HOME/logs/hadoop-&lt;user&gt;-tasktracker-&lt;hostname&gt;.log</filename>
-      </para>
-      <para>
-      HMaster:  
<filename>$HBASE_HOME/logs/hbase-&lt;user&gt;-master-&lt;hostname&gt;.log</filename>
-      </para>
-      <para>
-      RegionServer:  
<filename>$HBASE_HOME/logs/hbase-&lt;user&gt;-regionserver-&lt;hostname&gt;.log</filename>
-      </para>
-      <para>
-      ZooKeeper:  <filename>TODO</filename>
-      </para>
-      <section xml:id="trouble.log.locations">
-        <title>Log Locations</title>
-        <para>For stand-alone deployments the logs are obviously going to be 
on a single machine, however this is a development configuration only.
-        Production deployments need to run on a cluster.</para>
-        <section xml:id="trouble.log.locations.namenode">
-          <title>NameNode</title>
-          <para>The NameNode log is on the NameNode server.  The HBase Master 
is typically run on the NameNode server, and well as ZooKeeper.</para>
-          <para>For smaller clusters the JobTracker is typically run on the 
NameNode server as well.</para>
-         </section>
-        <section xml:id="trouble.log.locations.datanode">
-          <title>DataNode</title>
-          <para>Each DataNode server will have a DataNode log for HDFS, as 
well as a RegionServer log for HBase.</para>
-          <para>Additionally, each DataNode server will also have a 
TaskTracker log for MapReduce task execution.</para>
-         </section>
-        </section>
-        <section xml:id="trouble.log.levels">
-          <title>Log Levels</title>
-         <section xml:id="rpc.logging"><title>Enabling RPC-level 
logging</title>
-          <para>Enabling the RPC-level logging on a RegionServer can often 
given
-           insight on timings at the server.  Once enabled, the amount of log
-           spewed is voluminous.  It is not recommended that you leave this
-           logging on for more than short bursts of time.  To enable RPC-level
-           logging, browse to the RegionServer UI and click on
-           <emphasis>Log Level</emphasis>.  Set the log level to 
<varname>DEBUG</varname> for the package
-           <classname>org.apache.hadoop.ipc</classname> (Thats right, for
-           <classname>hadoop.ipc</classname>, NOT, 
<classname>hbase.ipc</classname>).  Then tail the RegionServers log.  
Analyze.</para>
-           <para>To disable, set the logging level back to 
<varname>INFO</varname> level.
-           </para>
-         </section>
-       </section>
-      <section xml:id="trouble.log.gc">
-        <title>JVM Garbage Collection Logs</title>
-          <para>HBase is memory intensive, and using the default GC you can 
see long pauses in all threads including the <emphasis>Juliet Pause</emphasis> 
aka "GC of Death".
-           To help debug this or confirm this is happening GC logging can be 
turned on in the Java virtual machine.
-          </para>
-          <para>
-          To enable, in <filename>hbase-env.sh</filename>, uncomment one of 
the below lines :
-          <programlisting>
-# This enables basic gc logging to the .out file.
-# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails 
-XX:+PrintGCDateStamps"
-
-# This enables basic gc logging to its own file.
-# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails 
-XX:+PrintGCDateStamps -Xloggc:&lt;FILE-PATH&gt;"
-
-# This enables basic GC logging to its own file with automatic log rolling. 
Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
-# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails 
-XX:+PrintGCDateStamps -Xloggc:&lt;FILE-PATH&gt; -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
-
-# If &lt;FILE-PATH&gt; is not replaced, the log file(.gc) would be generated 
in the HBASE_LOG_DIR.
-          </programlisting>
-          </para>
-          <para>
-           At this point you should see logs like so:
-          <programlisting>
-64898.952: [GC [1 CMS-initial-mark: 2811538K(3055704K)] 2812179K(3061272K), 
0.0007360 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
-64898.953: [CMS-concurrent-mark-start]
-64898.971: [GC 64898.971: [ParNew: 5567K->576K(5568K), 0.0101110 secs] 
2817105K->2812715K(3061272K), 0.0102200 secs] [Times: user=0.07 sys=0.00, 
real=0.01 secs]
-          </programlisting>
-          </para>
-          <para>
-           In this section, the first line indicates a 0.0007360 second pause 
for the CMS to initially mark. This pauses the entire VM, all threads for that 
period of time.
-            </para>
-            <para>
-           The third line indicates a "minor GC", which pauses the VM for 
0.0101110 seconds - aka 10 milliseconds. It has reduced the "ParNew" from about 
5.5m to 576k.
-           Later on in this cycle we see:
-           <programlisting>
-64901.445: [CMS-concurrent-mark: 1.542/2.492 secs] [Times: user=10.49 
sys=0.33, real=2.49 secs]
-64901.445: [CMS-concurrent-preclean-start]
-64901.453: [GC 64901.453: [ParNew: 5505K->573K(5568K), 0.0062440 secs] 
2868746K->2864292K(3061272K), 0.0063360 secs] [Times: user=0.05 sys=0.00, 
real=0.01 secs]
-64901.476: [GC 64901.476: [ParNew: 5563K->575K(5568K), 0.0072510 secs] 
2869283K->2864837K(3061272K), 0.0073320 secs] [Times: user=0.05 sys=0.01, 
real=0.01 secs]
-64901.500: [GC 64901.500: [ParNew: 5517K->573K(5568K), 0.0120390 secs] 
2869780K->2865267K(3061272K), 0.0121150 secs] [Times: user=0.09 sys=0.00, 
real=0.01 secs]
-64901.529: [GC 64901.529: [ParNew: 5507K->569K(5568K), 0.0086240 secs] 
2870200K->2865742K(3061272K), 0.0087180 secs] [Times: user=0.05 sys=0.00, 
real=0.01 secs]
-64901.554: [GC 64901.555: [ParNew: 5516K->575K(5568K), 0.0107130 secs] 
2870689K->2866291K(3061272K), 0.0107820 secs] [Times: user=0.06 sys=0.00, 
real=0.01 secs]
-64901.578: [CMS-concurrent-preclean: 0.070/0.133 secs] [Times: user=0.48 
sys=0.01, real=0.14 secs]
-64901.578: [CMS-concurrent-abortable-preclean-start]
-64901.584: [GC 64901.584: [ParNew: 5504K->571K(5568K), 0.0087270 secs] 
2871220K->2866830K(3061272K), 0.0088220 secs] [Times: user=0.05 sys=0.00, 
real=0.01 secs]
-64901.609: [GC 64901.609: [ParNew: 5512K->569K(5568K), 0.0063370 secs] 
2871771K->2867322K(3061272K), 0.0064230 secs] [Times: user=0.06 sys=0.00, 
real=0.01 secs]
-64901.615: [CMS-concurrent-abortable-preclean: 0.007/0.037 secs] [Times: 
user=0.13 sys=0.00, real=0.03 secs]
-64901.616: [GC[YG occupancy: 645 K (5568 K)]64901.616: [Rescan (parallel) , 
0.0020210 secs]64901.618: [weak refs processing, 0.0027950 secs] [1 CMS-remark: 
2866753K(3055704K)] 2867399K(3061272K), 0.0049380 secs] [Times: user=0.00 
sys=0.01, real=0.01 secs]
-64901.621: [CMS-concurrent-sweep-start]
-            </programlisting>
-            </para>
-            <para>
-            The first line indicates that the CMS concurrent mark (finding 
garbage) has taken 2.4 seconds. But this is a _concurrent_ 2.4 seconds, Java 
has not been paused at any point in time.
-            </para>
-            <para>
-            There are a few more minor GCs, then there is a pause at the 2nd 
last line:
-            <programlisting>
-64901.616: [GC[YG occupancy: 645 K (5568 K)]64901.616: [Rescan (parallel) , 
0.0020210 secs]64901.618: [weak refs processing, 0.0027950 secs] [1 CMS-remark: 
2866753K(3055704K)] 2867399K(3061272K), 0.0049380 secs] [Times: user=0.00 
sys=0.01, real=0.01 secs]
-            </programlisting>
-            </para>
-            <para>
-            The pause here is 0.0049380 seconds (aka 4.9 milliseconds) to 
'remark' the heap.
-            </para>
-            <para>
-            At this point the sweep starts, and you can watch the heap size go 
down:
-            <programlisting>
-64901.637: [GC 64901.637: [ParNew: 5501K->569K(5568K), 0.0097350 secs] 
2871958K->2867441K(3061272K), 0.0098370 secs] [Times: user=0.05 sys=0.00, 
real=0.01 secs]
-...  lines removed ...
-64904.936: [GC 64904.936: [ParNew: 5532K->568K(5568K), 0.0070720 secs] 
1365024K->1360689K(3061272K), 0.0071930 secs] [Times: user=0.05 sys=0.00, 
real=0.01 secs]
-64904.953: [CMS-concurrent-sweep: 2.030/3.332 secs] [Times: user=9.57 
sys=0.26, real=3.33 secs]
-            </programlisting>
-            At this point, the CMS sweep took 3.332 seconds, and heap went 
from about ~ 2.8 GB to 1.3 GB (approximate).
-            </para>
-            <para>
-            The key points here is to keep all these pauses low. CMS pauses 
are always low, but if your ParNew starts growing, you can see minor GC pauses 
approach 100ms, exceed 100ms and hit as high at 400ms.
-            </para>
-            <para>
-            This can be due to the size of the ParNew, which should be 
relatively small. If your ParNew is very large after running HBase for a while, 
in one example a ParNew was about 150MB, then you might have to constrain the 
size of ParNew (The larger it is, the longer the collections take but if its 
too small, objects are promoted to old gen too quickly). In the below we 
constrain new gen size to 64m.
-            </para>
-            <para>
-             Add the below line in <filename>hbase-env.sh</filename>:
-            <programlisting>
-export SERVER_GC_OPTS="$SERVER_GC_OPTS -XX:NewSize=64m -XX:MaxNewSize=64m"
-            </programlisting>
-            </para>
-            <para>
-            Similarly, to enable GC logging for client processes, uncomment 
one of the below lines in <filename>hbase-env.sh</filename>:
-            <programlisting>
-# This enables basic gc logging to the .out file.
-# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails 
-XX:+PrintGCDateStamps"
-
-# This enables basic gc logging to its own file.
-# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails 
-XX:+PrintGCDateStamps -Xloggc:&lt;FILE-PATH&gt;"
-
-# This enables basic GC logging to its own file with automatic log rolling. 
Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
-# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails 
-XX:+PrintGCDateStamps -Xloggc:&lt;FILE-PATH&gt; -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
-
-# If &lt;FILE-PATH&gt; is not replaced, the log file(.gc) would be generated 
in the HBASE_LOG_DIR .
-            </programlisting>
-            </para>
-            <para>
-            For more information on GC pauses, see the <link 
xlink:href="http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/";>3
 part blog post</link>  by Todd Lipcon
-            and <xref linkend="gcpause" /> above.
-            </para>
-      </section>
-    </section>
-    <section xml:id="trouble.resources">
-      <title>Resources</title>
-      <section xml:id="trouble.resources.searchhadoop">
-        <title>search-hadoop.com</title>
-        <para>
-        <link xlink:href="http://search-hadoop.com";>search-hadoop.com</link> 
indexes all the mailing lists and is great for historical searches.
-        Search here first when you have an issue as its more than likely 
someone has already had your problem.
-        </para>
-      </section>
-      <section xml:id="trouble.resources.lists">
-        <title>Mailing Lists</title>
-        <para>Ask a question on the <link 
xlink:href="http://hbase.apache.org/mail-lists.html";>Apache HBase mailing 
lists</link>.
-        The 'dev' mailing list is aimed at the community of developers 
actually building Apache HBase and for features currently under development, 
and 'user'
-        is generally used for questions on released versions of Apache HBase.  
Before going to the mailing list, make sure your
-        question has not already been answered by searching the mailing list 
archives first.  Use
-        <xref linkend="trouble.resources.searchhadoop" />.
-        Take some time crafting your question<footnote><para>See <link 
xlink="http://www.mikeash.com/getting_answers.html";>Getting 
Answers</link></para></footnote>; a quality question that includes all context 
and
-        exhibits evidence the author has tried to find answers in the manual 
and out on lists
-        is more likely to get a prompt response.
-        </para>
-      </section>
-      <section xml:id="trouble.resources.irc">
-        <title>IRC</title>
-        <para>#hbase on irc.freenode.net</para>
-      </section>
-      <section xml:id="trouble.resources.jira">
-        <title>JIRA</title>
-        <para>
-        <link 
xlink:href="https://issues.apache.org/jira/browse/HBASE";>JIRA</link> is also 
really helpful when looking for Hadoop/HBase-specific issues.
-        </para>
-      </section>
-    </section>
-    <section xml:id="trouble.tools">
-      <title>Tools</title>
-         <section xml:id="trouble.tools.builtin">
-           <title>Builtin Tools</title>
-            <section xml:id="trouble.tools.builtin.webmaster">
-              <title>Master Web Interface</title>
-              <para>The Master starts a web-interface on port 60010 by default.
-              </para>
-              <para>The Master web UI lists created tables and their 
definition (e.g., ColumnFamilies, blocksize, etc.).  Additionally,
-              the available RegionServers in the cluster are listed along with 
selected high-level metrics (requests, number of regions, usedHeap, maxHeap).
-              The Master web UI allows navigation to each RegionServer's web 
UI.
-              </para>
-            </section>
-            <section xml:id="trouble.tools.builtin.webregion">
-              <title>RegionServer Web Interface</title>
-              <para>RegionServers starts a web-interface on port 60030 by 
default.
-              </para>
-              <para>The RegionServer web UI lists online regions and their 
start/end keys, as well as point-in-time RegionServer metrics (requests, 
regions, storeFileIndexSize, compactionQueueSize, etc.).
-              </para>
-              <para>See <xref linkend="hbase_metrics"/> for more information 
in metric definitions.
-            </para>
-          </section>
-          <section xml:id="trouble.tools.builtin.zkcli">
-             <title>zkcli</title>
-              <para><code>zkcli</code> is a very useful tool for investigating 
ZooKeeper-related issues.  To invoke:
-<programlisting>
-./hbase zkcli -server host:port &lt;cmd&gt; &lt;args&gt;
-</programlisting>
-              The commands (and arguments) are:
-<programlisting>
-       connect host:port
-       get path [watch]
-       ls path [watch]
-       set path data [version]
-       delquota [-n|-b] path
-       quit
-       printwatches on|off
-       create [-s] [-e] path data acl
-       stat path [watch]
-       close
-       ls2 path [watch]
-       history
-       listquota path
-       setAcl path acl
-       getAcl path
-       sync path
-       redo cmdno
-       addauth scheme auth
-       delete path [version]
-       setquota -n|-b val path
-</programlisting>
-            </para>
-        </section>
-       </section>
-       <section xml:id="trouble.tools.external">
-          <title>External Tools</title>
-      <section xml:id="trouble.tools.tail">
-        <title>tail</title>
-        <para>
-        <code>tail</code> is the command line tool that lets you look at the 
end of a file. Add the “-f” option and it will refresh when new data is 
available. It’s useful when you are wondering what’s happening, for 
example, when a cluster is taking a long time to shutdown or startup as you can 
just fire a new terminal and tail the master log (and maybe a few 
RegionServers).
-        </para>
-      </section>
-      <section xml:id="trouble.tools.top">
-        <title>top</title>
-        <para>
-        <code>top</code> is probably one of the most important tool when first 
trying to see what’s running on a machine and how the resources are consumed. 
Here’s an example from production system:
-        <programlisting>
-top - 14:46:59 up 39 days, 11:55,  1 user,  load average: 3.75, 3.57, 3.84
-Tasks: 309 total,   1 running, 308 sleeping,   0 stopped,   0 zombie
-Cpu(s):  4.5%us,  1.6%sy,  0.0%ni, 91.7%id,  1.4%wa,  0.1%hi,  0.6%si,  0.0%st
-Mem:  24414432k total, 24296956k used,   117476k free,     7196k buffers
-Swap: 16008732k total, 14348k used, 15994384k free, 11106908k cached
-
-  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM      TIME+  COMMAND
-15558 hadoop   18  -2 3292m 2.4g 3556 S   79 10.4   6523:52 java
-13268 hadoop   18  -2 8967m 8.2g 4104 S   21 35.1   5170:30 java
- 8895 hadoop   18  -2 1581m 497m 3420 S   11  2.1   4002:32 java
-…
-        </programlisting>
-        </para>
-        <para>
-        Here we can see that the system load average during the last five 
minutes is 3.75, which very roughly means that on average 3.75 threads were 
waiting for CPU time during these 5 minutes.  In general, the “perfect” 
utilization equals to the number of cores, under that number the machine is 
under utilized and over that the machine is over utilized.  This is an 
important concept, see this article to understand it more: <link 
xlink:href="http://www.linuxjournal.com/article/9001";>http://www.linuxjournal.com/article/9001</link>.
-        </para>
-        <para>
-        Apart from load, we can see that the system is using almost all its 
available RAM but most of it is used for the OS cache (which is good). The swap 
only has a few KBs in it and this is wanted, high numbers would indicate 
swapping activity which is the nemesis of performance of Java systems. Another 
way to detect swapping is when the load average goes through the roof (although 
this could also be caused by things like a dying disk, among others).
-        </para>
-        <para>
-        The list of processes isn’t super useful by default, all we know is 
that 3 java processes are using about 111% of the CPUs. To know which is which, 
simply type “c” and each line will be expanded. Typing “1” will give 
you the detail of how each CPU is used instead of the average for all of them 
like shown here.
-        </para>
-      </section>
-      <section xml:id="trouble.tools.jps">
-        <title>jps</title>
-        <para>
-        <code>jps</code> is shipped with every JDK and gives the java process 
ids for the current user (if root, then it gives the ids for all users). 
Example:
-        <programlisting>
-hadoop@sv4borg12:~$ jps
-1322 TaskTracker
-17789 HRegionServer
-27862 Child
-1158 DataNode
-25115 HQuorumPeer
-2950 Jps
-19750 ThriftServer
-18776 jmx
-        </programlisting>
-        In order, we see a:
-        <itemizedlist>
-          <listitem>Hadoop TaskTracker, manages the local Childs</listitem>
-          <listitem>HBase RegionServer, serves regions</listitem>
-          <listitem>Child, its MapReduce task, cannot tell which type 
exactly</listitem>
-          <listitem>Hadoop TaskTracker, manages the local Childs</listitem>
-          <listitem>Hadoop DataNode, serves blocks</listitem>
-          <listitem>HQuorumPeer, a ZooKeeper ensemble member</listitem>
-          <listitem>Jps, well… it’s the current process</listitem>
-          <listitem>ThriftServer, it’s a special one will be running only if 
thrift was started</listitem>
-          <listitem>jmx, this is a local process that’s part of our 
monitoring platform ( poorly named maybe). You probably don’t have 
that.</listitem>
-        </itemizedlist>
-        </para>
-        <para>
-      You can then do stuff like checking out the full command line that 
started the process:
-        <programlisting>
-hadoop@sv4borg12:~$ ps aux | grep HRegionServer
-hadoop   17789  155 35.2 9067824 8604364 ?     S&lt;l  Mar04 9855:48 
/usr/java/jdk1.6.0_14/bin/java -Xmx8000m -XX:+DoEscapeAnalysis 
-XX:+AggressiveOpts -XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m 
-XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps -Xloggc:/export1/hadoop/logs/gc-hbase.log 
-Dcom.sun.management.jmxremote.port=10102 
-Dcom.sun.management.jmxremote.authenticate=true 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.password.file=/home/hadoop/hbase/conf/jmxremote.password
 -Dcom.sun.management.jmxremote -Dhbase.log.dir=/export1/hadoop/logs 
-Dhbase.log.file=hbase-hadoop-regionserver-sv4borg12.log 
-Dhbase.home.dir=/home/hadoop/hbase -Dhbase.id.str=hadoop 
-Dhbase.root.logger=INFO,DRFA 
-Djava.library.path=/home/hadoop/hbase/lib/native/Linux-amd64-64 -classpath 
/home/hadoop/hbase/bin/../conf:[many jars]:/home/hadoop/hadoop/conf 
org.apache.hadoop.hbase.regionserver.HRegionServer start
-        </programlisting>
-        </para>
-      </section>
-      <section xml:id="trouble.tools.jstack">
-        <title>jstack</title>
-        <para>
-        <code>jstack</code> is one of the most important tools when trying to 
figure out what a java process is doing apart from looking at the logs. It has 
to be used in conjunction with jps in order to give it a process id. It shows a 
list of threads, each one has a name, and they appear in the order that they 
were created (so the top ones are the most recent threads). Here’s a few 
example:
-        </para>
-        <para>
-        The main thread of a RegionServer that’s waiting for something to do 
from the master:
-        <programlisting>
-      "regionserver60020" prio=10 tid=0x0000000040ab4000 nid=0x45cf waiting on 
condition [0x00007f16b6a96000..0x00007f16b6a96a70]
-   java.lang.Thread.State: TIMED_WAITING (parking)
-               at sun.misc.Unsafe.park(Native Method)
-               - parking to wait for  &lt;0x00007f16cd5c2f30&gt; (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
-               at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
-               at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
-               at 
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:395)
-               at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:647)
-               at java.lang.Thread.run(Thread.java:619)
-
-               The MemStore flusher thread that is currently flushing to a 
file:
-"regionserver60020.cacheFlusher" daemon prio=10 tid=0x0000000040f4e000 
nid=0x45eb in Object.wait() [0x00007f16b5b86000..0x00007f16b5b87af0]
-   java.lang.Thread.State: WAITING (on object monitor)
-               at java.lang.Object.wait(Native Method)
-               at java.lang.Object.wait(Object.java:485)
-               at org.apache.hadoop.ipc.Client.call(Client.java:803)
-               - locked &lt;0x00007f16cb14b3a8&gt; (a 
org.apache.hadoop.ipc.Client$Call)
-               at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
-               at $Proxy1.complete(Unknown Source)
-               at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
-               at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
-               at java.lang.reflect.Method.invoke(Method.java:597)
-               at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
-               at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
-               at $Proxy1.complete(Unknown Source)
-               at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3390)
-               - locked &lt;0x00007f16cb14b470&gt; (a 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
-               at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3304)
-               at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
-               at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
-               at 
org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:650)
-               at 
org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:853)
-               at 
org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:467)
-               - locked &lt;0x00007f16d00e6f08&gt; (a java.lang.Object)
-               at 
org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:427)
-               at 
org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:80)
-               at 
org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1359)
-               at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:907)
-               at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:834)
-               at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:786)
-               at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:250)
-               at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:224)
-               at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
-        </programlisting>
-        </para>
-        <para>
-               A handler thread that’s waiting for stuff to do (like put, 
delete, scan, etc):
-        <programlisting>
-"IPC Server handler 16 on 60020" daemon prio=10 tid=0x00007f16b011d800 
nid=0x4a5e waiting on condition [0x00007f16afefd000..0x00007f16afefd9f0]
-   java.lang.Thread.State: WAITING (parking)
-               at sun.misc.Unsafe.park(Native Method)
-               - parking to wait for  &lt;0x00007f16cd3f8dd8&gt; (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
-               at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
-               at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
-               at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
-               at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1013)
-        </programlisting>
-        </para>
-        <para>
-               And one that’s busy doing an increment of a counter (it’s 
in the phase where it’s trying to create a scanner in order to read the last 
value):
-        <programlisting>
-"IPC Server handler 66 on 60020" daemon prio=10 tid=0x00007f16b006e800 
nid=0x4a90 runnable [0x00007f16acb77000..0x00007f16acb77cf0]
-   java.lang.Thread.State: RUNNABLE
-               at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.&lt;init&gt;(KeyValueHeap.java:56)
-               at 
org.apache.hadoop.hbase.regionserver.StoreScanner.&lt;init&gt;(StoreScanner.java:79)
-               at 
org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1202)
-               at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.&lt;init&gt;(HRegion.java:2209)
-               at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateInternalScanner(HRegion.java:1063)
-               at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1055)
-               at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1039)
-               at 
org.apache.hadoop.hbase.regionserver.HRegion.getLastIncrement(HRegion.java:2875)
-               at 
org.apache.hadoop.hbase.regionserver.HRegion.incrementColumnValue(HRegion.java:2978)
-               at 
org.apache.hadoop.hbase.regionserver.HRegionServer.incrementColumnValue(HRegionServer.java:2433)
-               at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
-               at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
-               at java.lang.reflect.Method.invoke(Method.java:597)
-               at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:560)
-               at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1027)
-        </programlisting>
-        </para>
-        <para>
-               A thread that receives data from HDFS:
-        <programlisting>
-"IPC Client (47) connection to sv4borg9/10.4.24.40:9000 from hadoop" daemon 
prio=10 tid=0x00007f16a02d0000 nid=0x4fa3 runnable 
[0x00007f16b517d000..0x00007f16b517dbf0]
-   java.lang.Thread.State: RUNNABLE
-               at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
-               at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
-               at 
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
-               at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
-               - locked &lt;0x00007f17d5b68c00&gt; (a sun.nio.ch.Util$1)
-               - locked &lt;0x00007f17d5b68be8&gt; (a 
java.util.Collections$UnmodifiableSet)
-               - locked &lt;0x00007f1877959b50&gt; (a 
sun.nio.ch.EPollSelectorImpl)
-               at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
-               at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
-               at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
-               at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
-               at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
-               at java.io.FilterInputStream.read(FilterInputStream.java:116)
-               at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:304)
-               at 
java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
-               at 
java.io.BufferedInputStream.read(BufferedInputStream.java:237)
-               - locked &lt;0x00007f1808539178&gt; (a 
java.io.BufferedInputStream)
-               at java.io.DataInputStream.readInt(DataInputStream.java:370)
-               at 
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:569)
-               at org.apache.hadoop.ipc.Client$Connection.run(Client.java:477)
-          </programlisting>
-          </para>
-          <para>
-               And here is a master trying to recover a lease after a 
RegionServer died:
-          <programlisting>
-"LeaseChecker" daemon prio=10 tid=0x00000000407ef800 nid=0x76cd waiting on 
condition [0x00007f6d0eae2000..0x00007f6d0eae2a70]
---
-   java.lang.Thread.State: WAITING (on object monitor)
-               at java.lang.Object.wait(Native Method)
-               at java.lang.Object.wait(Object.java:485)
-               at org.apache.hadoop.ipc.Client.call(Client.java:726)
-               - locked &lt;0x00007f6d1cd28f80&gt; (a 
org.apache.hadoop.ipc.Client$Call)
-               at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
-               at $Proxy1.recoverBlock(Unknown Source)
-               at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2636)
-               at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.&lt;init&gt;(DFSClient.java:2832)
-               at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:529)
-               at 
org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:186)
-               at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:530)
-               at 
org.apache.hadoop.hbase.util.FSUtils.recoverFileLease(FSUtils.java:619)
-               at 
org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1322)
-               at 
org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1210)
-               at 
org.apache.hadoop.hbase.master.HMaster.splitLogAfterStartup(HMaster.java:648)
-               at 
org.apache.hadoop.hbase.master.HMaster.joinCluster(HMaster.java:572)
-               at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:503)
-          </programlisting>
-          </para>
-        </section>
-        <section xml:id="trouble.tools.opentsdb">
-          <title>OpenTSDB</title>
-          <para>
-          <link xlink:href="http://opentsdb.net";>OpenTSDB</link> is an 
excellent alternative to Ganglia as it uses Apache HBase to store all the time 
series and doesn’t have to downsample. Monitoring your own HBase cluster that 
hosts OpenTSDB is a good exercise.
-          </para>
-          <para>
-          Here’s an example of a cluster that’s suffering from hundreds of 
compactions launched almost all around the same time, which severely affects 
the IO performance:  (TODO:  insert graph plotting compactionQueueSize)
-          </para>
-          <para>
-          It’s a good practice to build dashboards with all the important 
graphs per machine and per cluster so that debugging issues can be done with a 
single quick look. For example, at StumbleUpon there’s one dashboard per 
cluster with the most important metrics from both the OS and Apache HBase. You 
can then go down at the machine level and get even more detailed metrics.
-          </para>
-       </section>
-       <section xml:id="trouble.tools.clustersshtop">
-        <title>clusterssh+top</title>
-         <para>
-          clusterssh+top, it’s like a poor man’s monitoring system and it 
can be quite useful when you have only a few machines as it’s very easy to 
setup. Starting clusterssh will give you one terminal per machine and another 
terminal in which whatever you type will be retyped in every window. This means 
that you can type “top” once and it will start it for all of your machines 
at the same time giving you full view of the current state of your cluster. You 
can also tail all the logs at the same time, edit files, etc.
-          </para>
-       </section>
-    </section>
-    </section>
-
-    <section xml:id="trouble.client">
-      <title>Client</title>
-       <para>For more information on the HBase client, see <xref 
linkend="client"/>.
-       </para>
-       <section xml:id="trouble.client.scantimeout">
-            <title>ScannerTimeoutException or UnknownScannerException</title>
-            <para>This is thrown if the time between RPC calls from the client 
to RegionServer exceeds the scan timeout.
-            For example, if <code>Scan.setCaching</code> is set to 500, then 
there will be an RPC call to fetch the next batch of rows every 500 
<code>.next()</code> calls on the ResultScanner
-            because data is being transferred in blocks of 500 rows to the 
client.  Reducing the setCaching value may be an option, but setting this value 
too low makes for inefficient
-            processing on numbers of rows.
-            </para>
-            <para>See <xref linkend="perf.hbase.client.caching"/>.
-            </para>
-       </section>
-       <section xml:id="trouble.client.lease.exception">
-            <title><classname>LeaseException</classname> when calling 
<classname>Scanner.next</classname></title>
-            <para>
-In some situations clients that fetch data from a RegionServer get a 
LeaseException instead of the usual
-<xref linkend="trouble.client.scantimeout" />.  Usually the source of the 
exception is
-<classname>org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)</classname>
 (line number may vary).
-It tends to happen in the context of a slow/freezing RegionServer#next call.
-It can be prevented by having <varname>hbase.rpc.timeout</varname> > 
<varname>hbase.regionserver.lease.period</varname>.
-Harsh J investigated the issue as part of the mailing list thread
-<link 
xlink:href="http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E";>HBase,
 mail # user - Lease does not exist exceptions</link>
-            </para>
-       </section>
-       <section xml:id="trouble.client.scarylogs">
-            <title>Shell or client application throws lots of scary exceptions 
during normal operation</title>
-            <para>Since 0.20.0 the default log level for 
<code>org.apache.hadoop.hbase.*</code>is DEBUG. </para>
-            <para>
-            On your clients, edit 
<filename>$HBASE_HOME/conf/log4j.properties</filename> and change this: 
<code>log4j.logger.org.apache.hadoop.hbase=DEBUG</code> to this: 
<code>log4j.logger.org.apache.hadoop.hbase=INFO</code>, or even 
<code>log4j.logger.org.apache.hadoop.hbase=WARN</code>.
-            </para>
-       </section>
-       <section xml:id="trouble.client.longpauseswithcompression">
-            <title>Long Client Pauses With Compression</title>
-            <para>This is a fairly frequent question on the Apache HBase 
dist-list.  The scenario is that a client is typically inserting a lot of data 
into a
-            relatively un-optimized HBase cluster.  Compression can exacerbate 
the pauses, although it is not the source of the problem.</para>
-            <para>See <xref linkend="precreate.regions"/> on the pattern for 
pre-creating regions and confirm that the table isn't starting with a single 
region.</para>
-            <para>See <xref linkend="perf.configurations"/> for cluster 
configuration, particularly <code>hbase.hstore.blockingStoreFiles</code>, 
<code>hbase.hregion.memstore.block.multiplier</code>,
-            <code>MAX_FILESIZE</code> (region size), and 
<code>MEMSTORE_FLUSHSIZE.</code>  </para>
-            <para>A slightly longer explanation of why pauses can happen is as 
follows:  Puts are sometimes blocked on the MemStores which are blocked by the 
flusher thread which is blocked because there are
-            too many files to compact because the compactor is given too many 
small files to compact and has to compact the same data repeatedly.  This 
situation can occur even with minor compactions.
-            Compounding this situation, Apache HBase doesn't compress data in 
memory.  Thus, the 64MB that lives in the MemStore could become a 6MB file 
after compression - which results in a smaller StoreFile.  The upside is that
-            more data is packed into the same region, but performance is 
achieved by being able to write larger files - which is why HBase waits until 
the flushize before writing a new StoreFile.  And smaller StoreFiles
-            become targets for compaction.  Without compression the files are 
much bigger and don't need as much compaction, however this is at the expense 
of I/O.
-            </para>
-            <para>
-            For additional information, see this thread on <link 
xlink:href="http://search-hadoop.com/m/WUnLM6ojHm1/Long+client+pauses+with+compression&amp;subj=Long+client+pauses+with+compression";>Long
 client pauses with compression</link>.
-            </para>
-
-       </section>
-       <section xml:id="trouble.client.zookeeper">
-            <title>ZooKeeper Client Connection Errors</title>
-            <para>Errors like this...
-<programlisting>
-11/07/05 11:26:41 WARN zookeeper.ClientCnxn: Session 0x0 for server null,
- unexpected error, closing socket connection and attempting reconnect
- java.net.ConnectException: Connection refused: no further information
-        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
-        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
-        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
- 11/07/05 11:26:43 INFO zookeeper.ClientCnxn: Opening socket connection to
- server localhost/127.0.0.1:2181
- 11/07/05 11:26:44 WARN zookeeper.ClientCnxn: Session 0x0 for server null,
- unexpected error, closing socket connection and attempting reconnect
- java.net.ConnectException: Connection refused: no further information
-        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
-        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
-        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
- 11/07/05 11:26:45 INFO zookeeper.ClientCnxn: Opening socket connection to
- server localhost/127.0.0.1:2181
-</programlisting>
-            ... are either due to ZooKeeper being down, or unreachable due to 
network issues.
-            </para>
-            <para>The utility <xref linkend="trouble.tools.builtin.zkcli"/> 
may help investigate ZooKeeper issues.
-            </para>
-       </section>
-       <section xml:id="trouble.client.oome.directmemory.leak">
-            <title>Client running out of memory though heap size seems to be 
stable (but the off-heap/direct heap keeps growing)</title>
-            <para>
-You are likely running into the issue that is described and worked through in
-the mail thread <link 
xhref="http://search-hadoop.com/m/ubhrX8KvcH/Suspected+memory+leak&amp;subj=Re+Suspected+memory+leak";>HBase,
 mail # user - Suspected memory leak</link>
-and continued over in <link 
xhref="http://search-hadoop.com/m/p2Agc1Zy7Va/MaxDirectMemorySize+Was%253A+Suspected+memory+leak&amp;subj=Re+FeedbackRe+Suspected+memory+leak";>HBase,
 mail # dev - FeedbackRe: Suspected memory leak</link>.
-A workaround is passing your client-side JVM a reasonable value for 
<code>-XX:MaxDirectMemorySize</code>.  By default,
-the <varname>MaxDirectMemorySize</varname> is equal to your <code>-Xmx</code> 
max heapsize setting (if <code>-Xmx</code> is set).
-Try seting it to something smaller (for example, one user had success setting 
it to <code>1g</code> when
-they had a client-side heap of <code>12g</code>).  If you set it too small, it 
will bring on <code>FullGCs</code> so keep
-it  a bit hefty.  You want to make this setting client-side only especially if 
you are running the new experiemental
-server-side off-heap cache since this feature depends on being able to use big 
direct buffers (You may have to keep
-separate client-side and server-side config dirs).
-            </para>
-       </section>
-       <section xml:id="trouble.client.slowdown.admin">
-            <title>Client Slowdown When Calling Admin Methods (flush, compact, 
etc.)</title>
-            <para>
-This is a client issue fixed by <link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-5073";>HBASE-5073</link> 
in 0.90.6.
-There was a ZooKeeper leak in the client and the client was getting pummeled 
by ZooKeeper events with each additional
-invocation of the admin API.
-            </para>
-       </section>
-
-       <section xml:id="trouble.client.security.rpc">
-           <title>Secure Client Cannot Connect ([Caused by GSSException: No 
valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)])</title>
-           <para>
-There can be several causes that produce this symptom.
-           </para>
-           <para>
-First, check that you have a valid Kerberos ticket. One is required in order 
to set up communication with a secure Apache HBase cluster. Examine the ticket 
currently in the credential cache, if any, by running the klist command line 
utility. If no ticket is listed, you must obtain a ticket by running the kinit 
command with either a keytab specified, or by interactively entering a password 
for the desired principal.
-           </para>
-           <para>
-Then, consult the <link 
xlink:href="http://docs.oracle.com/javase/1.5.0/docs/guide/security/jgss/tutorials/Troubleshooting.html";>Java
 Security Guide troubleshooting section</link>. The most common problem 
addressed there is resolved by setting javax.security.auth.useSubjectCredsOnly 
system property value to false.
-           </para>
-           <para>
-Because of a change in the format in which MIT Kerberos writes its credentials 
cache, there is a bug in the Oracle JDK 6 Update 26 and earlier that causes 
Java to be unable to read the Kerberos credentials cache created by versions of 
MIT Kerberos 1.8.1 or higher. If you have this problematic combination of 
components in your environment, to work around this problem, first log in with 
kinit and then immediately refresh the credential cache with kinit -R. The 
refresh will rewrite the credential cache without the problematic formatting.
-           </para>
-           <para>
-Finally, depending on your Kerberos configuration, you may need to install the 
<link 
xlink:href="http://docs.oracle.com/javase/1.4.2/docs/guide/security/jce/JCERefGuide.html";>Java
 Cryptography Extension</link>, or JCE. Insure the JCE jars are on the 
classpath on both server and client systems.
-           </para>
-           <para>
-You may also need to download the <link 
xlink:href="http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html";>unlimited
 strength JCE policy files</link>. Uncompress and extract the downloaded file, 
and install the policy jars into &lt;java-home&gt;/lib/security.
-           </para>
-       </section>
-
-    </section>
-
-    <section xml:id="trouble.mapreduce">
-      <title>MapReduce</title>
-      <section xml:id="trouble.mapreduce.local">
-        <title>You Think You're On The Cluster, But You're Actually 
Local</title>
-        <para>This following stacktrace happened using <code>ImportTsv</code>, 
but things like this
-        can happen on any job with a mis-configuration.
-<programlisting>
-    WARN mapred.LocalJobRunner: job_local_0001
-java.lang.IllegalArgumentException: Can't read partitions file
-       at 
org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
-       at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
-       at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
-       at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.&lt;init&gt;(MapTask.java:560)
-       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
-       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
-       at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
-Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
-       at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:383)
-       at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
-       at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:776)
-       at 
org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1424)
-       at 
org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1419)
-       at 
org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:296)
-</programlisting>
-      .. see the critical portion of the stack?  It's...
-<programlisting>
-       at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
-</programlisting>
-       LocalJobRunner means the job is running locally, not on the cluster.
-      </para>
-
-      <para>To solve this problem, you should run your MR job with your
-      <code>HADOOP_CLASSPATH</code> set to include the HBase dependencies.
-      The "hbase classpath" utility can be used to do this easily.
-      For example (substitute VERSION with your HBase version):
-      <programlisting>
-          HADOOP_CLASSPATH=`hbase classpath` hadoop jar 
$HBASE_HOME/hbase-VERSION.jar rowcounter usertable
-      </programlisting>
-      </para>
-      <para>See
-      <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath";>
-      
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath</link>
 for more
-      information on HBase MapReduce jobs and classpaths.
-      </para>
-      </section>
-    </section>
-
-    <section xml:id="trouble.namenode">
-      <title>NameNode</title>
-       <para>For more information on the NameNode, see <xref 
linkend="arch.hdfs"/>.
-       </para>
-       <section xml:id="trouble.namenode.disk">
-            <title>HDFS Utilization of Tables and Regions</title>
-            <para>To determine how much space HBase is using on HDFS use the 
<code>hadoop</code> shell commands from the NameNode.  For example... </para>
-            <para><programlisting>hadoop fs -dus /hbase/</programlisting> 
...returns the summarized disk utilization for all HBase objects.  </para>
-            <para><programlisting>hadoop fs -dus 
/hbase/myTable</programlisting> ...returns the summarized disk utilization for 
the HBase table 'myTable'. </para>
-            <para><programlisting>hadoop fs -du 
/hbase/myTable</programlisting> ...returns a list of the regions under the 
HBase table 'myTable' and their disk utilization. </para>
-            <para>For more information on HDFS shell commands, see the <link 
xlink:href="http://hadoop.apache.org/common/docs/current/file_system_shell.html";>HDFS
 FileSystem Shell documentation</link>.
-            </para>
-       </section>
-       <section xml:id="trouble.namenode.hbase.objects">
-            <title>Browsing HDFS for HBase Objects</title>
-            <para>Somtimes it will be necessary to explore the HBase objects 
that exist on HDFS.  These objects could include the WALs (Write Ahead Logs), 
tables, regions, StoreFiles, etc.
-            The easiest way to do this is with the NameNode web application 
that runs on port 50070.  The NameNode web application will provide links to 
the all the DataNodes in the cluster so that
-            they can be browsed seamlessly. </para>
-            <para>The HDFS directory structure of HBase tables in the cluster 
is...
-            <programlisting>
-<filename>/hbase</filename>
-     <filename>/&lt;Table&gt;</filename>             (Tables in the cluster)
-          <filename>/&lt;Region&gt;</filename>           (Regions for the 
table)
-               <filename>/&lt;ColumnFamiy&gt;</filename>      (ColumnFamilies 
for the Region for the table)
-                    <filename>/&lt;StoreFile&gt;</filename>        (StoreFiles 
for the ColumnFamily for the Regions for the table)
-            </programlisting>
-            </para>
-            <para>The HDFS directory structure of HBase WAL is..
-            <programlisting>
-<filename>/hbase</filename>
-     <filename>/.logs</filename>
-          <filename>/&lt;RegionServer&gt;</filename>    (RegionServers)
-               <filename>/&lt;HLog&gt;</filename>           (WAL HLog files 
for the RegionServer)
-            </programlisting>
-            </para>
-                   <para>See the <link xlink:href="see 
http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html";>HDFS User 
Guide</link> for other non-shell diagnostic
-                   utilities like <code>fsck</code>.
-            </para>
-          <section xml:id="trouble.namenode.0size.hlogs">
-            <title>Zero size HLogs with data in them</title>
-              <para>Problem: when getting a listing of all the files in a 
region server's .logs directory, one file has a size of 0 but it contains 
data.</para>
-              <para>Answer: It's an HDFS quirk. A file that's currently being 
to will appear to have a size of 0 but once it's closed it will show its true 
size</para>
-          </section>
-          <section xml:id="trouble.namenode.uncompaction">
-            <title>Use Cases</title>
-              <para>Two common use-cases for querying HDFS for HBase objects 
is research the degree of uncompaction of a table.  If there are a large number 
of StoreFiles for each ColumnFamily it could
-              indicate the need for a major compaction.  Additionally, after a 
major compaction if the resulting StoreFile is "small" it could indicate the 
need for a reduction of ColumnFamilies for
-              the table.
-                   </para>
-                 </section>
-
-       </section>
-     </section>
-
-    <section xml:id="trouble.network">
-      <title>Network</title>
-      <section xml:id="trouble.network.spikes">
-        <title>Network Spikes</title>
-        <para>If you are seeing periodic network spikes you might want to 
check the <code>compactionQueues</code> to see if major
-        compactions are happening.
-        </para>
-        <para>See <xref linkend="managed.compactions"/> for more information 
on managing compactions.
-        </para>
-      </section>
-      <section xml:id="trouble.network.loopback">
-        <title>Loopback IP</title>
-        <para>HBase expects the loopback IP Address to be 127.0.0.1.  See the 
Getting Started section on <xref linkend="loopback.ip" />.
-        </para>
-       </section>
-      <section xml:id="trouble.network.ints">
-        <title>Network Interfaces</title>
-        <para>Are all the network interfaces functioning correctly?  Are you 
sure?  See the Troubleshooting Case Study in <xref 
linkend="trouble.casestudy"/>.
-        </para>
-      </section>
-
-    </section>
-
-    <section xml:id="trouble.rs">
-      <title>RegionServer</title>
-        <para>For more information on the RegionServers, see <xref 
linkend="regionserver.arch"/>.
-       </para>
-      <section xml:id="trouble.rs.startup">
-        <title>Startup Errors</title>
-          <section xml:id="trouble.rs.startup.master-no-region">
-            <title>Master Starts, But RegionServers Do Not</title>
-            <para>The Master believes the RegionServers have the IP of 
127.0.0.1 - which is localhost and resolves to the master's own localhost.
-            </para>
-            <para>The RegionServers are erroneously informing the Master that 
their IP addresses are 127.0.0.1.
-            </para>
-            <para>Modify <filename>/etc/hosts</filename> on the region 
servers, from...
-            <programlisting>
-# Do not remove the following line, or various programs
-# that require network functionality will fail.
-127.0.0.1               fully.qualified.regionservername regionservername  
localhost.localdomain localhost
-::1             localhost6.localdomain6 localhost6
-            </programlisting>
-            ... to (removing the master node's name from localhost)...
-            <programlisting>
-# Do not remove the following line, or various programs
-# that require network functionality will fail.
-127.0.0.1               localhost.localdomain localhost
-::1             localhost6.localdomain6 localhost6
-            </programlisting>
-            </para>
-          </section>
-
-          <section xml:id="trouble.rs.startup.compression">
-            <title>Compression Link Errors</title>
-            <para>
-            Since compression algorithms such as LZO need to be installed and 
configured on each cluster this is a frequent source of startup error.  If you 
see messages like this...
-            <programlisting>
-11/02/20 01:32:15 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl 
library
-java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
-        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734)
-        at java.lang.Runtime.loadLibrary0(Runtime.java:823)
-        at java.lang.System.loadLibrary(System.java:1028)
-            </programlisting>
-            .. then there is a path issue with the compression libraries.  See 
the Configuration section on <link linkend="lzo.compression">LZO compression 
configuration</link>.
-            </para>
-          </section>
-      </section>
-      <section xml:id="trouble.rs.runtime">
-        <title>Runtime Errors</title>
-
-        <section xml:id="trouble.rs.runtime.hang">
-           <title>RegionServer Hanging</title>
-           <para>
-            Are you running an old JVM (&lt; 1.6.0_u21?)?  When you look at a 
thread dump,
-            does it look like threads are BLOCKED but no one holds the lock 
all are
-            blocked on?  See <link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-3622";>HBASE 3622 
Deadlock in HBaseServer (JVM bug?)</link>.
-            Adding <code>-XX:+UseMembar</code> to the HBase 
<varname>HBASE_OPTS</varname> in <filename>conf/hbase-env.sh</filename>
-            may fix it.
-            </para>
-        </section>
-        <section xml:id="trouble.rs.runtime.filehandles">
-           <title>java.io.IOException...(Too many open files)</title>
-           <para>
-           If you see log messages like this...
-<programlisting>
-2010-09-13 01:24:17,336 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
-Disk-related IOException in BlockReceiver constructor. Cause is 
java.io.IOException: Too many open files
-        at java.io.UnixFileSystem.createFileExclusively(Native Method)
-        at java.io.File.createNewFile(File.java:883)
-</programlisting>
-           ... see the Getting Started section on <link 
linkend="ulimit">ulimit and nproc configuration</link>.
-           </para>
-        </section>
-        <section xml:id="trouble.rs.runtime.xceivers">
-           <title>xceiverCount 258 exceeds the limit of concurrent xcievers 
256</title>
-           <para>
-           This typically shows up in the DataNode logs.
-           </para>
-           <para>
-           See the Getting Started section on <link 
linkend="dfs.datanode.max.xcievers">xceivers configuration</link>.
-           </para>
-        </section>
-        <section xml:id="trouble.rs.runtime.oom-nt">
-           <title>System instability, and the presence of 
"java.lang.OutOfMemoryError: unable to create new native thread in exceptions" 
HDFS DataNode logs or that of any system daemon</title>
-           <para>
-           See the Getting Started section on <link linkend="ulimit">ulimit 
and nproc configuration</link>.  The default on recent Linux
-           distributions is 1024 - which is far too low for HBase.
-           </para>
-        </section>
-        <section xml:id="trouble.rs.runtime.gc">
-           <title>DFS instability and/or RegionServer lease timeouts</title>
-           <para>
-           If you see warning messages like this...
-           <programlisting>
-2009-02-24 10:01:33,516 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
xxx ms, ten times longer than scheduled: 10000
-2009-02-24 10:01:33,516 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
xxx ms, ten times longer than scheduled: 15000
-2009-02-24 10:01:36,472 WARN 
org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to master 
for xxx milliseconds - retrying
-           </programlisting>
-           ... or see full GC compactions then you may be experiencing full 
GC's.
-           </para>
-        </section>
-        <section xml:id="trouble.rs.runtime.nolivenodes">
-           <title>"No live nodes contain current block" and/or 
YouAreDeadException</title>
-           <para>
-           These errors can happen either when running out of OS file handles 
or in periods of severe network problems where the nodes are unreachable.
-           </para>
-           <para>
-           See the Getting Started section on <link linkend="ulimit">ulimit 
and nproc configuration</link> and check your network.
-           </para>
-        </section>
-        <section xml:id="trouble.rs.runtime.zkexpired">
-           <title>ZooKeeper SessionExpired events</title>
-           <para>Master or RegionServers shutting down with messages like 
those in the logs: </para>
-           <programlisting>
-WARN org.apache.zookeeper.ClientCnxn: Exception
-closing session 0x278bd16a96000f to sun.nio.ch.SelectionKeyImpl@355811ec
-java.io.IOException: TIMED OUT
-       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
-WARN org.apache.hadoop.hbase.util.Sleeper: We slept 79410ms, ten times longer 
than scheduled: 5000
-INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server 
hostname/IP:PORT
-INFO org.apache.zookeeper.ClientCnxn: Priming connection to 
java.nio.channels.SocketChannel[connected local=/IP:PORT 
remote=hostname/IP:PORT]
-INFO org.apache.zookeeper.ClientCnxn: Server connection successful
-WARN org.apache.zookeeper.ClientCnxn: Exception closing session 
0x278bd16a96000d to sun.nio.ch.SelectionKeyImpl@3544d65e
-java.io.IOException: Session Expired
-       at 
org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
-       at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
-       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
-ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session 
expired
-           </programlisting>
-           <para>
-           The JVM is doing a long running garbage collecting which is pausing 
every threads (aka "stop the world").
-           Since the RegionServer's local ZooKeeper client cannot send 
heartbeats, the session times out.
-           By design, we shut down any node that isn't able to contact the 
ZooKeeper ensemble after getting a timeout so that it stops serving data that 
may already be assigned elsewhere.
-           </para>
-           <para>
-            <itemizedlist>
-              <listitem>Make sure you give plenty of RAM (in 
<filename>hbase-env.sh</filename>), the default of 1GB won't be able to sustain 
long running imports.</listitem>
-              <listitem>Make sure you don't swap, the JVM never behaves well 
under swapping.</listitem>
-              <listitem>Make sure you are not CPU starving the RegionServer 
thread. For example, if you are running a MapReduce job using 6 CPU-intensive 
tasks on a machine with 4 cores, you are probably starving the RegionServer 
enough to create longer garbage collection pauses.</listitem>
-              <listitem>Increase the ZooKeeper session timeout</listitem>
-           </itemizedlist>
-           If you wish to increase the session timeout, add the following to 
your <filename>hbase-site.xml</filename> to increase the timeout from the 
default of 60 seconds to 120 seconds.
-           <programlisting>
-&lt;property&gt;
-    &lt;name&gt;zookeeper.session.timeout&lt;/name&gt;
-    &lt;value&gt;1200000&lt;/value&gt;
-&lt;/property&gt;
-&lt;property&gt;
-    &lt;name&gt;hbase.zookeeper.property.tickTime&lt;/name&gt;
-    &lt;value&gt;6000&lt;/value&gt;
-&lt;/property&gt;
-            </programlisting>
-           </para>
-           <para>
-           Be aware that setting a higher timeout means that the regions 
served by a failed RegionServer will take at least
-           that amount of time to be transfered to another RegionServer. For a 
production system serving live requests, we would instead
-           recommend setting it lower than 1 minute and over-provision your 
cluster in order the lower the memory load on each machines (hence having
-           less garbage to collect per machine).
-           </para>
-           <para>
-           If this is happening during an upload which only happens once (like 
initially loading all your data into HBase), consider bulk loading.
-           </para>
-           See <xref linkend="trouble.zookeeper.general"/> for other general 
information about ZooKeeper troubleshooting.
-        </section>
-        <section xml:id="trouble.rs.runtime.notservingregion">
-           <title>NotServingRegionException</title>
-           <para>This exception is "normal" when found in the RegionServer 
logs at DEBUG level.  This exception is returned back to the client
-           and then the client goes back to .META. to find the new location of 
the moved region.</para>
-           <para>However, if the NotServingRegionException is logged ERROR, 
then the client ran out of retries and something probably wrong.</para>
-        </section>
-        <section xml:id="trouble.rs.runtime.double_listed_regions">
-           <title>Regions listed by domain name, then IP</title>
-           <para>
-           Fix your DNS.  In versions of Apache HBase before 0.92.x, reverse 
DNS needs to give same answer
-           as forward lookup. See <link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-3431";>HBASE 3431
-           RegionServer is not using the name given it by the master; double 
entry in master listing of servers</link> for gorey details.
-          </para>
-        </section>
-        <section xml:id="trouble.rs.runtime.codecmsgs">
-          <title>Logs flooded with '2011-01-10 12:40:48,407 INFO 
org.apache.hadoop.io.compress.CodecPool: Got
-            brand-new compressor' messages</title>
-                <para>We are not using the native versions of compression
-                    libraries.  See <link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-1900";>HBASE-1900 Put 
back native support when hadoop 0.21 is released</link>.
-                    Copy the native libs from hadoop under hbase lib dir or
-                    symlink them into place and the message should go away.
-                </para>
-        </section>
-        <section xml:id="trouble.rs.runtime.client_went_away">
-           <title>Server handler X on 60020 caught: 
java.nio.channels.ClosedChannelException</title>
-           <para>
-           If you see this type of message it means that the region server was 
trying to read/send data from/to a client but
-           it already went away. Typical causes for this are if the client was 
killed (you see a storm of messages like this when a MapReduce
-           job is killed or fails) or if the client receives a 
SocketTimeoutException. It's harmless, but you should consider digging in
-           a bit more if you aren't doing something to trigger them.
-           </para>
-        </section>
-
-      </section>
-      <section xml:id="trouble.rs.shutdown">
-        <title>Shutdown Errors</title>
-
-      </section>
-
-    </section>
-
-    <section xml:id="trouble.master">
-      <title>Master</title>
-       <para>For more information on the Master, see <xref linkend="master"/>.
-       </para>
-      <section xml:id="trouble.master.startup">
-        <title>Startup Errors</title>
-          <section xml:id="trouble.master.startup.migration">
-             <title>Master says that you need to run the hbase migrations 
script</title>
-             <para>Upon running that, the hbase migrations script says no 
files in root directory.</para>
-             <para>HBase expects the root directory to either not exist, or to 
have already been initialized by hbase running a previous time. If you create a 
new directory for HBase using Hadoop DFS, this error will occur.
-             Make sure the HBase root directory does not currently exist or 
has been initialized by a previous run of HBase. Sure fire solution is to just 
use Hadoop dfs to delete the HBase root and let HBase create and initialize the 
directory itself.
-             </para>
-          </section>
-          <section xml:id="trouble.master.startup.zk.buffer">
-              <title>Packet len6080218 is out of range!</title>
-              <para>If you have many regions on your cluster and you see an 
error
-                  like that reported above in this sections title in your 
logs, see
-                  <link 
xlink:href="https://issues.apache.org/jira/browse/HBASE-4246";>HBASE-4246 
Cluster with too many regions cannot withstand some master failover 
scenarios</link>.</para>
-          </section>
-
-      </section>
-      <section xml:id="trouble.master.shutdown">
-        <title>Shutdown Errors</title>
-
-      </section>
-
-    </section>
-
-    <section xml:id="trouble.zookeeper">
-      <title>ZooKeeper</title>
-      <section xml:id="trouble.zookeeper.startup">
-        <title>Startup Errors</title>
-          <section xml:id="trouble.zookeeper.startup.address">
-             <title>Could not find my address: xyz in list of ZooKeeper quorum 
servers</title>
-             <para>A ZooKeeper server wasn't able to start, throws that error. 
xyz is the name of your server.</para>
-             <para>This is a name lookup problem. HBase tries to start a 
ZooKeeper server on some machine but that machine isn't able to find itself in 
the <varname>hbase.zookeeper.quorum</varname> configuration.
-             </para>
-             <para>Use the hostname presented in the error message instead of 
the value you used. If you have a DNS server, you can set 
<varname>hbase.zookeeper.dns.interface</varname> and 
<varname>hbase.zookeeper.dns.nameserver</varname> in 
<filename>hbase-site.xml</filename> to make sure it resolves to the correct 
FQDN.
-             </para>
-          </section>
-
-      </section>
-      <section xml:id="trouble.zookeeper.general">
-          <title>ZooKeeper, The Cluster Canary</title>
-          <para>ZooKeeper is the cluster's "canary in the mineshaft". It'll be 
the first to notice issues if any so making sure its happy is the short-cut to 
a humming cluster.
-          </para>
-          <para>
-          See the <link 
xlink:href="http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting";>ZooKeeper 
Operating Environment Troubleshooting</link> page. It has suggestions and tools 
for checking disk and networking performance; i.e. the operating environment 
your ZooKeeper and HBase are running in.
-          </para>
-         <para>Additionally, the utility <xref 
linkend="trouble.tools.builtin.zkcli"/> may help investigate ZooKeeper issues.
-         </para>
-      </section>
-
-    </section>
-
-    <section xml:id="trouble.ec2">
-       <title>Amazon EC2</title>
-          <section xml:id="trouble.ec2.zookeeper">
-             <title>ZooKeeper does not seem to work on Amazon EC2</title>
-             <para>HBase does not start when deployed as Amazon EC2 instances. 
 Exceptions like the below appear in the Master and/or RegionServer logs: 
</para>
-             <programlisting>
-  2009-10-19 11:52:27,030 INFO org.apache.zookeeper.ClientCnxn: Attempting
-  connection to server 
ec2-174-129-15-236.compute-1.amazonaws.com/10.244.9.171:2181
-  2009-10-19 11:52:27,032 WARN org.apache.zookeeper.ClientCnxn: Exception
-  closing session 0x0 to sun.nio.ch.SelectionKeyImpl@656dc861
-  java.net.ConnectException: Connection refused
-             </programlisting>
-             <para>
-             Security group policy is blocking the ZooKeeper port on a public 
address.
-             Use the internal EC2 host names when configuring the ZooKeeper 
quorum peer list.
-             </para>
-          </section>
-          <section xml:id="trouble.ec2.instability">
-             <title>Instability on Amazon EC2</title>
-             <para>Questions on HBase and Amazon EC2 come up frequently on the 
HBase dist-list. Search for old threads using <link 
xlink:href="http://search-hadoop.com/";>Search Hadoop</link>
-             </para>
-          </section>
-          <section xml:id="trouble.ec2.connection">
-             <title>Remote Java Connection into EC2 Cluster Not Working</title>
-             <para>
-             See Andrew's answer here, up on the user list: <link 
xlink:href="http://search-hadoop.com/m/sPdqNFAwyg2";>Remote Java client 
connection into EC2 instance</link>.
-             </para>
-          </section>
-
-    </section>
-
-    <section xml:id="trouble.versions">
-       <title>HBase and Hadoop version issues</title>
-          <section xml:id="trouble.versions.205">
-             <title><code>NoClassDefFoundError</code> when trying to run 
0.90.x on hadoop-0.20.205.x (or hadoop-1.0.x)</title>
-             <para>Apache HBase 0.90.x does not ship with hadoop-0.20.205.x, 
etc.  To make it run, you need to replace the hadoop
-             jars that Apache HBase shipped with in its 
<filename>lib</filename> directory with those of the Hadoop you want to
-             run HBase on.  If even after replacing Hadoop jars you get the 
below exception:
-<programlisting>
-sv4r6s38: Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/commons/configuration/Configuration
-sv4r6s38:       at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.&lt;init>(DefaultMetricsSystem.java:37)
-sv4r6s38:       at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.&lt;clinit>(DefaultMetricsSystem.java:34)
-sv4r6s38:       at 
org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
-sv4r6s38:       at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:209)
-sv4r6s38:       at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:177)
-sv4r6s38:       at 
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:229)
-sv4r6s38:       at 
org.apache.hadoop.security.KerberosName.&lt;clinit>(KerberosName.java:83)
-sv4r6s38:       at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:202)
-sv4r6s38:       at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:177)
-</programlisting>
-you need to copy under <filename>hbase/lib</filename>, the 
<filename>commons-configuration-X.jar</filename> you find
-in your Hadoop's <filename>lib</filename> directory.  That should fix the 
above complaint.
-</para>
-</section>
-
-          <section xml:id="trouble.wrong.version">
-             <title>...cannot communicate with client version...</title>
-<para>If you see something like the following in your logs
-<computeroutput>...
-2012-09-24 10:20:52,168 FATAL org.apache.hadoop.hbase.master.HMaster: 
Unhandled exception. Starting shutdown.
-org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate 
with client version 4
-...</computeroutput>
-...are you trying to talk to an Hadoop 2.0.x from an HBase that has an Hadoop 
1.0.x client?
-Use the HBase built against Hadoop 2.0 or rebuild your HBase passing the 
<command>-Dhadoop.profile=2.0</command>
-attribute to Maven (See <xref linkend="maven.build.hadoop" /> for more).
-</para>
-
-</section>
-
-</section>
-
-    <section xml:id="trouble.tests">
-       <title>Running unit or integration tests</title>
-          <section xml:id="trouble.HDFS-2556">
-             <title>Runtime exceptions from MiniDFSCluster when running 
tests</title>
-<para>If you see something like the following
-<programlisting>...
-java.lang.NullPointerException: null
-at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes
-at org.apache.hadoop.hdfs.MiniDFSCluster.&lt;init&gt;
-at org.apache.hadoop.hbase.MiniHBaseCluster.&lt;init&gt;
-at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster
-at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster
-...</programlisting>
-or
-<programlisting>...
-java.io.IOException: Shutting down
-at org.apache.hadoop.hbase.MiniHBaseCluster.init
-at org.apache.hadoop.hbase.MiniHBaseCluster.&lt;init&gt;
-at org.apache.hadoop.hbase.MiniHBaseCluster.&lt;init&gt;
-at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster
-at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster
-...</programlisting>
-... then try issuing the command <command>umask 022</command> before launching 
tests. This is a workaround for
-<link 
xlink:href="https://issues.apache.org/jira/browse/HDFS-2556";>HDFS-2556</link>
-</para>
-</section>
-
-</section>
-
-    <section xml:id="trouble.casestudy">
-      <title>Case Studies</title>
-      <para>For Performance and Troubleshooting Case Studies, see <xref 
linkend="casestudies"/>.
-      </para>
-    </section>
-
-  </chapter>

Reply via email to