http://git-wip-us.apache.org/repos/asf/hbase-site/blob/9fb0764b/book.html
----------------------------------------------------------------------
diff --git a/book.html b/book.html
index 9eeedb5..3cf6127 100644
--- a/book.html
+++ b/book.html
@@ -1381,11 +1381,10 @@ To check for well-formedness and only print output if errors exist, use the comm
 <td class="content">
 <div class="title">Keep Configuration In Sync Across the Cluster</div>
 <div class="paragraph">
-<p>When running in distributed mode, after you make an edit to an HBase configuration, make sure you copy the content of the <em>conf/</em> directory to all nodes of the cluster.
+<p>When running in distributed mode, after you make an edit to an HBase configuration, make sure you copy the contents of the <em>conf/</em> directory to all nodes of the cluster.
 HBase will not do this for you.
 Use <code>rsync</code>, <code>scp</code>, or another secure mechanism for copying the configuration files to your nodes.
-For most configuration, a restart is needed for servers to pick up changes An exception is dynamic configuration.
-to be described later below.</p>
+For most configurations, a restart is needed for servers to pick up changes. Dynamic configuration, described below, is an exception.</p>
 </div>
 </td>
 </tr>
@@ -1473,12 +1472,12 @@ You must set <code>JAVA_HOME</code> on each node of your cluster. <em>hbase-env.
 </dd>
 <dt class="hdlist1">Loopback IP</dt>
 <dd>
-<p>Prior to hbase-0.96.0, HBase only used the IP address <code>127.0.0.1</code> to refer to <code>localhost</code>, and this could not be configured.
+<p>Prior to hbase-0.96.0, HBase only used the IP address <code>127.0.0.1</code> to refer to <code>localhost</code>, and this was not configurable.
 See <a href="#loopback.ip">Loopback IP</a> for more details.</p>
 </dd>
 <dt class="hdlist1">NTP</dt>
 <dd>
-<p>The clocks on cluster nodes should be synchronized. A small amount of variation is acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time synchronization is one of the first things to check if you see unexplained problems in your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or another time-synchronization mechanism, on your cluster, and that all nodes look to the same service for time synchronization. See the <a href="http://www.tldp.org/LDP/sag/html/basic-ntp-config.html">Basic NTP Configuration</a> at <em class="citetitle">The Linux Documentation Project (TLDP)</em> to set up NTP.</p>
+<p>The clocks on cluster nodes should be synchronized. A small amount of variation is acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time synchronization is one of the first things to check if you see unexplained problems in your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or another time-synchronization mechanism on your cluster, and that all nodes look to the same service for time synchronization. See the <a href="http://www.tldp.org/LDP/sag/html/basic-ntp-config.html">Basic NTP Configuration</a> at <em class="citetitle">The Linux Documentation Project (TLDP)</em> to set up NTP.</p>
 </dd>
 </dl>
 </div>
@@ -1540,8 +1539,8 @@ hadoop - nproc 32000</pre>
 </dd>
 <dt class="hdlist1">Windows</dt>
 <dd>
-<p>Prior to HBase 0.96, testing for running HBase on Microsoft Windows was limited.
-Running a on Windows nodes is not recommended for production systems.</p>
+<p>Prior to HBase 0.96, running HBase on Microsoft Windows was limited to testing purposes only.
+Running production systems on Windows machines is not recommended.</p>
 </dd>
 </dl>
 </div>
@@ -1774,8 +1773,8 @@ data loss. This patch is present in Apache Hadoop releases 2.6.1+.</p>
 The bundled jar is ONLY for use in standalone mode.
 In distributed mode, it is <em>critical</em> that the version of Hadoop that is out on your cluster match what is under HBase.
 Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues.
-Make sure you replace the jar in HBase everywhere on your cluster.
-Hadoop version mismatch issues have various manifestations but often all looks like its hung up.</p>
+Make sure you replace the jar in HBase across your whole cluster.
+Hadoop version mismatch issues have various manifestations, but often they all simply look like a hung cluster.</p>
 </div>
 </td>
 </tr>
@@ -1860,7 +1859,7 @@ HDFS where data is replicated ensures the latter.</p>
 </div>
 <div class="paragraph">
 <p>To configure this standalone variant, edit your <em>hbase-site.xml</em>
-setting the <em>hbase.rootdir</em> to point at a directory in your
+setting <em>hbase.rootdir</em> to point at a directory in your
 HDFS instance but then set <em>hbase.cluster.distributed</em> to
 <em>false</em>. For example:</p>
 </div>
@@ -1912,8 +1911,8 @@ Some of the information that was originally in this section has been moved there
 </div>
 <div class="paragraph">
 <p>A pseudo-distributed mode is simply a fully-distributed mode run on a single host.
-Use this configuration testing and prototyping on HBase.
-Do not use this configuration for production nor for evaluating HBase performance.</p>
+Use this HBase configuration for testing and prototyping purposes only.
+Do not use this configuration for production or for performance evaluation.</p>
 </div>
 </div>
 </div>
@@ -1922,11 +1921,11 @@ Do not use this configuration for production nor for evaluating HBase performanc
 <div class="paragraph">
 <p>By default, HBase runs in standalone mode.
 Both standalone mode and pseudo-distributed mode are provided for the purposes of small-scale testing.
-For a production environment, distributed mode is appropriate.
+For a production environment, distributed mode is advised.
 In distributed mode, multiple instances of HBase daemons run on multiple servers in the cluster.</p>
 </div>
 <div class="paragraph">
-<p>Just as in pseudo-distributed mode, a fully distributed configuration requires that you set the <code>hbase-cluster.distributed</code> property to <code>true</code>.
+<p>Just as in pseudo-distributed mode, a fully distributed configuration requires that you set the <code>hbase.cluster.distributed</code> property to <code>true</code>.
 Typically, the <code>hbase.rootdir</code> is configured to point to a highly-available HDFS filesystem.</p>
 </div>
 <div class="paragraph">
@@ -2088,7 +2087,7 @@ For the list of configurable properties, see <a href="#hbase_default_configurati
 </div>
 <div class="paragraph">
 <p>Not all configuration options make it out to <em>hbase-default.xml</em>.
-Configuration that it is thought rare anyone would change can exist only in code; the only way to turn up such configurations is via a reading of the source code itself.</p>
+Some configurations appear only in source code; the only way to identify them is through code review.</p>
 </div>
 <div class="paragraph">
 <p>Currently, changes here will require a cluster restart for HBase to notice the change.</p>
 </div>
@@ -5113,13 +5112,13 @@ Add your own environment variables here if you want them read by HBase daemons o
 <p>Since the HBase Master may move around, clients bootstrap by looking to ZooKeeper for current critical locations.
 ZooKeeper is where all these values are kept.
 Thus clients require the location of the ZooKeeper ensemble before they can do anything else.
-Usually this the ensemble location is kept out in the <em>hbase-site.xml</em> and is picked up by the client from the <code>CLASSPATH</code>.</p>
+Usually this ensemble location is kept out in the <em>hbase-site.xml</em> and is picked up by the client from the <code>CLASSPATH</code>.</p>
 </div>
 <div class="paragraph">
 <p>If you are configuring an IDE to run an HBase client, you should include the <em>conf/</em> directory on your classpath so <em>hbase-site.xml</em> settings can be found (or add <em>src/test/resources</em> to pick up the hbase-site.xml used by tests).</p>
 </div>
 <div class="paragraph">
-<p>Minimally, a client of HBase needs several libraries in its <code>CLASSPATH</code> when connecting to a cluster, including:</p>
+<p>Minimally, an HBase client needs several libraries in its <code>CLASSPATH</code> when connecting to a cluster, including:</p>
 </div>
 <div class="listingblock">
 <div class="content">
@@ -5135,7 +5134,7 @@ zookeeper (zookeeper-<span class="float">3.4</span><span class="float">.2</span>
 </div>
 </div>
 <div class="paragraph">
-<p>An example basic <em>hbase-site.xml</em> for client only might look as follows:</p>
+<p>A basic example <em>hbase-site.xml</em> for a client-only setup may look as follows:</p>
 </div>
 <div class="listingblock">
 <div class="content">
@@ -5179,7 +5178,7 @@ config.set(<span class="string"><span class="delimiter">"</span><span class
 <div class="sect2">
 <h3 id="_basic_distributed_hbase_install"><a class="anchor" href="#_basic_distributed_hbase_install"></a>8.1. Basic Distributed HBase Install</h3>
 <div class="paragraph">
-<p>Here is an example basic configuration for a distributed ten node cluster:
+<p>Here is a basic configuration example for a distributed ten-node cluster:
 * The nodes are named <code>example0</code>, <code>example1</code>, etc., through node <code>example9</code> in this example.
 * The HBase Master and the HDFS NameNode are running on the node <code>example0</code>.
 * RegionServers run on nodes <code>example1</code>-<code>example9</code>.
@@ -5299,11 +5298,11 @@ See <a href="https://issues.apache.org/jira/browse/HBASE-6389">HBASE-6389 Modify
 <h5 id="sect.zookeeper.session.timeout"><a class="anchor" href="#sect.zookeeper.session.timeout"></a><code>zookeeper.session.timeout</code></h5>
 <div class="paragraph">
 <p>The default timeout is three minutes (specified in milliseconds). This means that if a server crashes, it will be three minutes before the Master notices the crash and starts recovery.
-You might like to tune the timeout down to a minute or even less so the Master notices failures the sooner.
-Before changing this value, be sure you have your JVM garbage collection configuration under control otherwise, a long garbage collection that lasts beyond the ZooKeeper session timeout will take out your RegionServer (You might be fine with this — you probably want recovery to start on the server if a RegionServer has been in GC for a long period of time).</p>
+You might need to tune the timeout down to a minute or even less so the Master notices failures sooner.
+Before changing this value, be sure you have your JVM garbage collection configuration under control; otherwise, a long garbage collection that lasts beyond the ZooKeeper session timeout will take out your RegionServer. (You might be fine with this — you probably want recovery to start on the server if a RegionServer has been in GC for a long period of time.)</p>
 </div>
 <div class="paragraph">
-<p>To change this configuration, edit <em>hbase-site.xml</em>, copy the changed file around the cluster and restart.</p>
+<p>To change this configuration, edit <em>hbase-site.xml</em>, copy the changed file across the cluster, and restart.</p>
 </div>
 <div class="paragraph">
 <p>We set this value high to save our having to field questions up on the mailing lists asking why a RegionServer went down during a massive import.
@@ -5322,16 +5321,15 @@ Later when they’ve built some confidence, then they can play with configur
 <div class="sect3">
 <h4 id="recommended.configurations.hdfs"><a class="anchor" href="#recommended.configurations.hdfs"></a>9.2.2. HDFS Configurations</h4>
 <div class="sect4">
-<h5 id="dfs.datanode.failed.volumes.tolerated"><a class="anchor" href="#dfs.datanode.failed.volumes.tolerated"></a>dfs.datanode.failed.volumes.tolerated</h5>
+<h5 id="dfs.datanode.failed.volumes.tolerated"><a class="anchor" href="#dfs.datanode.failed.volumes.tolerated"></a><code>dfs.datanode.failed.volumes.tolerated</code></h5>
 <div class="paragraph">
 <p>This is the "…​number of volumes that are allowed to fail before a DataNode stops offering service. By default any volume failure will cause a datanode to shutdown" from the <em>hdfs-default.xml</em> description.
 You might want to set this to about half the amount of your available disks.</p>
 </div>
 </div>
-</div>
-<div class="sect3">
-<h4 id="hbase.regionserver.handler.count_description"><a class="anchor" href="#hbase.regionserver.handler.count_description"></a>9.2.3. <code>hbase.regionserver.handler.count</code></h4>
+<div class="sect4">
+<h5 id="hbase.regionserver.handler.count"><a class="anchor" href="#hbase.regionserver.handler.count"></a><code>hbase.regionserver.handler.count</code></h5>
 <div class="paragraph">
 <p>This setting defines the number of threads that are kept open to answer incoming requests to user tables.
 The rule of thumb is to keep this number low when the payload per request approaches the MB (big puts, scans using a large cache) and high when the payload is small (gets, small puts, ICVs, deletes).
 The total size of the queries in progress is limited by the setting <code>hbase.ipc.server.max.callqueue.size</code>.</p>
 </div>
@@ -5347,16 +5345,17 @@ A RegionServer running on low memory will trigger its JVM’s garbage collec
 <p>You can get a sense of whether you have too little or too many handlers by <a href="#rpc.logging">rpc.logging</a> on an individual RegionServer then tailing its logs (Queued requests consume memory).</p>
 </div>
 </div>
+</div>
 <div class="sect3">
-<h4 id="big_memory"><a class="anchor" href="#big_memory"></a>9.2.4. Configuration for large memory machines</h4>
+<h4 id="big_memory"><a class="anchor" href="#big_memory"></a>9.2.3. Configuration for large memory machines</h4>
 <div class="paragraph">
 <p>HBase ships with a reasonable, conservative configuration that will work on nearly all machine types that people might want to test with.
-If you have larger machines — HBase has 8G and larger heap — you might the following configuration options helpful.
+If you have larger machines — where HBase has an 8G or larger heap — you might find the following configuration options helpful.
 TODO.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="config.compression"><a class="anchor" href="#config.compression"></a>9.2.5. Compression</h4>
+<h4 id="config.compression"><a class="anchor" href="#config.compression"></a>9.2.4. Compression</h4>
 <div class="paragraph">
 <p>You should consider enabling ColumnFamily compression.
 There are several options that are near-frictionless and in most all cases boost performance by reducing the size of StoreFiles and thus reducing I/O.</p>
 </div>
 <div class="paragraph">
@@ -5366,7 +5365,7 @@ There are several options that are near-frictionless and in most all cases boost
 </div>
 </div>
 <div class="sect3">
-<h4 id="config.wals"><a class="anchor" href="#config.wals"></a>9.2.6. Configuring the size and number of WAL files</h4>
+<h4 id="config.wals"><a class="anchor" href="#config.wals"></a>9.2.5. Configuring the size and number of WAL files</h4>
 <div class="paragraph">
 <p>HBase uses <a href="#wal">wal</a> to recover the memstore data that has not been flushed to disk in case of an RS failure.
 These WAL files should be configured to be slightly smaller than HDFS block (by default a HDFS block is 64Mb and a WAL file is ~60Mb).</p>
 </div>
 <div class="paragraph">
@@ -5379,12 +5378,12 @@ However, as all memstores are not expected to be full all the time, less WAL fil
 </div>
 </div>
 <div class="sect3">
-<h4 id="disable.splitting"><a class="anchor" href="#disable.splitting"></a>9.2.7. Managed Splitting</h4>
+<h4 id="disable.splitting"><a class="anchor" href="#disable.splitting"></a>9.2.6. Managed Splitting</h4>
 <div class="paragraph">
-<p>HBase generally handles splitting your regions, based upon the settings in your <em>hbase-default.xml</em> and <em>hbase-site.xml</em> configuration files.
+<p>HBase generally handles splitting of your regions based upon the settings in your <em>hbase-default.xml</em> and <em>hbase-site.xml</em> configuration files.
 Important settings include <code>hbase.regionserver.region.split.policy</code>, <code>hbase.hregion.max.filesize</code>, <code>hbase.regionserver.regionSplitLimit</code>.
 A simplistic view of splitting is that when a region grows to <code>hbase.hregion.max.filesize</code>, it is split.
-For most use patterns, most of the time, you should use automatic splitting.
+For most usage patterns, you should use automatic splitting.
 See <a href="#manual_region_splitting_decisions">manual region splitting decisions</a> for more information about manual region splitting.</p>
 </div>
 <div class="paragraph">
@@ -5422,8 +5421,8 @@ It is better to err on the side of too few regions and perform rolling splits la
 The optimal number of regions depends upon the largest StoreFile in your region.
 The size of the largest StoreFile will increase with time if the amount of data grows.
 The goal is for the largest region to be just large enough that the compaction selection algorithm only compacts it during a timed major compaction.
-Otherwise, the cluster can be prone to compaction storms where a large number of regions under compaction at the same time.
-It is important to understand that the data growth causes compaction storms, and not the manual split decision.</p>
+Otherwise, the cluster can be prone to compaction storms with a large number of regions under compaction at the same time.
+It is important to understand that it is the data growth, not the manual split decision, that causes compaction storms.</p>
 </div>
 <div class="paragraph">
 <p>If the regions are split into too many large regions, you can increase the major compaction interval by configuring <code>HConstants.MAJOR_COMPACTION_PERIOD</code>.
@@ -5431,7 +5430,7 @@ HBase 0.90 introduced <code>org.apache.hadoop.hbase.util.RegionSplitter</code>,
 </p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="managed.compactions"><a class="anchor" href="#managed.compactions"></a>9.2.8. Managed Compactions</h4>
+<h4 id="managed.compactions"><a class="anchor" href="#managed.compactions"></a>9.2.7. Managed Compactions</h4>
 <div class="paragraph">
 <p>By default, major compactions are scheduled to run once in a 7-day period.
 Prior to HBase 0.96.x, major compactions were scheduled to happen once per day by default.</p>
@@ -5462,7 +5461,7 @@ You can run major compactions manually via the HBase shell or via the <a href="h
 </div>
 </div>
 <div class="sect3">
-<h4 id="spec.ex"><a class="anchor" href="#spec.ex"></a>9.2.9. Speculative Execution</h4>
+<h4 id="spec.ex"><a class="anchor" href="#spec.ex"></a>9.2.8. Speculative Execution</h4>
 <div class="paragraph">
 <p>Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off Speculative Execution at a system-level unless you need it for a specific case, where it can be configured per-job.
 Set the properties <code>mapreduce.map.speculative</code> and <code>mapreduce.reduce.speculative</code> to false.</p>
 </div>
@@ -5503,9 +5502,9 @@ You might also see the graphs on the tail of <a href="https://issues.apache.org/
 See the Deveraj Das and Nicolas Liochon blog post <a href="http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/">Introduction to HBase Mean Time to Recover (MTTR)</a> for a brief introduction.</p>
 </div>
 <div class="paragraph">
-<p>The issue <a href="https://issues.apache.org/jira/browse/HBASE-8389">HBASE-8354 forces Namenode into loop with lease recovery requests</a> is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery including citation of fixes added to HDFS. Read the Varun Sharma comments.
+<p>The issue <a href="https://issues.apache.org/jira/browse/HBASE-8389">HBASE-8354 forces Namenode into loop with lease recovery requests</a> is messy but has a bunch of good discussion toward the end on low timeouts and how to achieve faster recovery, including citation of fixes added to HDFS. Read the Varun Sharma comments.
 The below suggested configurations are Varun’s suggestions distilled and tested.
-Make sure you are running on a late-version HDFS so you have the fixes he refers too and himself adds to HDFS that help HBase MTTR (e.g.
+Make sure you are running on a late-version HDFS so you have the fixes he refers to, and himself adds to HDFS, that help HBase MTTR (e.g.
 HDFS-3703, HDFS-3712, and HDFS-4791 — Hadoop 2 for sure has them and late Hadoop 1 has some).
 Set the following in the RegionServer.</p>
 </div>
 <div class="listingblock">
@@ -20880,7 +20879,7 @@ See <a href="#block.cache">Block Cache</a></p>
 <div class="sect2">
 <h3 id="perf.handlers"><a class="anchor" href="#perf.handlers"></a>96.3. <code>hbase.regionserver.handler.count</code></h3>
 <div class="paragraph">
-<p>See <a href="#hbase.regionserver.handler.count">[hbase.regionserver.handler.count]</a>.</p>
+<p>See <a href="#hbase.regionserver.handler.count"><code>hbase.regionserver.handler.count</code></a>.</p>
 </div>
 </div>
 <div class="sect2">
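As a quick cross-check of the properties the hunks above touch, here is a minimal hbase-site.xml sketch for a fully distributed setup. The hostnames and the timeout value are illustrative assumptions only, not recommendations:

<configuration>
  <!-- Point HBase at an HDFS instance; namenode.example.org is a hypothetical host. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
  </property>
  <!-- Note the spelling fixed above: hbase.cluster.distributed, not hbase-cluster.distributed. -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- The ZooKeeper ensemble clients bootstrap from; hosts reuse the example1 through example9 naming above. -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>example1,example2,example3</value>
  </property>
  <!-- Tuned down from the three-minute default (milliseconds) so the Master notices failures sooner;
       only safe once JVM garbage collection pauses are under control. -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
</configuration>

Per the "Keep Configuration In Sync Across the Cluster" callout above, copy the edited file to every node yourself (HBase will not do it for you) and restart the servers so the change is picked up.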
http://git-wip-us.apache.org/repos/asf/hbase-site/blob/9fb0764b/bulk-loads.html
----------------------------------------------------------------------
diff --git a/bulk-loads.html b/bulk-loads.html
index 298f611..f619129 100644
--- a/bulk-loads.html
+++ b/bulk-loads.html
@@ -7,7 +7,7 @@
 <head>
 <meta charset="UTF-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20170707" />
+ <meta name="Date-Revision-yyyymmdd" content="20170708" />
 <meta http-equiv="Content-Language" content="en" />
 <title>Apache HBase – Bulk Loads in Apache HBase (TM)
@@ -311,7 +311,7 @@ under the License. -->
 <a href="https://www.apache.org/">The Apache Software Foundation</a>. All rights reserved.
- <li id="publishDate" class="pull-right">Last Published: 2017-07-07</li>
+ <li id="publishDate" class="pull-right">Last Published: 2017-07-08</li>
 </p>
 </div>