[44/51] [partial] hbase-site git commit: Published site at .

git-site-role Wed, 30 Aug 2017 08:14:40 -0700

http://git-wip-us.apache.org/repos/asf/hbase-site/blob/0d6dd914/book.html
----------------------------------------------------------------------
diff --git a/book.html b/book.html
index fbf9c1e..43c9f04 100644
--- a/book.html
+++ b/book.html
@@ -7346,7 +7346,7 @@ This abstraction lays the groundwork for upcoming multi-tenancy related features
 <p>Namespace Security Administration (<a href="https://issues.apache.org/jira/browse/HBASE-9206">HBASE-9206</a>) - Provide another level of security administration for tenants.</p>
 </li>
 <li>
-<p>Region server groups (<a href="https://issues.apache.org/jira/browse/HBASE-6721">HBASE-6721</a>) - A namespace/table can be pinned onto a subset of RegionServers thus guaranteeing a course level of isolation.</p>
+<p>Region server groups (<a href="https://issues.apache.org/jira/browse/HBASE-6721">HBASE-6721</a>) - A namespace/table can be pinned onto a subset of RegionServers, thus guaranteeing a coarse level of isolation.</p>
 </li>
 </ul>
 </div>
@@ -9697,7 +9697,7 @@ Tips:</p>
 <div class="openblock partintro">
 <div class="content">
 <div class="paragraph">
-<p>Apache MapReduce is a software framework used to analyze large amounts of data, and is the framework used most often with <a href="http://hadoop.apache.org/">Apache Hadoop</a>.
+<p>Apache MapReduce is a software framework used to analyze large amounts of data. It is provided by <a href="http://hadoop.apache.org/">Apache Hadoop</a>.
 MapReduce itself is out of the scope of this document.
 A good place to get started with MapReduce is <a href="http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html" class="bare">http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html</a>.
 MapReduce version 2 (MR2) is now part of <a href="http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/">YARN</a>.</p>
@@ -9717,12 +9717,13 @@ jobs. Finally, it discusses <a href="#cascading">Cascading</a>, an
 <td class="content">
 <div class="title"><code>mapred</code> and <code>mapreduce</code></div>
 <div class="paragraph">
-<p>There are two mapreduce packages in HBase as in MapReduce itself: <em>org.apache.hadoop.hbase.mapred</em>      and <em>org.apache.hadoop.hbase.mapreduce</em>.
-The former does old-style API and the latter the new style.
+<p>There are two MapReduce packages in HBase, as in MapReduce itself: <em>org.apache.hadoop.hbase.mapred</em> and <em>org.apache.hadoop.hbase.mapreduce</em>.
+The former implements the old-style API and the latter the new-style API.
 The latter offers more functionality, though you can usually find an equivalent in the older package.
 Pick the package that matches your MapReduce deployment.
-When in doubt or starting over, pick the <em>org.apache.hadoop.hbase.mapreduce</em>.
-In the notes below, we refer to o.a.h.h.mapreduce but replace with the o.a.h.h.mapred if that is what you are using.</p>
+When in doubt or starting over, pick <em>org.apache.hadoop.hbase.mapreduce</em>.
+In the notes below, we refer to <em>o.a.h.h.mapreduce</em>; substitute
+<em>o.a.h.h.mapred</em> if that is what you are using.</p>
 </div>
 </td>
 </tr>
@@ -9734,39 +9735,84 @@ In the notes below, we refer to o.a.h.h.mapreduce but replace with the o.a.h.h.m
 <h2 id="hbase.mapreduce.classpath"><a class="anchor" href="#hbase.mapreduce.classpath"></a>46. HBase, MapReduce, and the CLASSPATH</h2>
 <div class="sectionbody">
 <div class="paragraph">
-<p>By default, MapReduce jobs deployed to a MapReduce cluster do not have access to either the HBase configuration under <code>$HBASE_CONF_DIR</code> or the HBase classes.</p>
+<p>By default, MapReduce jobs deployed to a MapReduce cluster do not have access to
+either the HBase configuration under <code>$HBASE_CONF_DIR</code> or the HBase classes.</p>
 </div>
 <div class="paragraph">
-<p>To give the MapReduce jobs the access they need, you could add <em>hbase-site.xml</em> to <em>$HADOOP_HOME/conf</em> and add HBase jars to the <em>$HADOOP_HOME/lib</em> directory.
-You would then need to copy these changes across your cluster. Or you can edit <em>$HADOOP_HOME/conf/hadoop-env.sh</em> and add them to the <code>HADOOP_CLASSPATH</code> variable.
-However, this approach is not recommended because it will pollute your Hadoop install with HBase references.
-It also requires you to restart the Hadoop cluster before Hadoop can use the HBase data.</p>
+<p>To give the MapReduce jobs the access they need, you could add <em>hbase-site.xml</em> to <em>$HADOOP_HOME/conf</em> and add HBase jars to the <em>$HADOOP_HOME/lib</em> directory.
+You would then need to copy these changes across your cluster. Alternatively, you could edit <em>$HADOOP_HOME/conf/hadoop-env.sh</em> and add HBase dependencies to the <code>HADOOP_CLASSPATH</code> variable.
+Neither of these approaches is recommended, because each pollutes your Hadoop install with HBase references.
+Both also require you to restart the Hadoop cluster before Hadoop can use the HBase data.</p>
 </div>
 <div class="paragraph">
-<p>The recommended approach is to let HBase add its dependency jars itself and use <code>HADOOP_CLASSPATH</code> or <code>-libjars</code>.</p>
+<p>The recommended approach is to let HBase add its dependency jars and use <code>HADOOP_CLASSPATH</code> or <code>-libjars</code>.</p>
 </div>
 <div class="paragraph">
-<p>Since HBase 0.90.x, HBase adds its dependency JARs to the job configuration itself.
-The dependencies only need to be available on the local <code>CLASSPATH</code>.
-The following example runs the bundled HBase <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html">RowCounter</a> MapReduce job against a table named <code>usertable</code>.
-If you have not set the environment variables expected in the command (the parts prefixed by a <code>$</code> sign and surrounded by curly braces), you can use the actual system paths instead.
-Be sure to use the correct version of the HBase JAR for your system.
-The backticks (<code>`</code> symbols) cause the shell to execute the sub-commands, setting the output of <code>hbase classpath</code> (the command to dump HBase CLASSPATH) to <code>HADOOP_CLASSPATH</code>.
+<p>Since HBase <code>0.90.x</code>, HBase adds its dependency JARs to the job configuration itself.
+The dependencies only need to be available on the local <code>CLASSPATH</code>; from there they are picked
+up and shipped along with the job to the MapReduce cluster. A basic trick is to pass the full HBase classpath
+(all HBase and dependent jars as well as configuration files) to the MapReduce job runner and let the
+HBase utility pick out of that full classpath the jars it needs, adding them to the
+MapReduce job configuration (see the source at <code>TableMapReduceUtil#addDependencyJars(org.apache.hadoop.mapreduce.Job)</code> for how this is done).</p>
+</div>
+<div class="paragraph">
+<p>The following example runs the bundled HBase <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html">RowCounter</a> MapReduce job against a table named <code>usertable</code>.
+It sets <code>HADOOP_CLASSPATH</code> to the jars HBase needs to run in a MapReduce context, plus configuration files such as <em>hbase-site.xml</em>.
+Be sure to use the correct version of the HBase JAR for your system; replace the VERSION string in the command line below with the version of
+your local HBase install. The backticks (<code>`</code> symbols) cause the shell to execute the sub-command, setting <code>HADOOP_CLASSPATH</code> to the output of <code>hbase classpath</code>.
 This example assumes you use a BASH-compatible shell.</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="bash">$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/lib/hbase-server-VERSION.jar rowcounter usertable</code></pre>
+<pre class="CodeRay highlight"><code data-lang="bash">$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
+  ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/lib/hbase-mapreduce-VERSION.jar \
+  org.apache.hadoop.hbase.mapreduce.RowCounter usertable</code></pre>
 </div>
 </div>
 <div class="paragraph">
-<p>When the command runs, internally, the HBase JAR finds the dependencies it needs and adds them to the MapReduce job configuration.
-See the source at <code>TableMapReduceUtil#addDependencyJars(org.apache.hadoop.mapreduce.Job)</code> for how this is done.</p>
+<p>The above command launches a row-counting MapReduce job against the HBase cluster that your local HBase configuration points to, running it on the MapReduce cluster that your local Hadoop configuration points to.</p>
+</div>
+<div class="paragraph">
+<p>The main class of <code>hbase-mapreduce.jar</code> is a Driver that lists a few basic MapReduce tasks that ship with HBase.
+For example, presuming your install is HBase <code>2.0.0-SNAPSHOT</code>:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="bash">$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
+  ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/lib/hbase-mapreduce-2.0.0-SNAPSHOT.jar
+An example program must be given as the first argument.
+Valid program names are:
+  CellCounter: Count cells in HBase table.
+  WALPlayer: Replay WAL files.
+  completebulkload: Complete a bulk data load.
+  copytable: Export a table from local cluster to peer cluster.
+  export: Write table data to HDFS.
+  exportsnapshot: Export the specific snapshot to a given FileSystem.
+  import: Import data written by Export.
+  importtsv: Import data in TSV format.
+  rowcounter: Count rows in HBase table.
+  verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed after being appended to the log.</code></pre>
+</div>
 </div>
 <div class="paragraph">
-<p>The command <code>hbase mapredcp</code> can also help you dump the CLASSPATH entries required by MapReduce, which are the same jars <code>TableMapReduceUtil#addDependencyJars</code> would add.
-You can add them together with HBase conf directory to <code>HADOOP_CLASSPATH</code>.
-For jobs that do not package their dependencies or call <code>TableMapReduceUtil#addDependencyJars</code>, the following command structure is necessary:</p>
+<p>You can use the short names listed above for the MapReduce jobs, as in the following re-run of the row counter job (again, presuming your install is HBase <code>2.0.0-SNAPSHOT</code>):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="bash">$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
+  ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/lib/hbase-mapreduce-2.0.0-SNAPSHOT.jar \
+  rowcounter usertable</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>You might find the output of the more selective <code>hbase mapredcp</code> tool of interest; it lists the minimum set of jars needed
+to run a basic MapReduce job against an HBase install. It does not include configuration, which you will probably need to add
+if you want your MapReduce job to find the target cluster. You will probably also have to add pointers to extra jars
+once you start to do anything of substance. Specify the extras by passing the system property <code>-Dtmpjars</code> when
+you run <code>hbase mapredcp</code>.</p>
+</div>
+<div class="paragraph">
+<p>For jobs that do not package their dependencies or call <code>TableMapReduceUtil#addDependencyJars</code>, the following command structure is necessary:</p>
 </div>
 <div class="listingblock">
 <div class="content">
@@ -12980,53 +13026,9 @@ See the <a href="#datamodel">Data Model</a> and the rest of this chapter for mor
 <p>The catalog table <code>hbase:meta</code> exists as an HBase table and is filtered out of the HBase shell&#8217;s <code>list</code> command, but is in fact a table just like any other.</p>
 </div>
 <div class="sect2">
-<h3 id="arch.catalog.root"><a class="anchor" href="#arch.catalog.root"></a>65.1. -ROOT-</h3>
-<div class="admonitionblock note">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-note" title="Note"></i>
-</td>
-<td class="content">
-The <code>-ROOT-</code> table was removed in HBase 0.96.0.
-Information here should be considered historical.
-</td>
-</tr>
-</table>
-</div>
-<div class="paragraph">
-<p>The <code>-ROOT-</code> table kept track of the location of the <code>.META</code> table (the previous name for the table now called <code>hbase:meta</code>) prior to HBase 0.96.
-The <code>-ROOT-</code> table structure was as follows:</p>
-</div>
-<div class="ulist">
-<div class="title">Key</div>
-<ul>
-<li>
-<p>.META.
-region key (<code>.META.,,1</code>)</p>
-</li>
-</ul>
-</div>
-<div class="ulist">
-<div class="title">Values</div>
-<ul>
-<li>
-<p><code>info:regioninfo</code> (serialized <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html">HRegionInfo</a> instance of <code>hbase:meta</code>)</p>
-</li>
-<li>
-<p><code>info:server</code> (server:port of the RegionServer holding <code>hbase:meta</code>)</p>
-</li>
-<li>
-<p><code>info:serverstartcode</code> (start-time of the RegionServer process holding <code>hbase:meta</code>)</p>
-</li>
-</ul>
-</div>
-</div>
-<div class="sect2">
-<h3 id="arch.catalog.meta"><a class="anchor" href="#arch.catalog.meta"></a>65.2. hbase:meta</h3>
+<h3 id="arch.catalog.meta"><a class="anchor" href="#arch.catalog.meta"></a>65.1. hbase:meta</h3>
 <div class="paragraph">
-<p>The <code>hbase:meta</code> table (previously called <code>.META.</code>) keeps a list of all regions in the system.
-The location of <code>hbase:meta</code> was previously tracked within the <code>-ROOT-</code> table, but is now stored in ZooKeeper.</p>
+<p>The <code>hbase:meta</code> table (previously called <code>.META.</code>) keeps a list of all regions in the system; the location of <code>hbase:meta</code> itself is stored in ZooKeeper.</p>
 </div>
 <div class="paragraph">
 <p>The <code>hbase:meta</code> table structure is as follows:</p>
@@ -13084,7 +13086,7 @@ utility.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="arch.catalog.startup"><a class="anchor" href="#arch.catalog.startup"></a>65.3. Startup Sequencing</h3>
+<h3 id="arch.catalog.startup"><a class="anchor" href="#arch.catalog.startup"></a>65.2. Startup Sequencing</h3>
 <div class="paragraph">
 <p>First, the location of <code>hbase:meta</code> is looked up in ZooKeeper.
 Next, <code>hbase:meta</code> is updated with server and startcode values.</p>
@@ -13839,10 +13841,24 @@ Here are others that you may have to take into account:</p>
 <dl>
 <dt class="hdlist1">Catalog Tables</dt>
 <dd>
-<p>The <code>-ROOT-</code> (prior to HBase 0.96, see <a href="#arch.catalog.root">arch.catalog.root</a>) and <code>hbase:meta</code> tables are forced into the block cache and have the in-memory priority which means that they are harder to evict.
-The former never uses more than a few hundred bytes while the latter can occupy a few MBs
-(depending on the number of regions).</p>
+<p>The <code>hbase:meta</code> table is forced into the block cache and has the in-memory priority, which means that it is harder to evict.</p>
 </dd>
+</dl>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+The <code>hbase:meta</code> table can occupy a few MBs, depending on the number of regions.
+</td>
+</tr>
+</table>
+</div>
+<div class="dlist">
+<dl>
 <dt class="hdlist1">HFiles Indexes</dt>
 <dd>
 <p>An <em>HFile</em> is the file format that HBase uses to store data in HDFS.
@@ -35070,7 +35086,7 @@ The server will return cellblocks compressed using this same compressor as long
 <div id="footer">
 <div id="footer-text">
 Version 3.0.0-SNAPSHOT<br>
-Last updated 2017-08-29 14:29:39 UTC
+Last updated 2017-08-30 14:29:43 UTC
 </div>
 </div>
 </body>
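A follow-up on the -libjars option mentioned in the CLASSPATH section of book.html above: `hbase classpath` and `hbase mapredcp` print colon-separated lists, while Hadoop's generic -libjars option takes a comma-separated list of jar files. What follows is a minimal POSIX shell sketch of that conversion, not part of HBase or Hadoop. The function name to_libjars is hypothetical, and dropping every entry that does not end in .jar (so configuration directories stay on HADOOP_CLASSPATH) is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HBase or Hadoop): turn a colon-separated
# classpath, such as the output of `hbase mapredcp`, into the comma-separated
# form that Hadoop's -libjars option takes.
# Assumption: only entries ending in .jar are kept; directories such as the
# HBase conf dir belong on HADOOP_CLASSPATH instead.
to_libjars() (
  # The body runs in a subshell, so changing IFS and disabling globbing here
  # does not leak into the caller's shell.
  IFS=':'
  set -f            # keep wildcard entries like /opt/hadoop/lib/* literal
  out=''
  for entry in $1; do
    case $entry in
      *.jar) out="${out:+$out,}$entry" ;;   # append, comma-separated
    esac
  done
  printf '%s\n' "$out"
)

# Example invocation (MyApp.jar and MyJobMainClass are placeholders):
#   HADOOP_CLASSPATH="$(hbase mapredcp):${HBASE_HOME}/conf" \
#     hadoop jar MyApp.jar MyJobMainClass \
#     -libjars "$(to_libjars "$(hbase mapredcp)")" [job arguments]
```

When no filtering is wanted, piping the classpath through `tr ':' ','` is the simpler alternative. Note that -libjars is honored only when the job driver parses Hadoop's generic options, for example by running through ToolRunner.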


http://git-wip-us.apache.org/repos/asf/hbase-site/blob/0d6dd914/bulk-loads.html
----------------------------------------------------------------------
diff --git a/bulk-loads.html b/bulk-loads.html
index fd31c39..5ed792a 100644
--- a/bulk-loads.html
+++ b/bulk-loads.html
@@ -7,7 +7,7 @@
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20170829" />
+    <meta name="Date-Revision-yyyymmdd" content="20170830" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache HBase &#x2013;  
       Bulk Loads in Apache HBase (TM)
@@ -311,7 +311,7 @@ under the License. -->
                         <a href="https://www.apache.org/">The Apache Software Foundation</a>.
             All rights reserved.      
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2017-08-29</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2017-08-30</li>
             </p>
                 </div>
