http://git-wip-us.apache.org/repos/asf/hbase/blob/48d9d27d/src/main/docbkx/performance.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
index 689b26f..1757d3f 100644
--- a/src/main/docbkx/performance.xml
+++ b/src/main/docbkx/performance.xml
@@ -182,6 +182,8 @@
           save a bit of YGC churn and allocate in the old gen directly. </para>
         <para>For more information about GC logs, see <xref
             linkend="trouble.log.gc" />. </para>
+    <para>Consider also enabling the offheap Block Cache.  This has been shown to mitigate
+        GC pause times.  See <xref linkend="block.cache" />.</para>
       </section>
     </section>
   </section>
@@ -627,7 +629,7 @@ hbase> <userinput>create 'mytable',{NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}<
       <title>Constants</title>
      <para>When people get started with HBase they have a tendency to write code that looks like
        this:</para>
-      <programlisting>
+      <programlisting language="java">
 Get get = new Get(rowkey);
 Result r = htable.get(get);
 byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns current version of value
@@ -635,7 +637,7 @@ byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns c
      <para>But especially when inside loops (and MapReduce jobs), converting the columnFamily and
        column-names to byte-arrays repeatedly is surprisingly expensive. It's better to use
        constants for the byte-arrays, like this:</para>
-      <programlisting>
+      <programlisting language="java">
 public static final byte[] CF = "cf".getBytes();
 public static final byte[] ATTR = "attr".getBytes();
 ...
@@ -669,14 +671,14 @@ byte[] b = r.getValue(CF, ATTR);  // returns current version of value
      <para>There are two different approaches to pre-creating splits. The first approach is to rely
        on the default <code>HBaseAdmin</code> strategy (which is implemented in
          <code>Bytes.split</code>)... </para>
-      <programlisting>
-byte[] startKey = ...;         // your lowest keuy
+      <programlisting language="java">
+byte[] startKey = ...;         // your lowest key
 byte[] endKey = ...;                   // your highest key
 int numberOfRegions = ...;     // # of regions to create
 admin.createTable(table, startKey, endKey, numberOfRegions);
       </programlisting>
       <para>And the other approach is to define the splits yourself... </para>
-      <programlisting>
+      <programlisting language="java">
 byte[][] splits = ...;   // create your own splits
 admin.createTable(table, splits);
 </programlisting>
@@ -829,7 +831,7 @@ admin.createTable(table, splits);
          <code>Scan.HINT_LOOKAHEAD</code> can be set on the Scan object. The following code
        instructs the RegionServer to attempt two iterations of next before a seek is
        scheduled:</para>
-      <programlisting>
+      <programlisting language="java">
 Scan scan = new Scan();
 scan.addColumn(...);
 scan.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
@@ -854,7 +856,7 @@ table.getScanner(scan);
          xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html">ResultScanners</link>
        you can cause problems on the RegionServers. Always have ResultScanner processing enclosed
        in try/catch blocks...</para>
-      <programlisting>
+      <programlisting language="java">
 Scan scan = new Scan();
 // set attrs...
 ResultScanner rs = htable.getScanner(scan);
@@ -878,6 +880,8 @@ htable.close();
          <methodname>setCacheBlocks</methodname> method. For input Scans to MapReduce jobs, this
        should be <varname>false</varname>. For frequently accessed rows, it is advisable to use the
        block cache.</para>
+
+    <para>Cache more data by moving your Block Cache offheap.  See <xref linkend="offheap.blockcache" />.</para>
     </section>
     <section
       xml:id="perf.hbase.client.rowkeyonly">
@@ -984,6 +988,58 @@ htable.close();
       </section>
     </section>
     <!--  bloom  -->
+    <section>
+      <title>Hedged Reads</title>
+      <para>Hedged reads are a feature of HDFS, introduced in <link
+          xlink:href="https://issues.apache.org/jira/browse/HDFS-5776">HDFS-5776</link>. Normally, a
+        single thread is spawned for each read request. However, if hedged reads are enabled, the
+        client waits a configurable amount of time, and if the read does not return, the client
+        spawns a second read request against a different block replica of the same data. Whichever
+        read returns first is used, and the other read request is discarded. Hedged reads can be
+        helpful when a rare slow read is caused by a transient error such as a failing
+        disk or flaky network connection.</para>
+      <para>Because an HBase RegionServer is an HDFS client, you can enable hedged reads in HBase by
+        adding the following properties to the RegionServer's <filename>hbase-site.xml</filename>
+        and tuning the values to suit your environment.</para>
+      <itemizedlist>
+        <title>Configuration for Hedged Reads</title>
+        <listitem>
+          <para><code>dfs.client.hedged.read.threadpool.size</code> - the number of threads
+            dedicated to servicing hedged reads. If this is set to 0 (the default), hedged reads are
+            disabled.</para>
+        </listitem>
+        <listitem>
+          <para><code>dfs.client.hedged.read.threshold.millis</code> - the number of milliseconds to
+            wait before spawning a second read thread.</para>
+        </listitem>
+      </itemizedlist>
+      <example>
+        <title>Hedged Reads Configuration Example</title>
+        <screen><![CDATA[<property>
+  <name>dfs.client.hedged.read.threadpool.size</name>
+  <value>20</value>  <!-- 20 threads -->
+</property>
+<property>
+  <name>dfs.client.hedged.read.threshold.millis</name>
+  <value>10</value>  <!-- 10 milliseconds -->
+</property>]]></screen>
+      </example>
+      <para>Use the following metrics to tune the settings for hedged reads on
+        your cluster. See <xref linkend="hbase_metrics"/> for more information.</para>
+      <itemizedlist>
+        <title>Metrics for Hedged Reads</title>
+        <listitem>
+          <para><code>hedgedReadOps</code> - the number of times hedged read threads have been triggered. This
+            could indicate that read requests are often slow, or that hedged reads are triggered too
+            quickly.</para>
+        </listitem>
+        <listitem>
+          <para><code>hedgeReadOpsWin</code> - the number of times the hedged read thread was faster than the
+            original thread. This could indicate that a given RegionServer is having trouble
+            servicing requests.</para>
+        </listitem>
+      </itemizedlist>
+    </section>
 
   </section>
   <!--  reading -->
@@ -1052,7 +1108,7 @@ htable.close();
          shortcircuit reads configuration page</link> for how to enable the latter, better version
        of shortcircuit. For example, here is a minimal config. enabling short-circuit reads added
        to <filename>hbase-site.xml</filename>: </para>
-      <programlisting><![CDATA[<property>
+      <programlisting language="xml"><![CDATA[<property>
   <name>dfs.client.read.shortcircuit</name>
   <value>true</value>
   <description>

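[Editor's aside, not part of the patch: the hedged-read behaviour documented in the hunk above can be sketched in plain Java. This is NOT HDFS code; class and method names are illustrative. It models the mechanism only: start one read, and if it has not returned within a threshold, start a second read against another replica and take whichever finishes first.]

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Illustrative sketch of the hedged-read idea; not HDFS client code. */
public class HedgedReadSketch {

    @SuppressWarnings("unchecked")
    public static <T> T hedgedRead(Supplier<T> primaryReplica,
                                   Supplier<T> backupReplica,
                                   long thresholdMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<T> first =
                CompletableFuture.supplyAsync(primaryReplica, pool);
            try {
                // Fast path: the primary read returns within the threshold.
                return first.get(thresholdMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException slow) {
                // Hedge: spawn a second identical read against another
                // replica and use whichever read completes first.
                CompletableFuture<T> second =
                    CompletableFuture.supplyAsync(backupReplica, pool);
                return (T) CompletableFuture.anyOf(first, second).join();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        } finally {
            // The losing read is discarded, mirroring the description above.
            pool.shutdownNow();
        }
    }
}
```

The threshold parameter plays the role of `dfs.client.hedged.read.threshold.millis`, and the two-thread pool the role of `dfs.client.hedged.read.threadpool.size`, from the configuration example in the hunk.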
http://git-wip-us.apache.org/repos/asf/hbase/blob/48d9d27d/src/main/docbkx/preface.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/preface.xml b/src/main/docbkx/preface.xml
index ff8efb9..a8f6895 100644
--- a/src/main/docbkx/preface.xml
+++ b/src/main/docbkx/preface.xml
@@ -39,15 +39,29 @@
            xlink:href="http://wiki.apache.org/hadoop/Hbase">wiki</link> where the pertinent
        information can be found.</para>
 
-    <para>This reference guide is a work in progress. The source for this guide can be found at
-            <filename>src/main/docbkx</filename> in a checkout of the hbase project. This reference
-        guide is marked up using <link
-            xlink:href="http://www.docbook.com/">DocBook</link> from which the the finished guide is
-        generated as part of the 'site' build target. Run <programlisting>mvn site</programlisting>
-        to generate this documentation. Amendments and improvements to the documentation are
-        welcomed. Add a patch to an issue up in the HBase <link
-            xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>.</para>
-
+    <formalpara>
+        <title>About This Guide</title>
+        <para>This reference guide is a work in progress. The source for this guide can be found in
+            the <filename>src/main/docbkx</filename> directory of the HBase source. This reference
+            guide is marked up using <link xlink:href="http://www.docbook.org/">DocBook</link> from
+            which the finished guide is generated as part of the 'site' build target. Run
+            <programlisting language="bourne">mvn site</programlisting> to generate this documentation. Amendments and
+            improvements to the documentation are welcomed. Click <link
+                xlink:href="https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12310753&amp;issuetype=1&amp;components=12312132&amp;summary=SHORT+DESCRIPTION"
+                >this link</link> to file a new documentation bug against Apache HBase with some
+            values pre-selected.</para>
+    </formalpara>
+    <formalpara>
+        <title>Contributing to the Documentation</title>
+        <para>For an overview of DocBook and suggestions for getting started contributing to the
+            documentation, see <xref linkend="appendix_contributing_to_documentation" />.</para>
+    </formalpara>
+    <formalpara>
+        <title>Providing Feedback</title>
+        <para>This guide allows you to leave comments or questions on any page, using Disqus. Look
+            for the Comments area at the bottom of the page. Answering these questions is a
+            volunteer effort, and may be delayed.</para>
+    </formalpara>
+
     <note
         xml:id="headsup">
        <title>Heads-up if this is your first foray into the world of distributed

http://git-wip-us.apache.org/repos/asf/hbase/blob/48d9d27d/src/main/docbkx/schema_design.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/schema_design.xml b/src/main/docbkx/schema_design.xml
index 614dab7..65e64b0 100644
--- a/src/main/docbkx/schema_design.xml
+++ b/src/main/docbkx/schema_design.xml
@@ -44,7 +44,7 @@
        xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link>
      in the Java API. </para>
    <para>Tables must be disabled when making ColumnFamily modifications, for example:</para>
-    <programlisting>
+    <programlisting language="java">
 Configuration config = HBaseConfiguration.create();
 HBaseAdmin admin = new HBaseAdmin(config);
 String table = "myTable";
@@ -280,7 +280,7 @@ d-foo0002
          in those eight bytes. If you stored this number as a String -- presuming a byte per
          character -- you need nearly 3x the bytes. </para>
        <para>Not convinced? Below is some sample code that you can run on your own.</para>
-        <programlisting>
+        <programlisting language="java">
 // long
 //
 long l = 1234567890L;
@@ -403,7 +403,7 @@ COLUMN                                        CELL
         are accessible in the keyspace. </para>
      <para>To conclude this example, the following is an example of how appropriate splits can be
        pre-created for hex-keys: </para>
-      <programlisting><![CDATA[public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
+      <programlisting language="java"><![CDATA[public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
 throws IOException {
   try {
     admin.createTable( table, splits );
@@ -439,18 +439,15 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
       xml:id="schema.versions.max">
       <title>Maximum Number of Versions</title>
      <para>The maximum number of row versions to store is configured per column family via <link
-          xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html"
-          >HColumnDescriptor</link>. The default for max versions is 3 prior to HBase 0.96.x, and 1
-        in newer versions. This is an important parameter because as described in <xref
-          linkend="datamodel"/> section HBase does <emphasis>not</emphasis> overwrite row values,
+          xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
+        The default for max versions is 1. This is an important parameter because as described in <xref
+          linkend="datamodel" /> section HBase does <emphasis>not</emphasis> overwrite row values,
        but rather stores different values per row by time (and qualifier). Excess versions are
        removed during major compactions. The number of max versions may need to be increased or
        decreased depending on application needs. </para>
      <para>It is not recommended setting the number of max versions to an exceedingly high level
        (e.g., hundreds or more) unless those old values are very dear to you because this will
        greatly increase StoreFile size. </para>
-      <para>See <xref linkend="specify.number.of.versions"/> for examples for setting the maximum
-        number of versions on a given column or globally.</para>
     </section>
     <section
       xml:id="schema.minversions">
@@ -465,8 +462,6 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
          around</emphasis>" (where M is the value for minimum number of row versions, M&lt;N). This
        parameter should only be set when time-to-live is enabled for a column family and must be
        less than the number of row versions. </para>
-      <para>See <xref linkend="specify.number.of.versions"/> for examples for setting the minimum
-        number of versions on a given column.</para>
     </section>
   </section>
   <section
@@ -700,7 +695,7 @@ HColumnDescriptor.setKeepDeletedCells(true);
          timestamps, by performing a mod operation on the timestamp. If time-oriented scans are
          important, this could be a useful approach. Attention must be paid to the number of
          buckets, because this will require the same number of scans to return results.</para>
-        <programlisting>
+        <programlisting language="java">
 long bucket = timestamp % numBuckets;
         </programlisting>
         <para>… to construct:</para>
@@ -1161,13 +1156,13 @@ long bucket = timestamp % numBuckets;
 ]]></programlisting>
 
       <para>The other option we had was to do this entirely using:</para>
-      <programlisting><![CDATA[
+      <programlisting language="xml"><![CDATA[
 <FixedWidthUserName><FixedWidthPageNum0>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
 <FixedWidthUserName><FixedWidthPageNum1>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
                ]]></programlisting>
      <para> where each row would contain multiple values. So in one case reading the first thirty
        values would be: </para>
-      <programlisting><![CDATA[
+      <programlisting language="java"><![CDATA[
 scan { STARTROW => 'FixedWidthUsername' LIMIT => 30}
                ]]></programlisting>
       <para>And in the second case it would be </para>

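[Editor's aside, not part of the patch: the schema_design hunk above argues that a long fits in eight bytes while its String form needs roughly a byte per digit. A stdlib-only Java sketch of that comparison, with `ByteBuffer` standing in for HBase's `Bytes.toBytes(long)`; class and method names are illustrative.]

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative size comparison: long as raw bytes vs. long as a String key. */
public class KeySizeSketch {

    public static int longAsBytes(long value) {
        // A fixed 8-byte big-endian encoding, like Bytes.toBytes(long).
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array().length;
    }

    public static int longAsString(long value) {
        // One byte per character of the decimal representation.
        return Long.toString(value).getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For 1234567890L the raw encoding is 8 bytes while the String form is 10 bytes, and the gap widens for larger values, which is the point the section makes about compact rowkeys.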