Added: hbase/hbase.apache.org/trunk/book/apas09.html
URL: 
http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/apas09.html?rev=1616896&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/book/apas09.html (added)
+++ hbase/hbase.apache.org/trunk/book/apas09.html Fri Aug  8 22:19:16 2014
@@ -0,0 +1,82 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>A.9.&nbsp;Docbook Common Issues</title><link rel="stylesheet" 
type="text/css" 
href="${baserdir}/src/main/site/resources/css/freebsd_docbook.css"><meta 
name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" 
href="appendix_contributing_to_documentation.html" 
title="Appendix&nbsp;A.&nbsp;Contributing to Documentation"><link rel="up" 
href="appendix_contributing_to_documentation.html" 
title="Appendix&nbsp;A.&nbsp;Contributing to Documentation"><link rel="prev" 
href="apas08.html" title="A.8.&nbsp;Adding a New Chapter to the HBase Reference 
Guide"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" 
alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation 
header"><tr><th colspan="3" align="center">A.9.&nbsp;Docbook Common 
Issues</th></tr><tr><td width="20%" align="left"><a accesskey="p" 
href="apas08.html">Prev</a>&nbsp;</td><th width="60%" 
align="center">&nbsp;</th><td width="20%" align="right">&nbsp;</td></tr></table><hr></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your 
forum shortname
+    var disqus_url = 'http://hbase.apache.org/book/.html';
+    </script><div class="section"><div class="titlepage"><div><div><h2 
class="title" style="clear: both"><a name="d9580e239"></a>A.9.&nbsp;Docbook 
Common Issues</h2></div></div></div><p>The following Docbook issues come up 
often. Some of these are preferences, but others
+            can create mysterious build errors or other problems.</p><div 
class="qandaset"><a name="d9580e244"></a><dl><dt>A.9.1. <a 
href="apas09.html#d9580e245">What can go where?</a></dt><dt>A.9.2. <a 
href="apas09.html#d9580e255">Paragraphs and Admonitions</a></dt><dt>A.9.3. <a 
href="apas09.html#d9580e264">Wrap textual &lt;listitem&gt; and &lt;entry&gt; 
contents in &lt;para&gt;
+                        elements.</a></dt><dt>A.9.4. <a 
href="apas09.html#d9580e273">When to use &lt;command&gt;, &lt;code&gt;, 
&lt;programlisting&gt;,
+                        &lt;screen&gt;</a></dt><dt>A.9.5. <a 
href="apas09.html#d9580e290">How to escape XML elements so that they show up as 
XML</a></dt><dt>A.9.6. <a href="apas09.html#d9580e297">Tips and tricks for 
making screen output look good</a></dt><dt>A.9.7. <a 
href="apas09.html#d9580e316">Isolate Changes for Easy Diff 
Review.</a></dt><dt>A.9.8. <a href="apas09.html#d9580e323">Syntax 
Highlighting</a></dt></dl><table border="0" style="width: 100%;"><colgroup><col 
align="left" width="1%"><col></colgroup><tbody><tr class="question"><td 
align="left" valign="top"><a name="d9580e245"></a><a 
name="d9580e246"></a><p><b>A.9.1.</b></p></td><td align="left" 
valign="top"><p>What can go where?</p></td></tr><tr class="answer"><td 
align="left" valign="top"></td><td align="left" valign="top"><p>There is often 
confusion about which child elements are valid in a given
+                        context. When in doubt, <a class="link" 
href="http://docbook.org/tdg/en/html/docbook.html" target="_top">Docbook: The
+                            Definitive Guide</a> is the best resource. It has 
an appendix which
+                        is indexed by element and contains all valid child and 
parent elements of
+                        any given element. If you edit Docbook often, a 
schema-aware XML editor
+                        makes things easier.</p></td></tr><tr 
class="question"><td align="left" valign="top"><a name="d9580e255"></a><a 
name="d9580e256"></a><p><b>A.9.2.</b></p></td><td align="left" 
valign="top"><p>Paragraphs and Admonitions</p></td></tr><tr class="answer"><td 
align="left" valign="top"></td><td align="left" valign="top"><p>It is a common 
pattern, and it is technically valid, to put an admonition
+                        such as a &lt;note&gt; inside a &lt;para&gt; element. 
Because admonitions
+                        render as block-level elements (they take the whole 
width of the page), it
+                        is better to mark them up as siblings to the 
paragraphs around them, like
+                        this:</p><pre class="programlisting"><strong 
class="hl-tag" style="color: #000096">&lt;para&gt;</strong>This is the 
paragraph.<strong class="hl-tag" style="color: #000096">&lt;/para&gt;</strong>
+<strong class="hl-tag" style="color: #000096">&lt;note&gt;</strong>
+    <strong class="hl-tag" style="color: #000096">&lt;para&gt;</strong>This is 
an admonition which occurs after the paragraph.<strong class="hl-tag" 
style="color: #000096">&lt;/para&gt;</strong>
+<strong class="hl-tag" style="color: 
#000096">&lt;/note&gt;</strong></pre></td></tr><tr class="question"><td 
align="left" valign="top"><a name="d9580e264"></a><a 
name="d9580e265"></a><p><b>A.9.3.</b></p></td><td align="left" 
valign="top"><p>Wrap textual &lt;listitem&gt; and &lt;entry&gt; contents in 
&lt;para&gt;
+                        elements.</p></td></tr><tr class="answer"><td 
align="left" valign="top"></td><td align="left" valign="top"><p>Because the 
contents of a &lt;listitem&gt; (an element in an itemized,
+                        ordered, or variable list) or an &lt;entry&gt; (a cell 
in a table) can
+                        consist of things other than plain text, they need to 
be wrapped in some
+                        element. If they are plain text, they need to be 
enclosed in &lt;para&gt;
+                        tags. This is tedious but necessary for 
validity.</p><pre class="programlisting"><strong class="hl-tag" style="color: 
#000096">&lt;itemizedlist&gt;</strong>
+    <strong class="hl-tag" style="color: #000096">&lt;listitem&gt;</strong>
+        <strong class="hl-tag" style="color: 
#000096">&lt;para&gt;</strong>This is a paragraph.<strong class="hl-tag" 
style="color: #000096">&lt;/para&gt;</strong>
+    <strong class="hl-tag" style="color: #000096">&lt;/listitem&gt;</strong>
+    <strong class="hl-tag" style="color: #000096">&lt;listitem&gt;</strong>
+        <strong class="hl-tag" style="color: 
#000096">&lt;screen&gt;</strong>This is screen output.<strong class="hl-tag" 
style="color: #000096">&lt;/screen&gt;</strong>
+    <strong class="hl-tag" style="color: #000096">&lt;/listitem&gt;</strong>
+<strong class="hl-tag" style="color: 
#000096">&lt;/itemizedlist&gt;</strong></pre></td></tr><tr class="question"><td 
align="left" valign="top"><a name="d9580e273"></a><a 
name="d9580e274"></a><p><b>A.9.4.</b></p></td><td align="left" 
valign="top"><p>When to use &lt;command&gt;, &lt;code&gt;, 
&lt;programlisting&gt;,
+                        &lt;screen&gt;</p></td></tr><tr class="answer"><td 
align="left" valign="top"></td><td align="left" valign="top"><p>The first two 
are in-line tags, which can occur within the flow of
+                        paragraphs or titles. The second two are block 
elements.</p><p>Use &lt;command&gt; to mention a command such as <span 
class="command"><strong>hbase
+                            shell</strong></span> in the flow of a sentence. 
Use &lt;code&gt; for other
+                        inline text referring to code. Incidentally, use 
&lt;literal&gt; to specify
+                        literal strings that should be typed or entered 
exactly as shown. Within a
+                        &lt;screen&gt; listing, it can be helpful to use the 
&lt;userinput&gt; and
+                        &lt;computeroutput&gt; elements to mark up the text 
further.</p><p>Use &lt;screen&gt; to display input and output as the user would
+                            <span class="emphasis"><em>see</em></span> it on 
the screen, in a log file, etc. Use
+                        &lt;programlisting&gt; only for blocks of code that 
occur within a file,
+                        such as Java or XML code, or a Bash shell 
script.</p></td></tr><tr class="question"><td align="left" valign="top"><a 
name="d9580e290"></a><a name="d9580e291"></a><p><b>A.9.5.</b></p></td><td 
align="left" valign="top"><p>How to escape XML elements so that they show up as 
XML</p></td></tr><tr class="answer"><td align="left" valign="top"></td><td 
align="left" valign="top"><p>For one-off instances or short in-line mentions, 
use the &amp;lt; and
+                        &amp;gt; encoded characters. For longer mentions, or 
blocks of code, enclose
+                        it with &lt;![CDATA[]]&gt;, which is much 
easier to maintain and
+                        parse in the source files.</p></td></tr><tr 
class="question"><td align="left" valign="top"><a name="d9580e297"></a><a 
name="d9580e298"></a><p><b>A.9.6.</b></p></td><td align="left" 
valign="top"><p>Tips and tricks for making screen output look 
good</p></td></tr><tr class="answer"><td align="left" valign="top"></td><td 
align="left" valign="top"><p>Text within &lt;screen&gt; and 
&lt;programlisting&gt; elements is shown
+                        exactly as it appears in the source, including 
indentation, tabs, and line
+                        wrap.</p><div class="itemizedlist"><ul 
class="itemizedlist" style="list-style-type: disc; "><li 
class="listitem"><p>Indent the starting and closing XML elements, but do not 
indent
+                                the content. Also, to avoid having an extra 
blank line at the
+                                beginning of the programlisting output, do not 
put the CDATA
+                                element on its own line. For example:</p><pre 
class="programlisting">        &lt;programlisting&gt;
+<strong class="hl-keyword">case</strong> $<span class="hl-number">1</span> in
+  --cleanZk|--cleanHdfs|--cleanAll)
+    matches=<strong class="hl-string"><em 
style="color:red">"yes"</em></strong> ;;
+  *) ;;
+<strong class="hl-keyword">esac</strong>
+        &lt;/programlisting&gt;</pre></li><li class="listitem"><p>After 
pasting code into a programlisting, fix the indentation
+                                manually, using two <span 
class="emphasis"><em>spaces</em></span> per desired
+                                indentation. For screen output, be sure to 
include line breaks so
+                                that no line is longer than 100 
characters.</p></li></ul></div></td></tr><tr class="question"><td align="left" 
valign="top"><a name="d9580e316"></a><a 
name="d9580e317"></a><p><b>A.9.7.</b></p></td><td align="left" 
valign="top"><p>Isolate Changes for Easy Diff Review.</p></td></tr><tr 
class="answer"><td align="left" valign="top"></td><td align="left" 
valign="top"><p>Be careful with pretty-printing or re-formatting an entire XML 
file, even
+                        if the formatting has degraded over time. If you need 
to reformat a file, do
+                        that in a separate JIRA where you do not change any 
content. Be careful
+                        because some XML editors do a bulk-reformat when you 
open a new file,
+                        especially if you use GUI mode in the 
editor.</p></td></tr><tr class="question"><td align="left" valign="top"><a 
name="d9580e323"></a><a name="d9580e324"></a><p><b>A.9.8.</b></p></td><td 
align="left" valign="top"><p>Syntax Highlighting</p></td></tr><tr 
class="answer"><td align="left" valign="top"></td><td align="left" 
valign="top"><p>The HBase Reference Guide uses the <a class="link" 
href="http://sourceforge.net/projects/xslthl/files/xslthl/2.1.0/" 
target="_top">XSLT Syntax Highlighting</a> Maven module for syntax highlighting.
+                        To enable syntax highlighting for a given 
&lt;programlisting&gt; or
+                        &lt;screen&gt; (or possibly other elements), add the 
attribute
+                                <code class="literal">language=<em 
class="replaceable"><code>LANGUAGE_OF_CHOICE</code></em></code>
+                        to the element, as in the following example:</p><pre 
class="programlisting">
+<strong class="hl-tag" style="color: #000096">&lt;programlisting</strong> 
<span class="hl-attribute" style="color: #F5844C">language</span>=<span 
class="hl-value" style="color: #993300">"xml"</span><strong class="hl-tag" 
style="color: #000096">&gt;</strong>
+    <strong class="hl-tag" style="color: 
#000096">&lt;foo&gt;</strong>bar<strong class="hl-tag" style="color: 
#000096">&lt;/foo&gt;</strong>
+    <strong class="hl-tag" style="color: 
#000096">&lt;bar&gt;</strong>foo<strong class="hl-tag" style="color: 
#000096">&lt;/bar&gt;</strong>
+<strong class="hl-tag" style="color: 
#000096">&lt;/programlisting&gt;</strong></pre><p>Several syntax types are 
supported. The most interesting ones for the
+                        HBase Reference Guide are <code 
class="literal">java</code>, <code class="literal">xml</code>,
+                            <code class="literal">sql</code>, and <code 
class="literal">bourne</code> (for BASH shell
+                        output or Linux command-line 
examples).</p></td></tr></tbody></table></div></div><div 
id="disqus_thread"></div><script type="text/javascript">
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 
'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || 
document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a 
href="http://disqus.com/?ref_noscript">comments powered by 
Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments 
powered by <span class="logo-disqus">Disqus</span></a><div 
class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td 
width="40%" align="left"><a accesskey="p" 
href="apas08.html">Prev</a>&nbsp;</td><td width="20%" 
align="center">&nbsp;</td><td width="40%" align="right">&nbsp;</td></tr><tr><td 
width="40%" align="left" valign="top">A.8.&nbsp;Adding a New Chapter to the 
HBase Reference Guide&nbsp;</td><td width="20%" align="center"><a accesskey="h" 
href="appendix_contributing_to_documentation.html">Home</a></td><td width="40%" 
align="right" valign="top">&nbsp;</td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/book/apcs03.html
URL: 
http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/apcs03.html?rev=1616896&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/book/apcs03.html (added)
+++ hbase/hbase.apache.org/trunk/book/apcs03.html Fri Aug  8 22:19:16 2014
@@ -0,0 +1,39 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>C.3.&nbsp;Localized repairs</title><link rel="stylesheet" 
type="text/css" 
href="${baserdir}/src/main/site/resources/css/freebsd_docbook.css"><meta 
name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" 
href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" 
href="hbck.in.depth.html" title="Appendix&nbsp;C.&nbsp;hbck In Depth"><link 
rel="prev" href="apcs02.html" title="C.2.&nbsp;Inconsistencies"><link 
rel="next" href="apcs04.html" title="C.4.&nbsp;Region Overlap 
Repairs"></head><body bgcolor="white" text="black" link="#0000FF" 
vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" 
summary="Navigation header"><tr><th colspan="3" 
align="center">C.3.&nbsp;Localized repairs</th></tr><tr><td width="20%" 
align="left"><a accesskey="p" href="apcs02.html">Prev</a>&nbsp;</td><th 
width="60%" align="center">Appendix&nbsp;C.&nbsp;hbck In Depth</th><td 
width="20%" align="right">&nbsp;<a accesskey="n" href="apcs04.html">Next</a></td></tr></table><hr></div><script 
type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your 
forum shortname
+    var disqus_url = 'http://hbase.apache.org/book/.html';
+    </script><div class="section"><div class="titlepage"><div><div><h2 
class="title" style="clear: both"><a name="d4029e21069"></a>C.3.&nbsp;Localized 
repairs</h2></div></div></div><p>
+       When repairing a corrupted HBase, it is best to repair the lowest risk 
inconsistencies first.
+These are generally region consistency repairs: localized single-region repairs that only modify
+in-memory data or ephemeral ZooKeeper data, or that patch holes in the META table.
+Region consistency requires that the state of the region&#8217;s data in HDFS
+(.regioninfo files), the region&#8217;s row in the hbase:meta table, and the
+region&#8217;s deployment/assignments on region servers and the master all agree.
+Options for repairing region consistency include:
+       </p><div class="itemizedlist"><ul class="itemizedlist" 
style="list-style-type: disc; "><li class="listitem"><p><code 
class="code">-fixAssignments</code> (equivalent to the 0.90 <code 
class="code">-fix</code> option) repairs unassigned, incorrectly
+assigned or multiply assigned regions.</p></li><li class="listitem"><p><code 
class="code">-fixMeta</code> removes meta rows when the corresponding regions 
are not present in
+                 HDFS and adds new meta rows if the regions are present in 
HDFS but not in META.</p></li></ul></div><p>
+       To fix deployment and assignment problems you can run this command:
+</p><pre class="programlisting">
+$ ./bin/hbase hbck -fixAssignments
+</pre><p>To fix deployment and assignment problems as well as repairing 
incorrect meta rows you can
+run this command:</p><pre class="programlisting">
+$ ./bin/hbase hbck -fixAssignments -fixMeta
+</pre><p>There are a few classes of table integrity problems that are low risk 
repairs. The first two are
+degenerate (startkey == endkey) regions and backwards regions (startkey &gt; 
endkey). These are
+automatically handled by sidelining the data to a temporary directory 
(/hbck/xxxx).
+The third low-risk class is HDFS region holes. These can be repaired by using 
the:</p><div class="itemizedlist"><ul class="itemizedlist" 
style="list-style-type: disc; "><li class="listitem"><p><code 
class="code">-fixHdfsHoles</code> option for fabricating new empty regions on 
the file system.
+If holes are detected you can use -fixHdfsHoles and should include -fixMeta 
and -fixAssignments to make the new region consistent.</p></li></ul></div><pre 
class="programlisting">
+$ ./bin/hbase hbck -fixAssignments -fixMeta -fixHdfsHoles
+</pre><p>Since this is a common operation, we&#8217;ve added the <code 
class="code">-repairHoles</code> flag that is equivalent to the
+previous command:</p><pre class="programlisting">
+$ ./bin/hbase hbck -repairHoles
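+# Optional check, a sketch that is not part of the original guide: re-run hbck
+# without any fix options to report what remains. -details prints the full report.
+$ ./bin/hbase hbck -details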
+</pre><p>If inconsistencies still remain after these steps, you most likely 
have table integrity problems
+related to orphaned or overlapping regions.</p></div><div 
id="disqus_thread"></div><script type="text/javascript">
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 
'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || 
document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a 
href="http://disqus.com/?ref_noscript">comments powered by 
Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments 
powered by <span class="logo-disqus">Disqus</span></a><div 
class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td 
width="40%" align="left"><a accesskey="p" 
href="apcs02.html">Prev</a>&nbsp;</td><td width="20%" align="center"><a 
accesskey="u" href="hbck.in.depth.html">Up</a></td><td width="40%" 
align="right">&nbsp;<a accesskey="n" 
href="apcs04.html">Next</a></td></tr><tr><td width="40%" align="left" 
valign="top">C.2.&nbsp;Inconsistencies&nbsp;</td><td width="20%" 
align="center"><a accesskey="h" href="book.html">Home</a></td><td width="40%" 
align="right" valign="top">&nbsp;C.4.&nbsp;Region Overlap 
Repairs</td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/book/apcs04.html
URL: 
http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/apcs04.html?rev=1616896&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/book/apcs04.html (added)
+++ hbase/hbase.apache.org/trunk/book/apcs04.html Fri Aug  8 22:19:16 2014
@@ -0,0 +1,64 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>C.4.&nbsp;Region Overlap Repairs</title><link rel="stylesheet" 
type="text/css" 
href="${baserdir}/src/main/site/resources/css/freebsd_docbook.css"><meta 
name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" 
href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" 
href="hbck.in.depth.html" title="Appendix&nbsp;C.&nbsp;hbck In Depth"><link 
rel="prev" href="apcs03.html" title="C.3.&nbsp;Localized repairs"><link 
rel="next" href="compression.html" title="Appendix&nbsp;D.&nbsp;Compression and 
Data Block Encoding In HBase"></head><body bgcolor="white" text="black" 
link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table 
width="100%" summary="Navigation header"><tr><th colspan="3" 
align="center">C.4.&nbsp;Region Overlap Repairs</th></tr><tr><td width="20%" 
align="left"><a accesskey="p" href="apcs03.html">Prev</a>&nbsp;</td><th 
width="60%" align="center">Appendix&nbsp;C.&nbsp;hbck In Depth</th><td 
width="20%" align="right">&nbsp;<a accesskey="n" 
href="compression.html">Next</a></td></tr></table><hr></div><script 
type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your 
forum shortname
+    var disqus_url = 'http://hbase.apache.org/book/.html';
+    </script><div class="section"><div class="titlepage"><div><div><h2 
class="title" style="clear: both"><a name="d4029e21114"></a>C.4.&nbsp;Region 
Overlap Repairs</h2></div></div></div><p>Table integrity problems can require 
repairs that deal with overlaps. This is a riskier operation
+because it requires modifications to the file system, requires some decision 
making, and may
+require some manual steps. For these repairs it is best to analyze the output 
of a <code class="code">hbck -details</code>
+run so that you limit repair attempts to the problems the checks 
identify. Because this is
+riskier, there are safeguards that should be used to limit the scope of the 
repairs.
+WARNING: This is a relatively new feature and has only been tested on online but idle 
HBase instances
+(no reads/writes). Use at your own risk in an active production environment!
+The options for repairing table integrity violations include:</p><div 
class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; 
"><li class="listitem"><p><code class="code">-fixHdfsOrphans</code> option for 
&#8220;adopting&#8221; a region directory that is missing a region
+metadata file (the .regioninfo file).</p></li><li class="listitem"><p><code 
class="code">-fixHdfsOverlaps</code> ability for fixing overlapping 
regions</p></li></ul></div><p>When repairing overlapping regions, a 
region&#8217;s data can be modified on the file system in two
+ways: 1) by merging regions into a larger region or 2) by sidelining regions 
by moving data to a
+&#8220;sideline&#8221; directory where data could be restored later. Merging a 
large number of regions is
+technically correct but could result in an extremely large region that 
requires a series of costly
+compactions and splitting operations. In these cases, it is probably better to 
sideline the regions
+that overlap with the most other regions (likely the largest ranges) so that 
merges can happen on
+a more reasonable scale. Since these sidelined regions are already laid out in 
HBase&#8217;s native
+directory and HFile format, they can be restored by using HBase&#8217;s bulk 
load mechanism.
+The default safeguard thresholds are conservative. These options let you 
override the default
+thresholds and enable the large region sidelining feature.</p><div 
class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; 
"><li class="listitem"><p><code class="code">-maxMerge &lt;n&gt;</code> maximum 
number of overlapping regions to merge</p></li><li class="listitem"><p><code 
class="code">-sidelineBigOverlaps</code> if more than maxMerge regions are 
overlapping, attempt
+to sideline the regions overlapping with the most other regions.</p></li><li 
class="listitem"><p><code class="code">-maxOverlapsToSideline &lt;n&gt;</code> 
if sidelining large overlapping regions, sideline at most n
+regions.</p></li></ul></div><p>Since you often just want to get 
the tables repaired, you can use this option to turn
+on all repair options:</p><div class="itemizedlist"><ul class="itemizedlist" 
style="list-style-type: disc; "><li class="listitem"><p><code 
class="code">-repair</code> includes all the region consistency options and 
only the hole repairing table
+integrity options.</p></li></ul></div><p>Finally, there are safeguards to 
limit repairs to only specific tables. For example the following
+command would only attempt to check and repair table TableFoo and 
TableBar.</p><pre class="screen">
+$ ./bin/hbase hbck -repair TableFoo TableBar
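+# Sketch, not part of the original guide: to review the same tables before repairing,
+# omit -repair and request the detailed report. Table names are the examples above.
+$ ./bin/hbase hbck -details TableFoo TableBar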
+</pre><div class="section"><div class="titlepage"><div><div><h3 
class="title"><a name="d4029e21163"></a>C.4.1.&nbsp;Special cases: Meta is not 
properly assigned</h3></div></div></div><p>There are a few special cases that 
hbck can handle as well.
+Sometimes the meta table&#8217;s only region is inconsistently assigned or 
deployed. In this case
+there is a special <code class="code">-fixMetaOnly</code> option that can try 
to fix meta assignments.</p><pre class="screen">
+$ ./bin/hbase hbck -fixMetaOnly -fixAssignments
+</pre></div><div class="section"><div class="titlepage"><div><div><h3 
class="title"><a name="d4029e21173"></a>C.4.2.&nbsp;Special cases: HBase 
version file is missing</h3></div></div></div><p>HBase&#8217;s data on the file 
system requires a version file in order to start. If this file is missing, you
+can use the <code class="code">-fixVersionFile</code> option to fabricate a 
new HBase version file. This assumes that
+the version of hbck you are running is the appropriate version for the HBase 
cluster.</p></div><div class="section"><div class="titlepage"><div><div><h3 
class="title"><a name="d4029e21181"></a>C.4.3.&nbsp;Special case: Root and META 
are corrupt.</h3></div></div></div><p>The most drastic corruption scenario is 
the case where the ROOT or META is corrupted and
+HBase will not start. In this case you can use the OfflineMetaRepair tool 
to create new ROOT
+and META regions and tables.
+This tool assumes that HBase is offline. It then marches through the existing 
HBase home
+directory and loads as much information from region metadata files (.regioninfo 
files) as possible
+from the file system. If the region metadata has proper table integrity, it 
sidelines the original root
+and meta table directories, and builds new ones with pointers to the region 
directories and their
+data.</p><pre class="screen">
+$ ./bin/hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
+</pre><p>NOTE: This tool is not as clever as uberhbck but can be used to 
bootstrap repairs that uberhbck
+can complete.
+If the tool succeeds you should be able to start hbase and run online repairs 
if necessary.</p></div><div class="section"><div 
class="titlepage"><div><div><h3 class="title"><a 
name="d4029e21190"></a>C.4.4.&nbsp;Special cases: Offline split 
parent</h3></div></div></div><p>
+Once a region is split, the offline parent will be cleaned up automatically. 
Sometimes, daughter regions
+are split again before their parents are cleaned up. HBase can clean up 
parents in the right order. However,
+some offline split parents can linger: they are in 
META and in HDFS, but not deployed,
+and HBase can't clean them up. In this case, you can use the <code 
class="code">-fixSplitParents</code> option to reset
+them in META to be online and not split. hbck can then merge them with 
other regions if the option for fixing
+overlapping regions is used.
+    </p><p>
+This option should not normally be used, and it is not in <code 
class="code">-fixAll</code>.
+    </p></div></div><div id="disqus_thread"></div><script 
type="text/javascript">
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 
'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || 
document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a 
href="http://disqus.com/?ref_noscript">comments powered by 
Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments 
powered by <span class="logo-disqus">Disqus</span></a><div 
class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td 
width="40%" align="left"><a accesskey="p" 
href="apcs03.html">Prev</a>&nbsp;</td><td width="20%" align="center"><a 
accesskey="u" href="hbck.in.depth.html">Up</a></td><td width="40%" 
align="right">&nbsp;<a accesskey="n" 
href="compression.html">Next</a></td></tr><tr><td width="40%" align="left" 
valign="top">C.3.&nbsp;Localized repairs&nbsp;</td><td width="20%" 
align="center"><a accesskey="h" href="book.html">Home</a></td><td width="40%" 
align="right" valign="top">&nbsp;Appendix&nbsp;D.&nbsp;Compression and Data 
Block Encoding In
+          HBase</td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/book/apds02.html
URL: 
http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/apds02.html?rev=1616896&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/book/apds02.html (added)
+++ hbase/hbase.apache.org/trunk/book/apds02.html Fri Aug  8 22:19:16 2014
@@ -0,0 +1,145 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>D.2.&nbsp;Compressor Configuration, Installation, and 
Use</title><link rel="stylesheet" type="text/css" 
href="${baserdir}/src/main/site/resources/css/freebsd_docbook.css"><meta 
name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" 
href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" 
href="compression.html" title="Appendix&nbsp;D.&nbsp;Compression and Data Block 
Encoding In HBase"><link rel="prev" href="compression.html" 
title="Appendix&nbsp;D.&nbsp;Compression and Data Block Encoding In 
HBase"><link rel="next" href="data.block.encoding.enable.html" 
title="D.3.&nbsp;Enable Data Block Encoding"></head><body bgcolor="white" 
text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div 
class="navheader"><table width="100%" summary="Navigation header"><tr><th 
colspan="3" align="center">D.2.&nbsp;Compressor Configuration, Installation, 
and Use</th></tr><tr><td width="20%" align="left"><a accesskey="p" 
href="compression.html">Prev</a>&nbsp;</td><th width="60%" 
align="center">Appendix&nbsp;D.&nbsp;Compression and Data Block Encoding In
+          HBase</th><td width="20%" align="right">&nbsp;<a accesskey="n" 
href="data.block.encoding.enable.html">Next</a></td></tr></table><hr></div><script
 type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your 
forum shortname
+    var disqus_url = 'http://hbase.apache.org/book/.html';
+    </script><div class="section"><div class="titlepage"><div><div><h2 
class="title" style="clear: both"><a 
name="d4029e21361"></a>D.2.&nbsp;Compressor Configuration, Installation, and 
Use</h2></div></div></div><div class="section"><div 
class="titlepage"><div><div><h3 class="title"><a 
name="compressor.install"></a>D.2.1.&nbsp;Configure HBase For 
Compressors</h3></div></div></div><p>Before HBase can use a given compressor, 
its libraries need to be available. Due to
+          licensing issues, only GZ compression is available to HBase (via 
native Java libraries) in
+          a default installation.</p><div class="section"><div 
class="titlepage"><div><div><h4 class="title"><a 
name="d4029e21369"></a>D.2.1.1.&nbsp;Compressor Support On the 
Master</h4></div></div></div><p>HBase 0.95 introduced a configuration setting that
checks which data block encoders are installed and configured on the Master,
+            and assumes that
+            the entire cluster is configured the same way. This option,
+              <code class="code">hbase.master.check.compression</code>, 
defaults to <code class="literal">true</code>. This
+            prevents the situation described in <a class="link" 
href="https://issues.apache.org/jira/browse/HBASE-6370" 
target="_top">HBASE-6370</a>, where
+            a table is created or modified to support a codec that a region 
server does not support,
+            leading to failures that take a long time to occur and are 
difficult to debug. </p><p>If <code 
class="code">hbase.master.check.compression</code> is enabled, libraries for 
all desired
+            compressors need to be installed and configured on the Master, 
even if the Master does
+            not run a region server.</p></div><div class="section"><div 
class="titlepage"><div><div><h4 class="title"><a 
name="d4029e21388"></a>D.2.1.2.&nbsp;Install GZ Support Via Native 
Libraries</h4></div></div></div><p>HBase uses Java's built-in GZip support 
unless the native Hadoop libraries are
+            available on the CLASSPATH. The recommended way to add libraries 
to the CLASSPATH is to
+            set the environment variable <code 
class="envar">HBASE_LIBRARY_PATH</code> for the user running
+            HBase. If native libraries are not available and Java's GZIP is 
used, <code class="literal">Got
+              brand-new compressor</code> reports will be present in the logs. 
See <a class="xref" href="trouble.rs.html#brand.new.compressor" 
title="15.9.2.10.&nbsp;Logs flooded with '2011-01-10 12:40:48,407 INFO 
org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor' 
messages">Section&nbsp;15.9.2.10, &#8220;Logs flooded with '2011-01-10 
12:40:48,407 INFO org.apache.hadoop.io.compress.CodecPool: Got
+            brand-new compressor' messages&#8221;</a>.</p></div><div 
class="section"><div class="titlepage"><div><div><h4 class="title"><a 
name="lzo.compression"></a>D.2.1.3.&nbsp;Install LZO 
Support</h4></div></div></div><p>HBase cannot ship with LZO because of a license 
incompatibility between HBase, which uses the
+            Apache Software License (ASL), and LZO, which uses the GPL. 
See the <a class="link" 
href="http://wiki.apache.org/hadoop/UsingLzoCompression" target="_top">Using LZO
+              Compression</a> wiki page for information on configuring LZO 
support for HBase. </p><p>If you depend upon LZO compression, consider 
configuring your RegionServers to fail
+            to start if LZO is not available. See <a class="xref" 
href="apds02.html#hbase.regionserver.codecs" title="D.2.1.7.&nbsp;Enforce 
Compression Settings On a RegionServer">Section&nbsp;D.2.1.7, &#8220;Enforce 
Compression Settings On a RegionServer&#8221;</a>.</p></div><div 
class="section"><div class="titlepage"><div><div><h4 class="title"><a 
name="lz4.compression"></a>D.2.1.4.&nbsp;Configure LZ4 
Support</h4></div></div></div><p>LZ4 support is bundled with Hadoop. Make sure 
the hadoop shared library
+            (libhadoop.so) is accessible when you start
+            HBase. After configuring your platform (see <a class="xref" 
href="">???</a>), you can make a symbolic link from HBase to the native Hadoop
+            libraries. This assumes the two installations are colocated. 
For example, if your
+            platform is Linux-amd64-64:
+            </p><pre class="programlisting">$ <strong 
class="hl-keyword">cd</strong> $HBASE_HOME
+$ mkdir lib/native
+$ ln -s $HADOOP_HOME/lib/native lib/native/Linux-amd64-<span 
class="hl-number">64</span></pre><p>
+            Use the compression tool to check that LZ4 is installed on all 
nodes. Start up (or restart)
+            HBase. Afterward, you can create and alter tables to enable LZ4 as 
a
+            compression codec:
+            </p><pre class="screen">
+hbase(main):003:0&gt; <strong class="userinput"><code>alter 'TestTable', {NAME 
=&gt; 'info', COMPRESSION =&gt; 'LZ4'}</code></strong>
+            </pre><p>
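+            The compression check mentioned above can be run on each node. The following is a
+            sketch only: the <code class="filename">file:///tmp/testfile</code> path is a
+            hypothetical scratch location, and <code class="literal">lz4</code> is assumed to be
+            the codec name accepted by the tool described in <a class="xref" href="apds02.html#compression.test" title="D.2.1.6.&nbsp;CompressionTest">Section&nbsp;D.2.1.6, &#8220;CompressionTest&#8221;</a>:
+            </p><pre class="screen">
+ $ hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/testfile lz4
+            </pre><p>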
+          </p></div><div class="section"><div class="titlepage"><div><div><h4 
class="title"><a 
name="snappy.compression.installation"></a>D.2.1.5.&nbsp;Install Snappy 
Support</h4></div></div></div><p>HBase does not ship with Snappy support 
because of licensing issues. You can install
+            Snappy binaries (for instance, by using <span 
class="command"><strong>yum install snappy</strong></span> on CentOS)
+            or build Snappy from source. After installing Snappy, search for 
the shared library,
+            which will be called <code class="filename">libsnappy.so.X</code> 
where X is a number. If you
+            built from source, copy the shared library to a known location on 
your system, such as
+              <code class="filename">/opt/snappy/lib/</code>.</p><p>In 
addition to the Snappy library, HBase also needs access to the Hadoop shared
+            library, which will be called something like <code 
class="filename">libhadoop.so.X.Y</code>,
+            where X and Y are both numbers. Make note of the location of the 
Hadoop library, or copy
+            it to the same location as the Snappy library.</p><div 
class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 
class="title">Note</h3><p>The Snappy and Hadoop libraries need to be available 
on each node of your cluster.
+              See <a class="xref" href="apds02.html#compression.test" 
title="D.2.1.6.&nbsp;CompressionTest">Section&nbsp;D.2.1.6, 
&#8220;CompressionTest&#8221;</a> to find out how to test that this is the 
case.</p><p>See <a class="xref" href="apds02.html#hbase.regionserver.codecs" 
title="D.2.1.7.&nbsp;Enforce Compression Settings On a 
RegionServer">Section&nbsp;D.2.1.7, &#8220;Enforce Compression Settings On a 
RegionServer&#8221;</a> to configure your RegionServers to fail to
+              start if a given compressor is not available.</p></div><p>Each 
of these library locations needs to be added to the environment variable
+              <code class="envar">HBASE_LIBRARY_PATH</code> for the operating 
system user that runs HBase. You
+            need to restart the RegionServer for the changes to take 
effect.</p></div><div class="section"><div class="titlepage"><div><div><h4 
class="title"><a 
name="compression.test"></a>D.2.1.6.&nbsp;CompressionTest</h4></div></div></div><p>You
 can use the CompressionTest tool to verify that your compressor is available to
+            HBase:</p><pre class="screen">
+ $ hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://<em 
class="replaceable"><code>host/path/to/hbase</code></em> snappy       
+          </pre></div><div class="section"><div 
class="titlepage"><div><div><h4 class="title"><a 
name="hbase.regionserver.codecs"></a>D.2.1.7.&nbsp;Enforce Compression Settings 
On a RegionServer</h4></div></div></div><p>You can configure a RegionServer so 
that it will fail to start if compression is
+            configured incorrectly, by adding the option 
hbase.regionserver.codecs to the
+              <code class="filename">hbase-site.xml</code>, and setting its 
value to a comma-separated list
+            of codecs that need to be available. For example, if you set this 
property to
+              <code class="literal">lzo,gz</code>, the RegionServer would fail 
to start unless both compressors
+            were available. This would prevent a new server from being 
added to the cluster
+            without having codecs configured properly.</p></div></div><div 
class="section"><div class="titlepage"><div><div><h3 class="title"><a 
name="changing.compression"></a>D.2.2.&nbsp;Enable Compression On a 
ColumnFamily</h3></div></div></div><p>To enable compression for a ColumnFamily, 
use an <code class="code">alter</code> command. You do
+          not need to re-create the table or copy data. If you are changing 
codecs, be sure the old
+          codec is still available until all the old StoreFiles have been 
compacted.</p><div class="example"><a name="d4029e21491"></a><p 
class="title"><b>Example&nbsp;D.1.&nbsp;Enabling Compression on a ColumnFamily 
of an Existing Table using HBase
+            Shell</b></p><div class="example-contents"><pre class="screen">
+hbase&gt; disable 'test'
+hbase&gt; alter 'test', {NAME =&gt; 'cf', COMPRESSION =&gt; 'GZ'}
+hbase&gt; enable 'test'
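+# Optional follow-up, not part of the original example: StoreFiles written before
+# the change keep the old codec until they are rewritten, and a major compaction
+# is one way to force that rewrite.
+hbase&gt; major_compact 'test'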
+        </pre></div></div><br class="example-break"><div class="example"><a 
name="d4029e21496"></a><p class="title"><b>Example&nbsp;D.2.&nbsp;Creating a 
New Table with Compression On a ColumnFamily</b></p><div 
class="example-contents"><pre class="screen">
+hbase&gt; create 'test2', { NAME =&gt; 'cf2', COMPRESSION =&gt; 'SNAPPY' }     
    
+          </pre></div></div><br class="example-break"><div class="example"><a 
name="d4029e21501"></a><p class="title"><b>Example&nbsp;D.3.&nbsp;Verifying a 
ColumnFamily's Compression Settings</b></p><div class="example-contents"><pre 
class="screen">
+hbase&gt; describe 'test'
+DESCRIPTION                                          ENABLED
+ 'test', {NAME =&gt; 'cf', DATA_BLOCK_ENCODING =&gt; 'NONE false
+ ', BLOOMFILTER =&gt; 'ROW', REPLICATION_SCOPE =&gt; '0',
+ VERSIONS =&gt; '1', COMPRESSION =&gt; 'GZ', MIN_VERSIONS
+ =&gt; '0', TTL =&gt; 'FOREVER', KEEP_DELETED_CELLS =&gt; 'fa
+ lse', BLOCKSIZE =&gt; '65536', IN_MEMORY =&gt; 'false', B
+ LOCKCACHE =&gt; 'true'}
+1 row(s) in 0.1070 seconds
+          </pre></div></div><br class="example-break"></div><div 
class="section"><div class="titlepage"><div><div><h3 class="title"><a 
name="d4029e21506"></a>D.2.3.&nbsp;Testing Compression 
Performance</h3></div></div></div><p>HBase includes a tool called LoadTestTool 
which provides mechanisms to test your
+          compression performance. You must specify either <code 
class="literal">-write</code> or
+          <code class="literal">-update-read</code> as your first parameter, 
and if you do not specify another
+        parameter, usage advice is printed for each option.</p><div 
class="example"><a name="d4029e21517"></a><p 
class="title"><b>Example&nbsp;D.4.&nbsp;<span 
class="command">LoadTestTool</span> Usage</b></p><div 
class="example-contents"><pre class="screen">
+$ bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -h            
+usage: bin/hbase org.apache.hadoop.hbase.util.LoadTestTool &lt;options&gt;
+Options:
+ -batchupdate                 Whether to use batch as opposed to separate
+                              updates <strong class="hl-keyword">for</strong> 
every column in a row
+ -bloom &lt;arg&gt;                 Bloom filter <strong 
class="hl-keyword">type</strong>, one of [NONE, ROW, ROWCOL]
+ -compression &lt;arg&gt;           Compression <strong 
class="hl-keyword">type</strong>, one of [LZO, GZ, NONE, SNAPPY,
+                              LZ4]
+ -data_block_encoding &lt;arg&gt;   Encoding algorithm (e.g. prefix 
compression) to
+                              use <strong class="hl-keyword">for</strong> data 
blocks in the <strong class="hl-keyword">test</strong> column family, one
+                              of [NONE, PREFIX, DIFF, FAST_DIFF, PREFIX_TREE].
+ -encryption &lt;arg&gt;            Enables transparent encryption on the 
<strong class="hl-keyword">test</strong> table,
+                              one of [AES]
+ -generator &lt;arg&gt;             The class which generates load <strong 
class="hl-keyword">for</strong> the tool. Any
+                              args <strong class="hl-keyword">for</strong> 
this class can be passed as colon
+                              separated after class name
+ -h,--help                    Show usage
+ -in_memory                   Tries to keep the HFiles of the CF inmemory as 
far
+                              as possible.  Not guaranteed that reads are 
always
+                              served from inmemory
+ -init_only                   Initialize the <strong 
class="hl-keyword">test</strong> table only, don't do any
+                              loading
+ -key_window &lt;arg&gt;            The 'key window' to maintain between reads and
+                              writes for concurrent write/read workload. The
+                              default is 0.
+ -max_read_errors &lt;arg&gt;       The maximum number of read errors to 
tolerate
+                              before terminating all reader threads. The 
default
+                              is 10.
+ -multiput                    Whether to use multi-puts as opposed to separate
+                              puts for every column in a row
+ -num_keys &lt;arg&gt;              The number of keys to read/write
+ -num_tables &lt;arg&gt;            A positive integer number. When a number n 
is
+                              speicfied, load test tool  will load n table
+                              parallely. -tn parameter value becomes table name
+                              prefix. Each table name is in format
+                              &lt;tn&gt;_1...&lt;tn&gt;_n
+ -read &lt;arg&gt;                  
&lt;verify_percent&gt;[:&lt;#threads=20&gt;]
+ -regions_per_server &lt;arg&gt;    A positive integer number. When a number n 
is
+                              specified, load test tool will create the test
+                              table with n regions per server
+ -skip_init                   Skip the initialization; assume test table 
already
+                              exists
+ -start_key &lt;arg&gt;             The first key to read/write (a 0-based 
index). The
+                              default value is 0.
+ -tn &lt;arg&gt;                    The name of the table to read or write
+ -update &lt;arg&gt;                
&lt;update_percent&gt;[:&lt;#threads=20&gt;][:&lt;#whether to
+                              ignore nonce collisions=0&gt;]
+ -write &lt;arg&gt;                 
&lt;avg_cols_per_key&gt;:&lt;avg_data_size&gt;[:&lt;#threads=20&gt;]
+ -zk &lt;arg&gt;                    ZK quorum as comma-separated host names 
without
+                              port numbers
+ -zk_root &lt;arg&gt;               name of parent znode in zookeeper
+          </pre></div></div><br class="example-break"><div 
class="example"><a name="d4029e21524"></a><p 
class="title"><b>Example&nbsp;D.5.&nbsp;Example Usage of 
LoadTestTool</b></p><div class="example-contents"><pre class="screen">
+$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write <span 
class="hl-number">1</span>:<span class="hl-number">10</span>:<span 
class="hl-number">100</span> -num_keys <span class="hl-number">1000000</span> \
+          -<strong class="hl-keyword">read</strong> <span 
class="hl-number">100</span>:<span class="hl-number">30</span> -num_tables 
<span class="hl-number">1</span> -data_block_encoding NONE -tn 
load_test_tool_NONE
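+# Hypothetical companion run, not part of the original example: the same workload
+# with the -compression option from the usage text, for comparison against the
+# run above. GZ is assumed because it needs no extra libraries; the table name
+# is illustrative.
+$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 1:10:100 -num_keys 1000000 \
+          -read 100:30 -num_tables 1 -compression GZ -tn load_test_tool_GZ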
+          </pre></div></div><br class="example-break"></div></div><div 
id="disqus_thread"></div><script type="text/javascript">
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 
'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || 
document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a 
href="http://disqus.com/?ref_noscript">comments powered by 
Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments 
powered by <span class="logo-disqus">Disqus</span></a><div 
class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td 
width="40%" align="left"><a accesskey="p" 
href="compression.html">Prev</a>&nbsp;</td><td width="20%" align="center"><a 
accesskey="u" href="compression.html">Up</a></td><td width="40%" 
align="right">&nbsp;<a accesskey="n" 
href="data.block.encoding.enable.html">Next</a></td></tr><tr><td width="40%" 
align="left" valign="top">Appendix&nbsp;D.&nbsp;Compression and Data Block 
Encoding In
+          HBase&nbsp;</td><td width="20%" align="center"><a accesskey="h" 
href="book.html">Home</a></td><td width="40%" align="right" 
valign="top">&nbsp;D.3.&nbsp;Enable Data Block 
Encoding</td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/book/ape.html
URL: 
http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/ape.html?rev=1616896&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/book/ape.html (added)
+++ hbase/hbase.apache.org/trunk/book/ape.html Fri Aug  8 22:19:16 2014
@@ -0,0 +1,15 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>Appendix&nbsp;E.&nbsp;YCSB: The Yahoo! Cloud Serving Benchmark and 
HBase</title><link rel="stylesheet" type="text/css" 
href="${baserdir}/src/main/site/resources/css/freebsd_docbook.css"><meta 
name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" 
href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" 
href="book.html" title="The Apache HBase&#153; Reference Guide"><link 
rel="prev" href="data.block.encoding.enable.html" title="D.3.&nbsp;Enable Data 
Block Encoding"><link rel="next" href="hfilev2.html" 
title="Appendix&nbsp;F.&nbsp;HFile format version 2"></head><body 
bgcolor="white" text="black" link="#0000FF" vlink="#840084" 
alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation 
header"><tr><th colspan="3" align="center">Appendix&nbsp;E.&nbsp;YCSB: The 
Yahoo! Cloud Serving Benchmark and HBase</th></tr><tr><td width="20%" 
align="left"><a accesskey="p" 
href="data.block.encoding.enable.html">Prev</a>&nbsp;</td>
 <th width="60%" align="center">&nbsp;</th><td width="20%" 
align="right">&nbsp;<a accesskey="n" 
href="hfilev2.html">Next</a></td></tr></table><hr></div><script 
type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your 
forum shortname
+    var disqus_url = 'http://hbase.apache.org/book/.html';
+    </script><div class="appendix"><div class="titlepage"><div><div><h1 
class="title"><a name="d4029e21547"></a>Appendix&nbsp;E.&nbsp;<a class="link" 
href="https://github.com/brianfrankcooper/YCSB/" target="_top">YCSB: The Yahoo! 
Cloud Serving Benchmark</a> and HBase</h1></div></div></div><p>TODO: Describe 
how YCSB is poor for putting up a decent cluster load.</p><p>TODO: Describe 
setup of YCSB for HBase.  In particular, presplit your tables before you start
+          a run.  See <a class="link" 
href="https://issues.apache.org/jira/browse/HBASE-4163" 
target="_top">HBASE-4163 Create Split Strategy for YCSB Benchmark</a>
+          for why and a little shell command for how to do it.</p><p>Ted 
Dunning reworked YCSB to build with Maven and added a facility for verifying 
workloads.  See <a class="link" href="https://github.com/tdunning/YCSB" 
target="_top">Ted Dunning's YCSB</a>.</p></div><div 
id="disqus_thread"></div><script type="text/javascript">
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 
'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || 
document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a 
href="http://disqus.com/?ref_noscript">comments powered by 
Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments 
powered by <span class="logo-disqus">Disqus</span></a><div 
class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td 
width="40%" align="left"><a accesskey="p" 
href="data.block.encoding.enable.html">Prev</a>&nbsp;</td><td width="20%" 
align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" 
href="hfilev2.html">Next</a></td></tr><tr><td width="40%" align="left" 
valign="top">D.3.&nbsp;Enable Data Block Encoding&nbsp;</td><td width="20%" 
align="center"><a accesskey="h" href="book.html">Home</a></td><td width="40%" 
align="right" valign="top">&nbsp;Appendix&nbsp;F.&nbsp;HFile format version 
2</td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/book/apfs02.html
URL: 
http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/apfs02.html?rev=1616896&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/book/apfs02.html (added)
+++ hbase/hbase.apache.org/trunk/book/apfs02.html Fri Aug  8 22:19:16 2014
@@ -0,0 +1,21 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>F.2.&nbsp;HFile format version 1 overview</title><link 
rel="stylesheet" type="text/css" 
href="${baserdir}/src/main/site/resources/css/freebsd_docbook.css"><meta 
name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" 
href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" 
href="hfilev2.html" title="Appendix&nbsp;F.&nbsp;HFile format version 2"><link 
rel="prev" href="hfilev2.html" title="Appendix&nbsp;F.&nbsp;HFile format 
version 2"><link rel="next" href="apfs03.html" title="F.3.&nbsp; HBase file 
format with inline blocks (version 2)"></head><body bgcolor="white" 
text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div 
class="navheader"><table width="100%" summary="Navigation header"><tr><th 
colspan="3" align="center">F.2.&nbsp;HFile format version 1 overview 
</th></tr><tr><td width="20%" align="left"><a accesskey="p" 
href="hfilev2.html">Prev</a>&nbsp;</td><th width="60%" 
align="center">Appendix&nbsp;F.&nbsp;HFile format version 2</th><td width="20%" align="right">&nbsp;<a accesskey="n" 
href="apfs03.html">Next</a></td></tr></table><hr></div><script 
type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your 
forum shortname
+    var disqus_url = 'http://hbase.apache.org/book/.html';
+    </script><div class="section"><div class="titlepage"><div><div><h2 
class="title" style="clear: both"><a name="d4029e21580"></a>F.2.&nbsp;HFile 
format version 1 overview </h2></div></div></div><p>As we will be discussing 
the changes we are making to the HFile format, it is useful to give a short 
overview of the previous (HFile version 1) format. An HFile in the existing 
format is structured as follows:
+           <span class="inlinemediaobject"><img 
src="/Users/stack/checkouts/hbase.git.commit/target/docbkx/book/images/hfile.png"
 align="middle" alt="HFile Version 1"></span>
+           <a href="#ftn.d4029e21592" class="footnote" name="d4029e21592"><sup 
class="footnote">[34]</sup></a>
+       </p><div class="section"><div class="titlepage"><div><div><h3 
class="title"><a name="d4029e21599"></a>F.2.1.&nbsp; Block index format in 
version 1 </h3></div></div></div><p>The block index in version 1 is very 
straightforward. For each entry, it contains: </p><div class="orderedlist"><ol 
class="orderedlist" type="1"><li class="listitem"><p>Offset (long)</p></li><li 
class="listitem"><p>Uncompressed size (int)</p></li><li class="listitem"><p>Key 
(a serialized byte array written using Bytes.writeByteArray) </p><div 
class="orderedlist"><ol class="orderedlist" type="a"><li 
class="listitem"><p>Key length as a variable-length integer (VInt)
+                  </p></li><li class="listitem"><p>
+                     Key bytes
+                 </p></li></ol></div></li></ol></div><p>The number of entries 
in the block index is stored in the fixed file trailer, and has to be passed in 
to the method that reads the block index. One of the limitations of the block 
index in version 1 is that it does not provide the compressed size of a block, 
which turns out to be necessary for decompression. Therefore, the HFile reader 
has to infer this compressed size from the offset difference between blocks. We 
fix this limitation in version 2, where we store on-disk block size instead of 
uncompressed size, and get uncompressed size from the block 
header.</p></div><div class="footnotes"><br><hr style="width:100; 
text-align:left;margin-left: 0"><div id="ftn.d4029e21592" 
class="footnote"><p><a href="#d4029e21592" class="para"><sup class="para">[34] 
</sup></a>Image courtesy of Lars George, <a class="link" 
href="http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html" 
target="_top">hbase-architecture-101-storage.html</a>.</p></div></div></div><div id="disqus_thread"></div><script 
type="text/javascript">
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 
'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || 
document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a 
href="http://disqus.com/?ref_noscript">comments powered by 
Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments 
powered by <span class="logo-disqus">Disqus</span></a><div 
class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td 
width="40%" align="left"><a accesskey="p" 
href="hfilev2.html">Prev</a>&nbsp;</td><td width="20%" align="center"><a 
accesskey="u" href="hfilev2.html">Up</a></td><td width="40%" 
align="right">&nbsp;<a accesskey="n" 
href="apfs03.html">Next</a></td></tr><tr><td width="40%" align="left" 
valign="top">Appendix&nbsp;F.&nbsp;HFile format version 2&nbsp;</td><td 
width="20%" align="center"><a accesskey="h" href="book.html">Home</a></td><td 
width="40%" align="right" valign="top">&nbsp;F.3.&nbsp;
+      HBase file format with inline blocks (version 2)
+      </td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/book/apfs03.html
URL: 
http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/apfs03.html?rev=1616896&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/book/apfs03.html (added)
+++ hbase/hbase.apache.org/trunk/book/apfs03.html Fri Aug  8 22:19:16 2014
@@ -0,0 +1,145 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>F.3.&nbsp; HBase file format with inline blocks (version 
2)</title><link rel="stylesheet" type="text/css" 
href="${baserdir}/src/main/site/resources/css/freebsd_docbook.css"><meta 
name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" 
href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" 
href="hfilev2.html" title="Appendix&nbsp;F.&nbsp;HFile format version 2"><link 
rel="prev" href="apfs02.html" title="F.2.&nbsp;HFile format version 1 
overview"><link rel="next" href="other.info.html" 
title="Appendix&nbsp;G.&nbsp;Other Information About HBase"></head><body 
bgcolor="white" text="black" link="#0000FF" vlink="#840084" 
alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation 
header"><tr><th colspan="3" align="center">F.3.&nbsp;
+      HBase file format with inline blocks (version 2)
+      </th></tr><tr><td width="20%" align="left"><a accesskey="p" 
href="apfs02.html">Prev</a>&nbsp;</td><th width="60%" 
align="center">Appendix&nbsp;F.&nbsp;HFile format version 2</th><td width="20%" 
align="right">&nbsp;<a accesskey="n" 
href="other.info.html">Next</a></td></tr></table><hr></div><script 
type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your 
forum shortname
+    var disqus_url = 'http://hbase.apache.org/book/.html';
+    </script><div class="section"><div class="titlepage"><div><div><h2 
class="title" style="clear: both"><a name="d4029e21623"></a>F.3.&nbsp;
+      HBase file format with inline blocks (version 2)
+      </h2></div></div></div><div class="section"><div 
class="titlepage"><div><div><h3 class="title"><a 
name="d4029e21626"></a>F.3.1.&nbsp; Overview</h3></div></div></div><p>The 
version of HBase introducing the above features reads both version 1 and 2 
HFiles, but only writes version 2 HFiles. A version 2 HFile is structured as 
follows:
+           <span class="inlinemediaobject"><img 
src="/Users/stack/checkouts/hbase.git.commit/target/docbkx/book/images/hfilev2.png"
 align="middle" alt="HFile Version 2"></span>
+
+   </p></div><div class="section"><div class="titlepage"><div><div><h3 
class="title"><a name="d4029e21638"></a>F.3.2.&nbsp;Unified version 2 block 
format</h3></div></div></div><p>In version 2, every block in the data 
section contains the following fields: </p><div class="orderedlist"><ol 
class="orderedlist" type="1"><li class="listitem"><p>8 bytes: Block type, a 
sequence of bytes equivalent to version 1's "magic records". Supported block 
types are: </p><div class="orderedlist"><ol class="orderedlist" type="a"><li 
class="listitem"><p>DATA &#8211; data blocks
+                  </p></li><li class="listitem"><p>
+                     LEAF_INDEX &#8211; leaf-level index blocks in a 
multi-level block index
+                 </p></li><li class="listitem"><p>
+                     BLOOM_CHUNK &#8211; Bloom filter chunks
+                  </p></li><li class="listitem"><p>
+                     META &#8211; meta blocks (not used for Bloom filters in 
version 2 anymore)
+                  </p></li><li class="listitem"><p>
+                     INTERMEDIATE_INDEX &#8211; intermediate-level index 
blocks in a multi-level block index
+                  </p></li><li class="listitem"><p>
+                     ROOT_INDEX &#8211; root-level index blocks in a 
multi-level block index
+                  </p></li><li class="listitem"><p>
+                     FILE_INFO &#8211; the &#8220;file info&#8221; block, a 
small key-value map of metadata
+                  </p></li><li class="listitem"><p>
+                     BLOOM_META &#8211; a Bloom filter metadata block in the 
load-on-open section
+                  </p></li><li class="listitem"><p>
+                     TRAILER &#8211; a fixed-size file trailer. As opposed 
to the above, this is not an
+                     HFile v2 block but a fixed-size (for each HFile 
version) data structure
+                  </p></li><li class="listitem"><p>
+                      INDEX_V1 &#8211; this block type is only used for legacy 
HFile v1 blocks
+                  </p></li></ol></div></li><li class="listitem"><p>Compressed 
size of the block's data, not including the header (int).
+         </p><p>
+Can be used for skipping the current data block when scanning HFile data.
+                  </p></li><li class="listitem"><p>Uncompressed size of the 
block's data, not including the header (int)</p><p>
+ This is equal to the compressed size if the compression algorithm is NONE
+                  </p></li><li class="listitem"><p>File offset of the previous 
block of the same type (long)</p><p>
+ Can be used for seeking to the previous data/index block
+                  </p></li><li class="listitem"><p>Compressed data (or 
uncompressed data if the compression algorithm is 
NONE).</p></li></ol></div><p>The above format of blocks is used in the 
following HFile sections:</p><div class="orderedlist"><ol class="orderedlist" 
type="1"><li class="listitem"><p>Scanned block section. The section is named so 
because it contains all data blocks that need to be read when an HFile is 
scanned sequentially. It also contains leaf-level index blocks and Bloom chunk 
blocks. </p></li><li class="listitem"><p>Non-scanned block section. This 
section still contains unified-format v2 blocks but it does not have to be read 
when doing a sequential scan. This section contains &#8220;meta&#8221; blocks 
and intermediate-level index blocks.
+         </p></li></ol></div><p>We are supporting &#8220;meta&#8221; blocks in 
version 2 the same way they were supported in version 1, even though we do not 
store Bloom filter data in these blocks anymore. </p></div><div 
class="section"><div class="titlepage"><div><div><h3 class="title"><a 
name="d4029e21707"></a>F.3.3.&nbsp; Block index in version 
2</h3></div></div></div><p>There are three types of block indexes in HFile 
version 2, stored in two different formats (root and non-root): </p><div 
class="orderedlist"><ol class="orderedlist" type="1"><li 
class="listitem"><p>Data index &#8212; version 2 multi-level block index, 
consisting of:</p><div class="orderedlist"><ol class="orderedlist" type="a"><li 
class="listitem"><p>
+ Version 2 root index, stored in the data block index section of the file
+             </p></li><li class="listitem"><p>
+Optionally, version 2 intermediate levels, stored in the non-root format in
the data index section of the file. Intermediate levels can only be present 
if leaf-level blocks are present
+             </p></li><li class="listitem"><p>
+Optionally, version 2 leaf levels, stored in the non-root format inline with
data blocks
+             </p></li></ol></div></li><li class="listitem"><p>Meta index 
&#8212; version 2 root index format only, stored in the meta index section of 
the file</p></li><li class="listitem"><p>Bloom index &#8212; version 2 root 
index format only, stored in the &#8220;load-on-open&#8221; section as part of 
Bloom filter metadata.</p></li></ol></div></div><div class="section"><div 
class="titlepage"><div><div><h3 class="title"><a 
name="d4029e21732"></a>F.3.4.&nbsp;
+      Root block index format in version 2</h3></div></div></div><p>This 
format applies to:</p><div class="orderedlist"><ol class="orderedlist" 
type="1"><li class="listitem"><p>Root level of the version 2 data 
index</p></li><li class="listitem"><p>Entire meta and Bloom indexes in version 
2, which are always single-level. </p></li></ol></div><p>A version 2 root index 
block is a sequence of entries of the following format, similar to entries of a 
version 1 block index, but storing on-disk size instead of uncompressed size. 
</p><div class="orderedlist"><ol class="orderedlist" type="1"><li 
class="listitem"><p>Offset (long) </p><p>
+This offset may point to a data block or to a deeper-level index block.
+             </p></li><li class="listitem"><p>On-disk size (int) </p></li><li 
class="listitem"><p>Key (a serialized byte array stored using 
Bytes.writeByteArray) </p><div class="orderedlist"><ol class="orderedlist" 
type="a"><li class="listitem"><p>Key length (VInt)
+             </p></li><li class="listitem"><p>Key bytes
+             </p></li></ol></div></li></ol></div><p>A single-level version 2 
block index consists of just a single root index block. To read a root index 
block of version 2, one needs to know the number of entries. For the data index 
and the meta index the number of entries is stored in the trailer, and for the 
Bloom index it is stored in the compound Bloom filter metadata.</p><p>For a 
multi-level block index we also store the following fields in the root index 
block in the load-on-open section of the HFile, in addition to the data 
structure described above:</p><div class="orderedlist"><ol class="orderedlist" 
type="1"><li class="listitem"><p>Middle leaf index block offset</p></li><li 
class="listitem"><p>Middle leaf block on-disk size (meaning the leaf index 
block containing the reference to the &#8220;middle&#8221; data block of the 
file) </p></li><li class="listitem"><p>The index of the mid-key (defined below) 
in the middle leaf-level block.</p></li></ol></div><p></p><p>These
additional fields are used to efficiently retrieve the mid-key of the HFile used in 
HFile splits, which we define as the first key of the block with a zero-based 
index of (n &#8211; 1) / 2, if the total number of blocks in the HFile is n. 
This definition is consistent with how the mid-key was determined in HFile 
version 1, and is reasonable in general, because blocks are likely to be the 
same size on average, but we don&#8217;t have any estimates on individual 
key/value pair sizes. </p><p></p><p>When writing a version 2 HFile, the total 
number of data blocks pointed to by every leaf-level index block is kept track 
of. When we finish writing and the total number of leaf-level blocks is 
determined, it is clear which leaf-level block contains the mid-key, and the 
fields listed above are computed. &nbsp;When reading the HFile and the mid-key 
is requested, we retrieve the middle leaf index block (potentially from the 
block cache) and get the mid-key value from the appropriate position inside 
 that leaf block.</p></div><div class="section"><div 
class="titlepage"><div><div><h3 class="title"><a 
name="d4029e21785"></a>F.3.5.&nbsp;
+      Non-root block index format in version 2</h3></div></div></div><p>This 
format applies to intermediate-level and leaf index blocks of a version 2 
multi-level data block index. Every non-root index block is structured as 
follows. </p><div class="orderedlist"><ol class="orderedlist" type="1"><li 
class="listitem"><p>numEntries: the number of entries (int). </p></li><li 
class="listitem"><p>entryOffsets: the &#8220;secondary index&#8221; of offsets 
of entries in the block, to facilitate a quick binary search on the key 
(numEntries + 1 int values). The last value is the total length of all entries 
in this index block. For example, in a non-root index block with entry sizes 
60, 80, 50 the &#8220;secondary index&#8221; will contain the following int 
array: {0, 60, 140, 190}.</p></li><li class="listitem"><p>Entries. Each entry 
contains: </p><div class="orderedlist"><ol class="orderedlist" type="a"><li 
class="listitem"><p>
+Offset of the block referenced by this entry in the file (long)
+             </p></li><li class="listitem"><p>
+On-disk size of the referenced block (int)
+             </p></li><li class="listitem"><p>
+Key. The length can be calculated from entryOffsets.
+             </p></li></ol></div></li></ol></div></div><div 
class="section"><div class="titlepage"><div><div><h3 class="title"><a 
name="d4029e21810"></a>F.3.6.&nbsp;
+      Bloom filters in version 2</h3></div></div></div><p>In contrast with 
version 1, in a version 2 HFile Bloom filter metadata is stored in the 
load-on-open section of the HFile for quick startup. </p><div 
class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>A 
compound Bloom filter. </p><div class="orderedlist"><ol class="orderedlist" 
type="a"><li class="listitem"><p>
+ Bloom filter version = 3 (int). There used to be a DynamicByteBloomFilter 
class that had the Bloom filter version number 2
+             </p></li><li class="listitem"><p>
+The total byte size of all compound Bloom filter chunks (long)
+             </p></li><li class="listitem"><p>
+ Number of hash functions (int)
+             </p></li><li class="listitem"><p>
+Type of hash functions (int)
+             </p></li><li class="listitem"><p>
+The total key count inserted into the Bloom filter (long)
+             </p></li><li class="listitem"><p>
+The maximum total number of keys in the Bloom filter (long)
+             </p></li><li class="listitem"><p>
+The number of chunks (int)
+             </p></li><li class="listitem"><p>
+Comparator class used for Bloom filter keys, a UTF-8 encoded string stored
using Bytes.writeByteArray
+             </p></li><li class="listitem"><p>
+ Bloom block index in the version 2 root block index format
+             </p></li></ol></div></li></ol></div></div><div 
class="section"><div class="titlepage"><div><div><h3 class="title"><a 
name="d4029e21847"></a>F.3.7.&nbsp;File Info format in versions 1 and 
2</h3></div></div></div><p>The file info block is a serialized <a class="link" 
href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/HbaseMapWritable.html"
 target="_top">HbaseMapWritable</a> (essentially a map from byte arrays to byte 
arrays) with the following keys, among others. StoreFile-level logic adds more 
keys to this.</p><div class="informaltable"><table 
border="1"><colgroup><col><col></colgroup><tbody><tr><td>
+               <p>hfile.LASTKEY </p>
+            </td><td>
+               <p>The last key of the file (byte array) </p>
+            </td></tr><tr><td>
+               <p>hfile.AVG_KEY_LEN </p>
+            </td><td>
+               <p>The average key length in the file (int) </p>
+            </td></tr><tr><td>
+               <p>hfile.AVG_VALUE_LEN </p>
+            </td><td>
+               <p>The average value length in the file (int) </p>
+            </td></tr></tbody></table></div><p>File info format did not change 
in version 2. However, we moved the file info to the final section of the file, 
which can be loaded as one block at the time the HFile is being opened. Also, 
we do not store comparator in the version 2 file info anymore. Instead, we 
store it in the fixed file trailer. This is because we need to know the 
comparator at the time of parsing the load-on-open section of the 
HFile.</p></div><div class="section"><div class="titlepage"><div><div><h3 
class="title"><a name="d4029e21893"></a>F.3.8.&nbsp;
+      Fixed file trailer format differences between versions 1 and 
2</h3></div></div></div><p>The following table shows common and different 
fields between fixed file trailers in versions 1 and 2. Note that the size of 
the trailer is different depending on the version, so it is &#8220;fixed&#8221; 
only within one version. However, the version is always stored as the last 
four-byte integer in the file. </p><p></p><div class="informaltable"><table 
border="1"><colgroup><col class="c1"><col class="c2"></colgroup><tbody><tr><td>
+               <p>Version 1 </p>
+            </td><td>
+               <p>Version 2 </p>
+            </td></tr><tr><td colspan="2" align="center">
+               <p>File info offset (long) </p>
+            </td></tr><tr><td>
+               <p>Data index offset (long) </p>
+            </td><td>
+                <p>loadOnOpenOffset (long)</p>
+                <p><span class="emphasis"><em>The offset of the section that 
we need to load when opening the file.</em></span></p>
+            </td></tr><tr><td colspan="2" align="center">
+               <p>Number of data index entries (int) </p>
+            </td></tr><tr><td>
+               <p>metaIndexOffset (long)</p>
+               <p>This field is not being used by the version 1 reader, so we 
removed it from version 2.</p>
+            </td><td>
+               <p>uncompressedDataIndexSize (long)</p>
+               <p>The total uncompressed size of the whole data block index, 
including root-level, intermediate-level, and leaf-level blocks.</p>
+            </td></tr><tr><td colspan="2" align="center">
+               <p>Number of meta index entries (int) </p>
+            </td></tr><tr><td colspan="2" align="center">
+               <p>Total uncompressed bytes (long) </p>
+            </td></tr><tr><td>
+               <p>numEntries (int) </p>
+            </td><td>
+               <p>numEntries (long) </p>
+            </td></tr><tr><td colspan="2" align="center">
+               <p>Compression codec: 0 = LZO, 1 = GZ, 2 = NONE (int) </p>
+            </td></tr><tr><td>
+               <p></p>
+            </td><td>
+               <p>The number of levels in the data block index (int) </p>
+            </td></tr><tr><td>
+               <p></p>
+            </td><td>
+               <p>firstDataBlockOffset (long)</p>
+               <p>The offset of the first data block. Used when 
scanning. </p>
+            </td></tr><tr><td>
+               <p></p>
+            </td><td>
+               <p>lastDataBlockEnd (long)</p>
+               <p>The offset of the first byte after the last key/value data 
block. We don't need to go beyond this offset when scanning. </p>
+            </td></tr><tr><td>
+               <p>Version: 1 (int) </p>
+            </td><td>
+               <p>Version: 2 (int) </p>
+            </td></tr></tbody></table></div><p></p></div><div 
class="section"><div class="titlepage"><div><div><h3 class="title"><a 
name="d4029e22036"></a>F.3.9.&nbsp;getShortMidpointKey (an optimization for data 
index block)</h3></div></div></div><p>Note: this optimization was introduced in 
HBase 0.95+</p><p>HFiles contain many blocks that contain a range of sorted 
Cells. Each cell has a key. To save IO when reading Cells, the HFile also has 
an index that maps a Cell's start key to the offset of the beginning of a 
particular block. Prior to this optimization, HBase would use the key of the 
first cell in each data block as the index key.</p><p>In HBASE-7845, we 
generate a new key that is lexicographically larger than the last key of the 
previous block and lexicographically less than or equal to the start key of the 
current block. While actual keys can potentially be very long, this "fake key" 
or "virtual key" can be much shorter. For example, if the last key of the previous 
block is "the quick brown fox" and the start key of the current block is "the who", 
we could use 
"the r" as our virtual key in our hfile index.</p><p>There are two benefits to 
this:</p><div class="itemizedlist"><ul class="itemizedlist" 
style="list-style-type: disc; "><li class="listitem"><p>having shorter keys 
reduces the hfile index size (allowing us to keep more indexes in memory), 
and</p></li><li class="listitem"><p>using something closer to the end key of 
the previous block allows us to avoid a potential extra IO when the target key 
lives in between the "virtual key" and the key of the first element in the 
target block.</p></li></ul></div><p>This optimization (implemented by the 
getShortMidpointKey method) is inspired by LevelDB's 
BytewiseComparatorImpl::FindShortestSeparator() and 
FindShortSuccessor().</p></div></div><div id="disqus_thread"></div><script 
type="text/javascript">
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 
'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || 
document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a 
href="http://disqus.com/?ref_noscript">comments powered by 
Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments 
powered by <span class="logo-disqus">Disqus</span></a><div 
class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td 
width="40%" align="left"><a accesskey="p" 
href="apfs02.html">Prev</a>&nbsp;</td><td width="20%" align="center"><a 
accesskey="u" href="hfilev2.html">Up</a></td><td width="40%" 
align="right">&nbsp;<a accesskey="n" 
href="other.info.html">Next</a></td></tr><tr><td width="40%" align="left" 
valign="top">F.2.&nbsp;HFile format version 1 overview &nbsp;</td><td 
width="20%" align="center"><a accesskey="h" href="book.html">Home</a></td><td 
width="40%" align="right" valign="top">&nbsp;Appendix&nbsp;G.&nbsp;Other 
Information About HBase</td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/book/apks02.html
URL: 
http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/apks02.html?rev=1616896&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/book/apks02.html (added)
+++ hbase/hbase.apache.org/trunk/book/apks02.html Fri Aug  8 22:19:16 2014
@@ -0,0 +1,21 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>K.2.&nbsp;TODO</title><link rel="stylesheet" type="text/css" 
href="${baserdir}/src/main/site/resources/css/freebsd_docbook.css"><meta 
name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" 
href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" 
href="hbase.rpc.html" title="Appendix&nbsp;K.&nbsp;0.95 RPC 
Specification"><link rel="prev" href="hbase.rpc.html" 
title="Appendix&nbsp;K.&nbsp;0.95 RPC Specification"><link rel="next" 
href="apks03.html" title="K.3.&nbsp;RPC"></head><body bgcolor="white" 
text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div 
class="navheader"><table width="100%" summary="Navigation header"><tr><th 
colspan="3" align="center">K.2.&nbsp;TODO</th></tr><tr><td width="20%" 
align="left"><a accesskey="p" href="hbase.rpc.html">Prev</a>&nbsp;</td><th 
width="60%" align="center">Appendix&nbsp;K.&nbsp;0.95 RPC Specification</th><td 
width="20%" align="right">&nbsp;<a accesskey="n" href="apks03.html">Next</a></td>
</tr></table><hr></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your 
forum shortname
+    var disqus_url = 'http://hbase.apache.org/book/.html';
+    </script><div class="section"><div class="titlepage"><div><div><h2 
class="title" style="clear: both"><a 
name="d4029e22378"></a>K.2.&nbsp;TODO</h2></div></div></div><p>
+            </p><div class="orderedlist"><ol class="orderedlist" type="1"><li 
class="listitem"><p>List of problems with currently specified format and where 
we would like
+                        to go in a version 2, etc. For example, what would we 
have to change if
+                        anything to move server async or to support 
streaming/chunking?</p></li><li class="listitem"><p>Diagram on how it 
works</p></li><li class="listitem"><p>A grammar that succinctly describes the 
wire-format. Currently we have
+                        these words and the content of the rpc protobuf idl 
but a grammar for the
+                        back and forth would help with grokking rpc. Also, a 
little state machine on
+                        client/server interactions would help with 
understanding (and ensuring
+                        correct implementation).</p></li></ol></div><p>
+        </p></div><div id="disqus_thread"></div><script type="text/javascript">
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 
'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || 
document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a 
href="http://disqus.com/?ref_noscript">comments powered by 
Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments 
powered by <span class="logo-disqus">Disqus</span></a><div 
class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td 
width="40%" align="left"><a accesskey="p" 
href="hbase.rpc.html">Prev</a>&nbsp;</td><td width="20%" align="center"><a 
accesskey="u" href="hbase.rpc.html">Up</a></td><td width="40%" 
align="right">&nbsp;<a accesskey="n" 
href="apks03.html">Next</a></td></tr><tr><td width="40%" align="left" 
valign="top">Appendix&nbsp;K.&nbsp;0.95 RPC Specification&nbsp;</td><td 
width="20%" align="center"><a accesskey="h" href="book.html">Home</a></td><td 
width="40%" align="right" 
valign="top">&nbsp;K.3.&nbsp;RPC</td></tr></table></div></body></html>
\ No newline at end of file

