Author: dmeil
Date: Wed Feb 15 19:08:05 2012
New Revision: 1244649

URL: http://svn.apache.org/viewvc?rev=1244649&view=rev
Log:
hbase-5404.  book.xml, performance.xml - more info on compression and schema design

Modified:
    hbase/trunk/src/docbkx/book.xml
    hbase/trunk/src/docbkx/performance.xml

Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1244649&r1=1244648&r2=1244649&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Wed Feb 15 19:08:05 2012
@@ -648,15 +648,17 @@ admin.enableTable(table);               
       <para>Most of the time small inefficiencies don't matter all that much.  Unfortunately,
          this is a case where they do.  Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated
        several billion times in your data. </para>
-       <para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally.</para>
+       <para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally to see why this is important.</para>
        <section xml:id="keysize.cf"><title>Column Families</title>
          <para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
          </para> 
+       <para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally to see why this is important.</para>
        </section>
        <section xml:id="keysize.atttributes"><title>Attributes</title>
          <para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
          to store in HBase.
          </para> 
+       <para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally to see why this is important.</para>
        </section>
        <section xml:id="keysize.row"><title>Rowkey Length</title>
          <para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan). 
@@ -692,6 +694,7 @@ System.out.println("md5 digest as string
 </programlisting>               
          </para>
        </section>
+       
     </section>
     <section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
     <para>A common problem in database processing is quickly finding the most recent version of a value.  A technique using reverse timestamps
@@ -888,7 +891,7 @@ System.out.println("md5 digest as string
   </section>
   <section xml:id="schema.ops"><title>Operational and Performance Configuration Options</title>
     <para>See the Performance section <xref linkend="perf.schema"/> for more information operational and performance
-    schema design options, such as Bloom Filters, Table-configured regionsizes, and blocksizes.
+    schema design options, such as Bloom Filters, Table-configured regionsizes, compression, and blocksizes.
     </para>
   </section>  
 

Modified: hbase/trunk/src/docbkx/performance.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/performance.xml?rev=1244649&r1=1244648&r2=1244649&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/performance.xml (original)
+++ hbase/trunk/src/docbkx/performance.xml Wed Feb 15 19:08:05 2012
@@ -198,7 +198,8 @@
     </section>
     <section xml:id="perf.schema.keys">
       <title>Key and Attribute Lengths</title>
-      <para>See <xref linkend="keysize" />.</para>
+      <para>See <xref linkend="keysize" />.  See also <xref linkend="perf.compression.however" /> for 
+      compression caveats.</para>
     </section>
     <section xml:id="schema.regionsize"><title>Table RegionSize</title>
     <para>The regionsize can be set on a per-table basis via <code>setFileSize</code> on
@@ -244,6 +245,15 @@
       <title>Compression</title>
       <para>Production systems should use compression with their ColumnFamily definitions.  See <xref linkend="compression" /> for more information.
       </para>
+      <section xml:id="perf.compression.however"><title>However...</title>
+         <para>Compression deflates data <emphasis>on disk</emphasis>.  When it's in-memory (e.g., in the 
+         MemStore) or on the wire (e.g., transferring between RegionServer and Client) it's inflated.
+         So while using ColumnFamily compression is a best practice, but it's not going to completely eliminate
+         the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names. 
+         </para>
+         <para>See <xref linkend="keysize" /> on for schema design tips, and <xref linkend="keyvalue"/> for more information on HBase stores data internally.
+         </para> 
+      </section>
     </section>
   </section>  <!--  perf schema -->
   

