Author: dmeil
Date: Wed Feb 15 19:08:05 2012
New Revision: 1244649
URL: http://svn.apache.org/viewvc?rev=1244649&view=rev
Log:
hbase-5404. book.xml, performance.xml - more info on compression and schema design
Modified:
hbase/trunk/src/docbkx/book.xml
hbase/trunk/src/docbkx/performance.xml
Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1244649&r1=1244648&r2=1244649&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Wed Feb 15 19:08:05 2012
@@ -648,15 +648,17 @@ admin.enableTable(table);
<para>Most of the time small inefficiencies don't matter all that much. Unfortunately,
this is a case where they do. Whatever patterns are selected for
ColumnFamilies, attributes, and rowkeys, they could be repeated
several billion times in your data. </para>
- <para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally.</para>
+ <para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally and why this is important.</para>
<section xml:id="keysize.cf"><title>Column Families</title>
<para>Try to keep the ColumnFamily names as small as possible,
preferably one character (e.g. "d" for data/default).
</para>
+ <para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally and why this is important.</para>
</section>
<section xml:id="keysize.atttributes"><title>Attributes</title>
<para>Although verbose attribute names (e.g.,
"myVeryImportantAttribute") are easier to read, prefer shorter attribute names
(e.g., "via")
to store in HBase.
</para>
+ <para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally and why this is important.</para>
</section>
<section xml:id="keysize.row"><title>Rowkey Length</title>
<para>Keep them as short as is reasonable such that they can still be
useful for required data access (e.g., Get vs. Scan).
@@ -692,6 +694,7 @@ System.out.println("md5 digest as string
</programlisting>
</para>
</section>
+
</section>
<section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
<para>A common problem in database processing is quickly finding the most
recent version of a value. A technique using reverse timestamps
@@ -888,7 +891,7 @@ System.out.println("md5 digest as string
</section>
<section xml:id="schema.ops"><title>Operational and Performance
Configuration Options</title>
<para>See the Performance section <xref linkend="perf.schema"/> for more
information on operational and performance
- schema design options, such as Bloom Filters, Table-configured regionsizes, and blocksizes.
+ schema design options, such as Bloom Filters, Table-configured regionsizes, compression, and blocksizes.
</para>
</section>
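To make the key-size advice in these book.xml changes concrete, here is a minimal Java sketch (not part of this commit) that estimates how many bytes the key portion of one cell occupies. It assumes the KeyValue layout described in the book's keyvalue section: a key-length int and value-length int, then a key made of a 2-byte row length, the row, a 1-byte ColumnFamily length, the ColumnFamily, the qualifier, an 8-byte timestamp, and a 1-byte key type. The class `KeySizeEstimate`, its helper methods, and the sample row and names are hypothetical, for illustration only; verify the layout against the HBase version in use.

```java
import java.nio.charset.StandardCharsets;

public class KeySizeEstimate {
    // Fixed-width fields inside the key (assumed layout):
    // row length (short) + family length (byte) + timestamp (long) + key type (byte)
    static final int FIXED_KEY_BYTES = 2 + 1 + 8 + 1;

    // Hypothetical helper: bytes used by the key portion of a single cell.
    static int keyLength(String row, String family, String qualifier) {
        return FIXED_KEY_BYTES
                + row.getBytes(StandardCharsets.UTF_8).length
                + family.getBytes(StandardCharsets.UTF_8).length
                + qualifier.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical helper: whole cell = key-length int + value-length int + key + value.
    static int cellLength(String row, String family, String qualifier, int valueBytes) {
        return 4 + 4 + keyLength(row, family, qualifier) + valueBytes;
    }

    public static void main(String[] args) {
        String row = "user12345";
        int verbose = keyLength(row, "default", "myVeryImportantAttribute");
        int terse = keyLength(row, "d", "via");
        long cells = 1_000_000_000L; // one billion cells, for illustration
        System.out.println("verbose key bytes: " + verbose);
        System.out.println("terse key bytes:   " + terse);
        System.out.println("extra bytes over " + cells + " cells: " + (verbose - terse) * cells);
    }
}
```

With these sample names the verbose key is 52 bytes and the terse key is 25, so the difference is paid again in every one of the billion cells, which is the point the "several billion times" remark makes.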
Modified: hbase/trunk/src/docbkx/performance.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/performance.xml?rev=1244649&r1=1244648&r2=1244649&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/performance.xml (original)
+++ hbase/trunk/src/docbkx/performance.xml Wed Feb 15 19:08:05 2012
@@ -198,7 +198,8 @@
</section>
<section xml:id="perf.schema.keys">
<title>Key and Attribute Lengths</title>
- <para>See <xref linkend="keysize" />.</para>
+ <para>See <xref linkend="keysize" />. See also <xref linkend="perf.compression.however" /> for
+ compression caveats.</para>
</section>
<section xml:id="schema.regionsize"><title>Table RegionSize</title>
<para>The regionsize can be set on a per-table basis via
<code>setFileSize</code> on
@@ -244,6 +245,15 @@
<title>Compression</title>
<para>Production systems should use compression with their ColumnFamily
definitions. See <xref linkend="compression" /> for more information.
</para>
+ <section xml:id="perf.compression.however"><title>However...</title>
+ <para>Compression deflates data <emphasis>on disk</emphasis>. When the data is in memory (e.g., in the
+ MemStore) or on the wire (e.g., transferring between RegionServer and Client), it is inflated.
+ So while using ColumnFamily compression is a best practice, it is not going to completely eliminate
+ the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names.
+ </para>
+ <para>See <xref linkend="keysize" /> for schema design tips, and <xref linkend="keyvalue"/> for more information on how HBase stores data internally.
+ </para>
+ </section>
</section>
</section> <!-- perf schema -->
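The caveat in the new "However..." subsection can be illustrated outside HBase. The following Java sketch uses `java.util.zip.Deflater` as a stand-in for a ColumnFamily compression codec; it is not HBase's compression code path, and the fake "cells" are just concatenated strings. The class `CompressionCaveat` and its helpers are hypothetical. Repeated family and qualifier names compress well, but the raw (uncompressed) byte count, which is what the MemStore and the RegionServer-to-Client transfer handle, still grows with every extra byte of name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionCaveat {
    // Hypothetical helper: build a fake block of cells that all repeat
    // the same ColumnFamily and qualifier names.
    static byte[] fakeBlock(String family, String qualifier, int cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells; i++) {
            sb.append("row").append(i).append(family).append(qualifier).append("value");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Deflate the whole buffer and return the compressed length in bytes.
    static int deflatedLength(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.size();
    }

    public static void main(String[] args) {
        byte[] verbose = fakeBlock("default", "myVeryImportantAttribute", 10_000);
        byte[] terse = fakeBlock("d", "via", 10_000);
        // "raw" approximates in-memory / on-the-wire size; "deflated" approximates on-disk size.
        System.out.println("verbose: raw=" + verbose.length + " deflated=" + deflatedLength(verbose));
        System.out.println("terse:   raw=" + terse.length + " deflated=" + deflatedLength(terse));
    }
}
```

The deflated sizes show why compression is still worth enabling for data on disk; the raw sizes are what remains once the data is inflated, which is why short names still matter.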