http://git-wip-us.apache.org/repos/asf/hbase/blob/7bf6c024/src/main/docbkx/schema_design.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/schema_design.xml b/src/main/docbkx/schema_design.xml
deleted file mode 100644
index 765a8f7..0000000
--- a/src/main/docbkx/schema_design.xml
+++ /dev/null
@@ -1,923 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<chapter version="5.0" xml:id="schema"
-         xmlns="http://docbook.org/ns/docbook"
-         xmlns:xlink="http://www.w3.org/1999/xlink"
-         xmlns:xi="http://www.w3.org/2001/XInclude"
-         xmlns:svg="http://www.w3.org/2000/svg"
-         xmlns:m="http://www.w3.org/1998/Math/MathML"
-         xmlns:html="http://www.w3.org/1999/xhtml"
-         xmlns:db="http://docbook.org/ns/docbook">
-<!--
-/**
- *
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--->
-  <title>HBase and Schema Design</title>
-      <para>A good general introduction to the strengths and weaknesses of modelling on
-          the various non-RDBMS datastores is Ian Varley's Master's thesis,
-          <link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
-          Recommended.  Also, read <xref linkend="keyvalue"/> for how HBase stores data internally, and the section on
-          <xref linkend="schema.casestudies">HBase Schema Design Case Studies</xref>.
-      </para>
-  <section xml:id="schema.creation">
-  <title>
-      Schema Creation
-  </title>
-  <para>HBase schemas can be created or updated with <xref linkend="shell" />
-      or by using <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link> in the Java API.
-      </para>
-      <para>Tables must be disabled when making ColumnFamily modifications, for example:
-      <programlisting>
-Configuration config = HBaseConfiguration.create();
-HBaseAdmin admin = new HBaseAdmin(config);    // pass the Configuration created above
-String table = "myTable";
-
-admin.disableTable(table);
-
-HColumnDescriptor cf1 = ...;
-admin.addColumn(table, cf1);      // adding new ColumnFamily
-HColumnDescriptor cf2 = ...;
-admin.modifyColumn(table, cf2);    // modifying existing ColumnFamily
-
-admin.enableTable(table);
-      </programlisting>
-      </para>
-      <para>See <xref linkend="client_dependencies"/> for more information about configuring client connections.
-      </para>
-      <para>Note:  online schema changes are supported in the 0.92.x codebase, 
but the 0.90.x codebase requires the table
-      to be disabled.
-      </para>
-    <section xml:id="schema.updates"><title>Schema Updates</title>
-      <para>When changes are made to either Tables or ColumnFamilies (e.g., 
region size, block size), these changes
-      take effect the next time there is a major compaction and the StoreFiles 
get re-written.
-      </para>
-      <para>See <xref linkend="store"/> for more information on StoreFiles.
-      </para>
-    </section>
-  </section>
-  <section xml:id="number.of.cfs">
-  <title>
-      On the number of column families
-  </title>
-  <para>
-      HBase currently does not do well with anything above two or three column families, so keep the number
-      of column families in your schema low.  Currently, flushing and compactions are done on a per-Region basis, so
-      if one column family is carrying the bulk of the data and bringing on flushes, the adjacent families
-      will also be flushed even though the amount of data they carry is small.  When many column families exist, the
-      flushing and compaction interaction can make for a bunch of needless I/O (to be addressed by
-      changing flushing and compaction to work on a per-column-family basis).  For more information
-      on compactions, see <xref linkend="compaction"/>.
-    </para>
-    <para>Try to make do with one column family if you can in your schemas.  Only introduce a
-        second and third column family in the case where data access is usually column scoped;
-        i.e. you query one column family or the other but usually not both at the same time.
-    </para>
-    <section xml:id="number.of.cfs.card"><title>Cardinality of 
ColumnFamilies</title>
-      <para>Where multiple ColumnFamilies exist in a single table, be aware of 
the cardinality (i.e., number of rows).
-      If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion 
rows, ColumnFamilyA's data will likely be spread
-      across many, many regions (and RegionServers).  This makes mass scans 
for ColumnFamilyA less efficient.
-      </para>
-    </section>
-  </section>
-  <section xml:id="rowkey.design"><title>Rowkey Design</title>
-    <section xml:id="timeseries">
-    <title>
-    Monotonically Increasing Row Keys/Timeseries Data
-    </title>
-    <para>
-      In the HBase chapter of Tom White's book <link xlink:href="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> (O'Reilly) there is an optimization note on watching out for a
-      phenomenon where an import process walks in lock-step, with all clients in concert pounding one of the table's regions (and thus, a single node), then
-      moving on to the next region, etc.  With monotonically increasing row-keys (e.g., using a timestamp), this will happen.  See this comic by Ikai Lan on why
-      monotonically increasing row keys are problematic in BigTable-like datastores:
-      <link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>.  The pile-up on a single region brought on
-      by monotonically increasing keys can be mitigated by randomizing the input records so they are not in sorted order, but in general it's best to avoid
-      using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
-    </para>
-    <para>If you do need to upload time series data into HBase, you should
-    study <link xlink:href="http://opentsdb.net/">OpenTSDB</link> as a
-    successful example.  It has a page describing the <link xlink:href="http://opentsdb.net/schema.html">schema</link> it uses in
-    HBase.  The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to
-    contradict the previous advice about not using a timestamp as the key.  However, the difference is that the timestamp is not in the
-    <emphasis>lead</emphasis> position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types.  Thus,
-    even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
-   </para>
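-   <para>As a rough illustration of the [metric_type][event_timestamp] layout, the sketch below composes such a key with the
-   Java standard library only.  The class name, the fixed 8-byte metric width, and the zero padding are assumptions made for this
-   example; they are not taken from OpenTSDB, whose actual schema is described on the page linked above.
-   </para>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MetricKeySketch {
    // Hypothetical fixed width for the metric-type prefix (an assumption of this sketch).
    static final int METRIC_WIDTH = 8;

    /** Builds [metric_type][event_timestamp]: padded metric bytes, then an 8-byte big-endian timestamp. */
    static byte[] rowKey(String metricType, long eventTimestamp) {
        // copyOf pads short names with zero bytes and truncates long ones to METRIC_WIDTH.
        byte[] metric = Arrays.copyOf(metricType.getBytes(StandardCharsets.UTF_8), METRIC_WIDTH);
        return ByteBuffer.allocate(METRIC_WIDTH + Long.BYTES)
                .put(metric)
                .putLong(eventTimestamp)
                .array();
    }

    public static void main(String[] args) {
        long now = 1369163040570L;
        byte[] cpu = rowKey("cpu", now);
        byte[] disk = rowKey("disk", now);
        // Keys written at the same instant differ in their lead bytes, so they sort
        // (and therefore land) in different parts of the keyspace.
        System.out.println("key length: " + cpu.length);
        System.out.println("cpu sorts before disk: " + (Arrays.compareUnsigned(cpu, disk) < 0));
    }
}
```

-   <para>Because the metric type leads the key, writes for different metrics arriving at the same time are not adjacent in sort order.
-   </para>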
-   <para>See <xref linkend="schema.casestudies">HBase Schema Design Case 
Studies</xref> for some rowkey design examples.
-   </para>
-  </section>
-  <section xml:id="keysize">
-      <title>Try to minimize row and column sizes</title>
-      <subtitle>Or why are my StoreFile indices large?</subtitle>
-      <para>In HBase, values are always freighted with their coordinates; as a
-          cell value passes through the system, it'll be accompanied by its
-          row, column name, and timestamp - always.  If your rows and column names
-          are large, especially compared to the size of the cell value, then
-          you may run up against some interesting scenarios.  One such is
-          the case described by Marc Limotte at the tail of
-          <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
-          (recommended!).
-          Therein, the indices that are kept on HBase storefiles (<xref linkend="hfile" />)
-          to facilitate random access may end up occupying large chunks of the HBase
-          allotted RAM because the cell value coordinates are large.
-          Marc, in the above-cited comment, suggests upping the block size so
-          entries in the store file index happen at a larger interval, or
-          modifying the table schema so it makes for smaller rows and column
-          names.
-          Compression will also make for larger indices.  See
-          the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&amp;subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
-          up on the user mailing list.
-       </para>
-       <para>Most of the time small inefficiencies don't matter all that much.  Unfortunately,
-         this is a case where they do.  Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys, they could be repeated
-       several billion times in your data. </para>
-       <para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally to see why this is important.</para>
-       <section xml:id="keysize.cf"><title>Column Families</title>
-         <para>Try to keep the ColumnFamily names as small as possible, 
preferably one character (e.g. "d" for data/default).
-         </para>
-       <para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally to see why this is important.</para>
-       </section>
-       <section xml:id="keysize.atttributes"><title>Attributes</title>
-         <para>Although verbose attribute names (e.g., 
"myVeryImportantAttribute") are easier to read, prefer shorter attribute names 
(e.g., "via")
-         to store in HBase.
-         </para>
-       <para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally to see why this is important.</para>
-       </section>
-       <section xml:id="keysize.row"><title>Rowkey Length</title>
-         <para>Keep them as short as is reasonable such that they can still be 
useful for required data access (e.g., Get vs. Scan).
-         A short key that is useless for data access is not better than a 
longer key with better get/scan properties.  Expect tradeoffs
-         when designing rowkeys.
-         </para>
-       </section>
-       <section xml:id="keysize.patterns"><title>Byte Patterns</title>
-         <para>A long is 8 bytes.  You can store an unsigned number up to 
18,446,744,073,709,551,615 in those eight bytes.
-            If you stored this number as a String -- presuming a byte per 
character -- you need nearly 3x the bytes.
-         </para>
-         <para>Not convinced?  Below is some sample code that you can run on 
your own.
-<programlisting>
-// Assumes these imports: java.security.MessageDigest and
-// org.apache.hadoop.hbase.util.Bytes.
-
-// long
-//
-long l = 1234567890L;
-byte[] lb = Bytes.toBytes(l);
-System.out.println("long bytes length: " + lb.length);   // prints 8
-
-String s = "" + l;
-byte[] sb = Bytes.toBytes(s);
-System.out.println("long as string length: " + sb.length);    // prints 10
-
-// hash
-//
-MessageDigest md = MessageDigest.getInstance("MD5");
-byte[] digest = md.digest(Bytes.toBytes(s));
-System.out.println("md5 digest bytes length: " + digest.length);    // prints 16
-
-// Round-tripping raw digest bytes through a String inflates them; the exact
-// length depends on the platform charset (26 in the original run).
-String sDigest = new String(digest);
-byte[] sbDigest = Bytes.toBytes(sDigest);
-System.out.println("md5 digest as string length: " + sbDigest.length);
-</programlisting>
-         </para>
-         <para>Unfortunately, using a binary representation of a type will 
make your data harder to read outside of your code. For example,
-               this is what you will see in the shell when you increment a 
value:
-<programlisting>
-hbase(main):001:0> incr 't', 'r', 'f:q', 1
-COUNTER VALUE = 1
-
-hbase(main):002:0> get 't', 'r'
-COLUMN                                        CELL
- f:q                                          timestamp=1369163040570, 
value=\x00\x00\x00\x00\x00\x00\x00\x01
-1 row(s) in 0.0310 seconds
-</programlisting>
-               The shell makes a best effort to print a string, and in this case it decided to just print the hex. The same will
-               happen to your row keys inside the region names. It can be okay if you know what's being stored, but it might also
-               be unreadable if arbitrary data can be put in the same cells. This is the main trade-off.
-         </para>
-       </section>
-
-    </section>
-    <section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
-    <para>A common problem in database processing is quickly finding the most 
recent version of a value.  A technique using reverse timestamps
-    as a part of the key can help greatly with a special case of this problem. 
 Also found in the HBase chapter of Tom White's book Hadoop:  The Definitive 
Guide (O'Reilly),
-    the technique involves appending (<code>Long.MAX_VALUE - timestamp</code>) 
to the end of any key, e.g., [key][reverse_timestamp].
-    </para>
-    <para>The most recent value for [key] in a table can be found by 
performing a Scan for [key] and obtaining the first record.  Since HBase keys
-    are in sorted order, this key sorts before any older row-keys for [key] 
and thus is first.
-    </para>
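-    <para>The ordering argument can be checked with a small standalone sketch.  It assumes non-negative timestamps and a
-    UTF-8 key prefix, and it uses <code>Arrays.compareUnsigned</code> (Java 9 or later) as a stand-in for HBase's unsigned
-    byte-wise rowkey comparison; the class and method names are hypothetical.
-    </para>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ReverseTimestampSketch {
    /** Builds [key][reverse_timestamp], where reverse_timestamp = Long.MAX_VALUE - timestamp. */
    static byte[] rowKey(String key, long timestamp) {
        byte[] prefix = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(Long.MAX_VALUE - timestamp)   // big-endian, so byte order matches numeric order
                .array();
    }

    public static void main(String[] args) {
        byte[] older = rowKey("user42", 1000L);
        byte[] newer = rowKey("user42", 2000L);
        // The newer event has the smaller reverse timestamp, so it sorts first
        // and is the first row a Scan over the [key] prefix would return.
        System.out.println("newer sorts first: " + (Arrays.compareUnsigned(newer, older) < 0));
    }
}
```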
-    <para>This technique would be used instead of using <xref 
linkend="schema.versions">HBase Versioning</xref> where the intent is to hold 
onto all versions
-    "forever" (or a very long time) and at the same time quickly obtain access 
to any other version by using the same Scan technique.
-    </para>
-    </section>
-    <section xml:id="rowkey.scope">
-    <title>Rowkeys and ColumnFamilies</title>
-    <para>Rowkeys are scoped to ColumnFamilies.  Thus, the same rowkey could 
exist in each ColumnFamily that exists in a table without collision.
-    </para>
-    </section>
-    <section xml:id="changing.rowkeys"><title>Immutability of Rowkeys</title>
-    <para>Rowkeys cannot be changed.  The only way they can be "changed" in a 
table is if the row is deleted and then re-inserted.
-    This is a fairly common question on the HBase dist-list so it pays to get 
the rowkeys right the first time (and/or before you've
-    inserted a lot of data).
-    </para>
-    </section>
-    <section xml:id="rowkey.regionsplits"><title>Relationship Between RowKeys 
and Region Splits</title>
-    <para>If you pre-split your table, it is <emphasis>critical</emphasis> to understand how your rowkey will be distributed across
-    the region boundaries.  As an example of why this is important, consider using displayable hex characters as the
-    lead position of the key (e.g., "0000000000000000" to "ffffffffffffffff").  Running those key ranges through <code>Bytes.split</code>
-    (which is the split strategy used when creating regions in <code>HBaseAdmin.createTable(byte[] startKey, byte[] endKey, numRegions)</code>)
-    for 10 regions will generate the following splits...
-    </para>
-    <para>
-    <programlisting>
-48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48                                // 0
-54 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10                 // 6
-61 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -68                 // =
-68 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -126  // D
-75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 72                                // K
-82 18 18 18 18 18 18 18 18 18 18 18 18 18 18 14                                // R
-88 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -44                 // X
-95 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -102                // _
-102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102                // f
-    </programlisting>
-    ... (note:  the lead byte is listed to the right as a comment.)  Given 
that the first split is a '0' and the last split is an 'f',
-    everything is great, right?  Not so fast.
-    </para>
-    <para>The problem is that all the data is going to pile up in a handful of regions at the low and high ends of the key range, thus creating a "lumpy" (and
-    possibly "hot") region problem.  To understand why, refer to an <link xlink:href="http://www.asciitable.com">ASCII Table</link>.
-    '0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will <emphasis>never appear in this
-    keyspace</emphasis> because the only values are [0-9] and [a-f].  Thus, the middle regions will
-    never be used.  To make pre-splitting work with this example keyspace, a custom definition of splits (i.e., one that does not rely on the
-    built-in split method) is required.
-    </para>
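-    <para>To see the gap concretely, the sketch below maps sample hex rowkeys onto the nine split keys listed above and reports
-    which of the ten regions receive any keys.  It is a standalone approximation, not HBase code: regions are numbered from zero,
-    region 0 is assumed to hold everything that sorts before the first split key, and <code>Arrays.compareUnsigned</code>
-    (Java 9 or later) stands in for HBase's byte-wise rowkey comparison.  The class and method names are hypothetical.
-    </para>

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class HexSplitGapSketch {
    // The nine split keys from the listing above: lead byte, 14 repeated middle bytes, last byte.
    static final byte[][] SPLITS = {
        split(48, 48, 48), split(54, -10, -10), split(61, -67, -68),
        split(68, -124, -126), split(75, 75, 72), split(82, 18, 14),
        split(88, -40, -44), split(95, -97, -102), split(102, 102, 102)
    };

    static byte[] split(int lead, int middle, int last) {
        byte[] b = new byte[16];
        Arrays.fill(b, (byte) middle);
        b[0] = (byte) lead;
        b[15] = (byte) last;
        return b;
    }

    /** Region index = number of split keys that sort at or before the rowkey. */
    static int regionOf(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.US_ASCII);
        int region = 0;
        for (byte[] s : SPLITS) {
            if (Arrays.compareUnsigned(key, s) >= 0) {
                region++;
            }
        }
        return region;
    }

    /** Marks every region hit by a sample of 16-character hex keys covering all two-character prefixes. */
    static boolean[] usedRegions() {
        boolean[] used = new boolean[SPLITS.length + 1];
        String hex = "0123456789abcdef";
        for (char a : hex.toCharArray()) {
            for (char b : hex.toCharArray()) {
                used[regionOf("" + a + b + "00000000000000")] = true;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(usedRegions()));
    }
}
```

-    <para>Under these assumptions the regions bounded by the '=', 'D', 'K', 'R', 'X' and '_' split keys never receive a key,
-    because no hex rowkey can start with a byte between 58 and 96.
-    </para>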
-    <para>Lesson #1:  Pre-splitting tables is generally a best practice, but 
you need to pre-split them in such a way that all the
-    regions are accessible in the keyspace.  While this example demonstrated 
the problem with a hex-key keyspace, the same problem can happen
-     with <emphasis>any</emphasis> keyspace.  Know your data.
-    </para>
-    <para>Lesson #2:  While generally not advisable, using hex-keys (and more 
generally, displayable data) can still work with pre-split
-    tables as long as all the created regions are accessible in the keyspace.
-    </para>
-        <para>To conclude this example, the following shows how appropriate splits can be pre-created for hex-keys:
-           </para>
-<programlisting>public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
-throws IOException {
-  try {
-    admin.createTable( table, splits );
-    return true;
-  } catch (TableExistsException e) {
-    logger.info("table " + table.getNameAsString() + " already exists");
-    // the table already exists...
-    return false;
-  }
-}
-
-public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
-  byte[][] splits = new byte[numRegions-1][];
-  BigInteger lowestKey = new BigInteger(startKey, 16);
-  BigInteger highestKey = new BigInteger(endKey, 16);
-  BigInteger range = highestKey.subtract(lowestKey);
-  BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
-  lowestKey = lowestKey.add(regionIncrement);
-  for(int i=0; i &lt; numRegions-1;i++) {
-    BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
-    byte[] b = String.format("%016x", key).getBytes();
-    splits[i] = b;
-  }
-  return splits;
-}</programlisting>
-    </section>
-    </section>  <!--  rowkey design -->
-    <section xml:id="schema.versions">
-  <title>
-  Number of Versions
-  </title>
-     <section xml:id="schema.versions.max"><title>Maximum Number of 
Versions</title>
-      <para>The maximum number of row versions to store is configured per 
column
-      family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
-      The default for max versions is 3.
-      This is an important parameter because as described in <xref 
linkend="datamodel" />
-      section HBase does <emphasis>not</emphasis> overwrite row values, but 
rather
-      stores different values per row by time (and qualifier).  Excess 
versions are removed during major
-      compactions.  The number of max versions may need to be increased or 
decreased depending on application needs.
-      </para>
-      <para>Setting the number of max versions to an exceedingly high level (e.g., hundreds or more) is not recommended unless those old values are
-      very dear to you, because this will greatly increase StoreFile size.
-      </para>
-     </section>
-    <section xml:id="schema.minversions">
-    <title>
-    Minimum Number of Versions
-    </title>
-    <para>Like maximum number of row versions, the minimum number of row 
versions to keep is configured per column
-      family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
-      The default for min versions is 0, which means the feature is disabled.
-      The minimum number of row versions parameter is used together with the 
time-to-live parameter and can be combined with the
-      number of row versions parameter to allow configurations such as
-      "keep the last T minutes worth of data, at most N versions, 
<emphasis>but keep at least M versions around</emphasis>"
-      (where M is the value for minimum number of row versions, M&lt;N).
-      This parameter should only be set when time-to-live is enabled for a 
column family and must be less than the
-      number of row versions.
-    </para>
-    </section>
-  </section>
-  <section xml:id="supported.datatypes">
-  <title>
-  Supported Datatypes
-  </title>
-  <para>HBase supports a "bytes-in/bytes-out" interface via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> and
-  <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html">Result</link>, so anything that can be
-  converted to an array of bytes can be stored as a value.  Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
-  </para>
-  <para>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask);
-  search the mailing list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel">datamodel</xref>, and
-  that includes versioning.  Take that into consideration when making your design, as well as block size for the ColumnFamily.
-  </para>
-    <section xml:id="counters">
-      <title>Counters</title>
-      <para>
-      One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers).  See
-      <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> in HTable.
-      </para>
-      <para>Synchronization on counters is done on the RegionServer, not in the client.
-      </para>
-    </section>
-  </section>
-  <section xml:id="schema.joins"><title>Joins</title>
-    <para>If you have multiple tables, don't forget to factor in the potential 
for <xref linkend="joins"/> into the schema design.
-    </para>
-  </section>
-  <section xml:id="ttl">
-  <title>Time To Live (TTL)</title>
-  <para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
-  This applies to <emphasis>all</emphasis> versions of a row - even the current one.  The TTL time encoded in HBase for the row is specified in UTC.
-  </para>
-  <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
-  </para>
-  </section>
-  <section xml:id="cf.keep.deleted">
-  <title>
-  Keeping Deleted Cells
-  </title>
-  <para>ColumnFamilies can optionally keep deleted cells. That means deleted cells can still be retrieved with
-  <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> or
-  <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> operations,
-  as long as these operations have a time range specified that ends before the timestamp of any delete that would affect the cells.
-  This allows for point-in-time queries even in the presence of deletes.
-  </para>
-  <para>
-  Deleted cells are still subject to TTL and there will never be more than "maximum number of versions" deleted cells.
-  A new "raw" scan option returns all deleted rows and the delete markers.
-  </para>
-  <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
-  </para>
-  </section>
-  <section xml:id="secondary.indexes">
-  <title>
-  Secondary Indexes and Alternate Query Paths
-  </title>
-  <para>This section could also be titled "what if my table rowkey looks like 
<emphasis>this</emphasis> but I also want to query my table like 
<emphasis>that</emphasis>."
-  A common example on the dist-list is where a row-key is of the format 
"user-timestamp" but there are reporting requirements on activity across users 
for certain
-  time ranges.  Thus, selecting by user is easy because it is in the lead 
position of the key, but time is not.
-  </para>
-  <para>There is no single answer on the best way to handle this because it 
depends on...
-   <itemizedlist>
-       <listitem>Number of users</listitem>
-       <listitem>Data size and data arrival rate</listitem>
-       <listitem>Flexibility of reporting requirements (e.g., completely 
ad-hoc date selection vs. pre-configured ranges) </listitem>
-       <listitem>Desired execution speed of query (e.g., 90 seconds may be 
reasonable to some for an ad-hoc report, whereas it may be too long for others) 
</listitem>
-   </itemizedlist>
-   ... and solutions are also influenced by the size of the cluster and how 
much processing power you have to throw at the solution.
-   Common techniques are in sub-sections below.  This is a comprehensive, but 
not exhaustive, list of approaches.
-  </para>
-  <para>It should not be a surprise that secondary indexes require additional cluster space and processing.
-  This is precisely what happens in an RDBMS because the act of creating an alternate index requires both space and processing cycles to update.  RDBMS products
-  are more advanced in this regard and handle alternative index management out of the box.  However, HBase scales better at larger data volumes, so this is a feature trade-off.
-  </para>
-  <para>Pay attention to <xref linkend="performance"/> when implementing any 
of these approaches.</para>
-  <para>Additionally, see the David Butler response in this dist-list thread:
-  <link xlink:href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&amp;subj=Stargate+hbase">HBase, mail # user - Stargate+hbase</link>.
-   </para>
-    <section xml:id="secondary.indexes.filter">
-      <title>
-       Filter Query
-      </title>
-      <para>Depending on the case, it may be appropriate to use <xref 
linkend="client.filter"/>.  In this case, no secondary index is created.
-      However, don't try a full-scan on a large table like this from an 
application (i.e., single-threaded client).
-      </para>
-    </section>
-    <section xml:id="secondary.indexes.periodic">
-      <title>
-       Periodic-Update Secondary Index
-      </title>
-      <para>A secondary index could be created in another table which is periodically updated via a MapReduce job.  The job could be executed intra-day, but depending on
-      load-strategy it could still potentially be out of sync with the main data table.</para>
-      <para>See <xref linkend="mapreduce.example.readwrite"/> for more 
information.</para>
-    </section>
-    <section xml:id="secondary.indexes.dualwrite">
-      <title>
-       Dual-Write Secondary Index
-      </title>
-      <para>Another strategy is to build the secondary index while publishing data to the cluster (e.g., write to data table, write to index table).
-      If this approach is taken after a data table already exists, then bootstrapping will be needed for the secondary index with a MapReduce job (see <xref linkend="secondary.indexes.periodic"/>).</para>
-    </section>
-    <section xml:id="secondary.indexes.summary">
-      <title>
-       Summary Tables
-      </title>
-      <para>Where time-ranges are very wide (e.g., year-long report) and where 
the data is voluminous, summary tables are a common approach.
-      These would be generated with MapReduce jobs into another table.</para>
-      <para>See <xref linkend="mapreduce.example.summary"/> for more 
information.</para>
-    </section>
-    <section xml:id="secondary.indexes.coproc">
-      <title>
-       Coprocessor Secondary Index
-      </title>
-      <para>Coprocessors act like RDBMS triggers.  These were added in 0.92.  For more information, see <xref linkend="coprocessors"/>.
-      </para>
-    </section>
-  </section>
-  <section xml:id="constraints"><title>Constraints</title>
-    <para>HBase currently supports 'constraints' in traditional (SQL) database parlance. The advised usage for Constraints is in enforcing business rules for attributes in the table (e.g., make sure values are in the range 1-10).
-    Constraints could also be used to enforce referential integrity, but this is strongly discouraged as it will dramatically decrease the write throughput of the tables where integrity checking is enabled.
-    Extensive documentation on using Constraints can be found at: <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/constraint">Constraint</link> since version 0.94.
-    </para>
-  </section>
-  <section xml:id="schema.casestudies"><title>Schema Design Case 
Studies</title>
-  <para>The following will describe some typical data ingestion use-cases with 
HBase, and how the rowkey design and construction
-   can be approached.  Note:  this is just an illustration of potential 
approaches, not an exhaustive list. 
-   Know your data, and know your processing requirements.
-  </para>  
-  <para>It is highly recommended that you read the rest of the <xref 
linkend="schema">Schema Design Chapter</xref> first, before reading
-  these case studies.
-  </para>
-  <para>The following case studies are described:
-      <itemizedlist>
-         <listitem>Log Data / Timeseries Data</listitem>
-         <listitem>Log Data / Timeseries on Steroids</listitem>
-         <listitem>Customer/Order</listitem>
-         <listitem>Tall/Wide/Middle Schema Design</listitem>
-         <listitem>List Data</listitem>
-     </itemizedlist> 
-  </para>
-    <section xml:id="schema.casestudies.log-timeseries">
-      <title>Case Study - Log Data and Timeseries Data</title>
-      <para>Assume that the following data elements are being collected.
-        <itemizedlist>
-          <listitem>Hostname</listitem>
-          <listitem>Timestamp</listitem>
-          <listitem>Log event</listitem>
-          <listitem>Value/message</listitem>
-        </itemizedlist>
-        We can store them in an HBase table called LOG_DATA, but what will the 
rowkey be?  
-       From these attributes the rowkey will be some combination of hostname, 
timestamp, and log-event - but what specifically?        
-      </para>    
-      <section xml:id="schema.casestudies.log-timeseries.tslead">
-        <title>Timestamp In The Rowkey Lead Position</title>
-        <para>The rowkey <code>[timestamp][hostname][log-event]</code> suffers 
from the monotonically increasing rowkey problem 
-        described in <xref linkend="timeseries"/>.
-        </para>
-        <para>There is another pattern frequently mentioned in the dist-lists 
about “bucketing” timestamps, by performing a mod operation 
-        on the timestamp.  If time-oriented scans are important, this could be 
a useful approach.  Attention must be paid to the number
-        of buckets, because this will require the same number of scans to 
return results.
-<programlisting>
-long bucket = timestamp % numBuckets;
-</programlisting>
-        … to construct:
-<programlisting>
-[bucket][timestamp][hostname][log-event]
-</programlisting>        
-          As stated above, to select data for a particular timerange, a Scan 
will need to be performed for each bucket.  100 buckets,
-          for example, will provide a wide distribution in the keyspace but it 
will require 100 Scans to obtain data for a single
-          timestamp, so there are trade-offs. 
-        </para>
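-        <para>As a minimal sketch of the bucketing pattern described above, the following Java
-        builds the bucketed rowkey with plain JDK classes.  The class name, the 1-byte bucket
-        prefix, and the undelimited hostname and log-event are illustrative assumptions and are
-        not part of the HBase API.
-        </para>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class BucketedRowKey {
    // Hypothetical bucket count; a time-range query needs one Scan per bucket.
    static final int NUM_BUCKETS = 100;

    /** Bucket for a timestamp, i.e. timestamp % numBuckets. */
    static int bucket(long timestamp, int numBuckets) {
        // floorMod keeps the result non-negative even for a negative timestamp.
        return (int) Math.floorMod(timestamp, (long) numBuckets);
    }

    /** Builds [bucket][timestamp][hostname][log-event] as a byte array. */
    static byte[] rowKey(long timestamp, String hostname, String logEvent) {
        byte[] host = hostname.getBytes(StandardCharsets.UTF_8);
        byte[] event = logEvent.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + Long.BYTES + host.length + event.length)
                .put((byte) bucket(timestamp, NUM_BUCKETS)) // fits in one byte while NUM_BUCKETS <= 128
                .putLong(timestamp)                         // 8 bytes, big-endian
                .put(host)
                .put(event)
                .array();
    }
}
```

-        <para>Reading a time range back would then mean issuing one Scan per bucket value,
-        each with that bucket as the leading byte of its start and stop rows.
-        </para>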
-      </section>  <!-- ts lead -->
-      <section xml:id="schema.casestudies.log-timeseries.hostlead">
-        <title>Host In The Rowkey Lead Position</title>
-        <para>The rowkey <code>[hostname][log-event][timestamp]</code> is a 
candidate if there is a large-ish number of hosts to spread
-        the writes and reads across the keyspace.  This approach would be 
useful if scanning by hostname was a priority.
-        </para>
-      </section> <!--  host lead -->
-      <section xml:id="schema.casestudies.log-timeseries.revts">
-        <title>Timestamp, or Reverse Timestamp?</title>
-        <para>If the most important access path is to pull the most recent events, 
then storing the timestamps as reverse-timestamps 
-        (e.g., <code>timestamp = Long.MAX_VALUE - timestamp</code>) makes it 
possible to do a Scan on
-        <code>[hostname][log-event]</code> and quickly obtain the 
most recently captured events.
-        </para>
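-        <para>The effect of the reverse timestamp on sort order can be sketched as follows.  This
-        is an illustration using only the JDK; the class and method names are hypothetical, and
-        the comparison simply mimics the unsigned byte-wise ordering HBase applies to rowkeys.
-        It assumes non-negative timestamps.
-        </para>

```java
import java.nio.ByteBuffer;

final class ReverseTimestamp {
    /** Reverse timestamp: newer events get smaller values. */
    static long reverse(long timestamp) {
        return Long.MAX_VALUE - timestamp;
    }

    /** Big-endian 8-byte encoding of a long. */
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

-        <para>With this encoding, the key suffix for a newer event compares lower than the suffix
-        for an older one, so a Scan starting at <code>[hostname][log-event]</code> encounters the
-        newest events first.
-        </para>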
-        <para>Neither approach is wrong, it just depends on what is most 
appropriate for the situation.
-        </para>
-      </section>  <!--  revts -->
-      <section xml:id="schema.casestudies.log-timeseries.varkeys">
-        <title>Variable Length or Fixed Length Rowkeys?</title>
-        <para>It is critical to remember that rowkeys are stamped on every 
column in HBase.  If the hostname is “a” and the event type
-         is “e1” then the resulting rowkey would be quite small.  However, 
what if the ingested hostname is
-          “myserver1.mycompany.com” and the event type is 
“com.package1.subpackage2.subsubpackage3.ImportantService”?  
-         </para>
-         <para>It might make sense to use some substitution in the rowkey.  
There are at least two approaches:  hashed and numeric.
-         In the Hostname In The Rowkey Lead Position example, it might look 
like this:
-        </para>
-        <para>Composite Rowkey With Hashes:  
-           <itemizedlist>
-             <listitem>[MD5 hash of hostname] = 16 bytes</listitem>
-             <listitem>[MD5 hash of event-type] = 16 bytes</listitem>
-             <listitem>[timestamp] = 8 bytes</listitem>
-           </itemizedlist>
-        </para>
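-        <para>A hashed composite rowkey of this shape might be assembled as in the following
-        sketch.  It uses only the JDK's <code>MessageDigest</code>; the class and method names
-        are hypothetical.
-        </para>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashedRowKey {
    /** 16-byte MD5 digest of a string's UTF-8 bytes. */
    static byte[] md5(String value) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Builds [MD5 of hostname][MD5 of event-type][timestamp]: 16 + 16 + 8 = 40 bytes. */
    static byte[] rowKey(String hostname, String eventType, long timestamp) {
        return ByteBuffer.allocate(16 + 16 + Long.BYTES)
                .put(md5(hostname))
                .put(md5(eventType))
                .putLong(timestamp)
                .array();
    }
}
```

-        <para>The key is always 40 bytes, however long the raw hostname and event type are, and
-        all rows for one hostname still share the same 16-byte prefix.
-        </para>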
-        <para>Composite Rowkey With Numeric Substitution: 
-        </para>
-        <para>For this approach another lookup table would be needed in 
addition to LOG_DATA, called LOG_TYPES.  
-        The rowkey of LOG_TYPES would be:
-                 <itemizedlist>
-             <listitem>[type]  (e.g., byte indicating hostname vs. 
event-type)</listitem>
-             <listitem>[bytes]  variable length bytes for raw hostname or 
event-type.</listitem>
-                 </itemizedlist>
-        A column for this rowkey could be a long with an assigned number, 
which could be obtained by using an 
-               <link 
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29";>HBase
 counter</link>.
-        </para>
-        <para>So the resulting composite rowkey would be:
-               <itemizedlist>
-                 <listitem>[substituted long for hostname] = 8 bytes</listitem>
-                 <listitem>[substituted long for event type] = 8 
bytes</listitem>
-                 <listitem>[timestamp] = 8 bytes</listitem>
-               </itemizedlist>
-               In either the Hash or Numeric substitution approach, the raw 
values for hostname and event-type can be stored as columns.
-        </para>      
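-        <para>The numeric substitution can be illustrated with an in-memory stand-in for the
-        LOG_TYPES lookup table.  This is only a sketch: the map and the <code>AtomicLong</code>
-        are assumptions standing in for the lookup table and the HBase counter, and the class
-        and method names are hypothetical.
-        </para>

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class NumericSubstitution {
    // [type] byte distinguishing hostname entries from event-type entries.
    static final byte HOSTNAME = 1;
    static final byte EVENT_TYPE = 2;

    // Stand-in for LOG_TYPES: lookup key is [type][raw bytes], value is the assigned long.
    private final Map<String, Long> logTypes = new ConcurrentHashMap<>();
    // Stand-in for an HBase counter.
    private final AtomicLong counter = new AtomicLong();

    /** Returns the long assigned to a raw value, assigning the next number on first use. */
    long idFor(byte type, String raw) {
        return logTypes.computeIfAbsent(type + ":" + raw, k -> counter.incrementAndGet());
    }

    /** Builds [long for hostname][long for event type][timestamp] = 24 bytes. */
    byte[] rowKey(String hostname, String eventType, long timestamp) {
        return ByteBuffer.allocate(3 * Long.BYTES)
                .putLong(idFor(HOSTNAME, hostname))
                .putLong(idFor(EVENT_TYPE, eventType))
                .putLong(timestamp)
                .array();
    }
}
```

-        <para>Compared with the hashed variant, the key shrinks from 40 to 24 bytes, at the cost
-        of an extra lookup when a previously unseen hostname or event type is ingested.
-        </para>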
-      </section>  <!--  varkeys -->
-    </section>  <!--  log data and timeseries -->
-    <section xml:id="schema.casestudies.log-steroids">
-      <title>Case Study - Log Data and Timeseries Data on Steroids</title>
-      <para>This effectively is the OpenTSDB approach.  What OpenTSDB does is 
re-write data and pack rows into columns for 
-        certain time-periods.  For a detailed explanation, see:  <link 
xlink:href="http://opentsdb.net/schema.html";>http://opentsdb.net/schema.html</link>,
 
-        and <link 
xlink:href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html";>Lessons
 Learned from OpenTSDB</link>
-           from HBaseCon2012.
-      </para>
-      <para>But this is how the general concept works:  data is ingested, for 
example, in this manner…
-<programlisting>
-[hostname][log-event][timestamp1]
-[hostname][log-event][timestamp2]
-[hostname][log-event][timestamp3]
-</programlisting>
-       … with separate rowkeys for each detailed event, but is re-written 
like this… 
-       </para>
-       <para><code>[hostname][log-event][timerange]</code>
-       </para>
-       <para>… and each of the above events is converted into a column 
stored with a time-offset relative to the beginning of the timerange 
-       (e.g., every 5 minutes).  This is obviously a very advanced processing 
technique, but HBase makes this possible.
-      </para>
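-      <para>The arithmetic behind the re-written form can be sketched as follows.  This is not
-      OpenTSDB's code; the 5-minute range, the millisecond units, and all names are assumptions
-      chosen for illustration.
-      </para>

```java
final class TimeRangePacking {
    // Hypothetical time range covered by one row: 5 minutes, in milliseconds.
    static final long RANGE_MILLIS = 5 * 60 * 1000L;

    /** Start of the time range containing the timestamp; this goes into the rowkey. */
    static long rangeStart(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, RANGE_MILLIS);
    }

    /** Offset of the event from the start of its range; this identifies the column. */
    static int offsetMillis(long timestampMillis) {
        return (int) (timestampMillis - rangeStart(timestampMillis));
    }
}
```

-      <para>Every event whose timestamp falls in the same 5-minute range shares one rowkey,
-      <code>[hostname][log-event][rangeStart]</code>, and is told apart by its offset.
-      </para>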
-    </section>  <!--  log data timeseries steroids -->
-    
-    <section xml:id="schema.casestudies.custorder">
-      <title>Case Study - Customer/Order</title>
-      <para>Assume that HBase is used to store customer and order information. 
 There are two core record-types being ingested:  
-        a Customer record type, and Order record type.
-      </para>
-      <para>The Customer record type would include all the things that you’d 
typically expect:
-        <itemizedlist>
-          <listitem>Customer number</listitem>
-          <listitem>Customer name</listitem>
-          <listitem>Address (e.g., city, state, zip)</listitem>
-          <listitem>Phone numbers, etc.</listitem>
-        </itemizedlist>
-     </para>
-     <para>The Order record type would include things like:
-        <itemizedlist>
-          <listitem>Customer number</listitem>
-          <listitem>Order number</listitem>
-          <listitem>Sales date</listitem>
-          <listitem>A series of nested objects for shipping locations and 
line-items (see <xref linkend="schema.casestudies.custorder.obj"/>
-           for details)</listitem>
-        </itemizedlist>
-    </para>
-    <para>Assuming that the combination of customer number and order number 
uniquely identifies an order, these two attributes will compose
- the rowkey, and specifically a composite key such as:
-    </para>
-    <para><code>[customer number][order number]</code>
-    </para>
-    <para>… for an ORDER table.  However, there are more design decisions to 
make:  are the <emphasis>raw</emphasis> values the best choices for rowkeys?
-    </para>
-    <para>The same design questions in the Log Data use-case confront us here. 
 What is the keyspace of the customer number, and what is the 
-format (e.g., numeric or alphanumeric)?  As it is advantageous to use 
fixed-length keys in HBase, as well as keys that can support a 
-reasonable spread in the keyspace, similar options appear:
-    </para>
-    <para>Composite Rowkey With Hashes:  
-      <itemizedlist>
-        <listitem>[MD5 of customer number] = 16 bytes</listitem>
-        <listitem>[MD5 of order number] = 16 bytes</listitem>
-      </itemizedlist>
-    </para>
-    <para>Composite Numeric/Hash Combo Rowkey: 
-      <itemizedlist>
-        <listitem>[substituted long for customer number] = 8 bytes</listitem>
-        <listitem>[MD5 of order number] = 16 bytes</listitem>
-      </itemizedlist>
-     </para>
-        <section xml:id="schema.casestudies.custorder.tables">
-          <title>Single Table?  Multiple Tables?</title>
-            <para>A traditional design approach would have separate tables for 
CUSTOMER and SALES.  Another option is to pack multiple 
-            record types into a single table (e.g., CUSTOMER++).            
-            </para>
-            <para>Customer Record Type Rowkey:
-              <itemizedlist>
-                <listitem>[customer-id]</listitem>
-                <listitem>[type] = type indicating ‘1’ for customer record 
type</listitem>
-              </itemizedlist>
-            </para>
-            <para>Order Record Type Rowkey:
-              <itemizedlist>
-                <listitem>[customer-id]</listitem>
-                <listitem>[type] = type indicating ‘2’ for order record 
type</listitem>
-                <listitem>[order]</listitem>
-              </itemizedlist>
-            </para>
-            <para>The advantage of this particular CUSTOMER++ approach is that 
it organizes many different record-types by customer-id 
-            (e.g., a single scan could get you everything about that 
customer).  The disadvantage is that it’s not as easy to scan for
-            a particular record-type.
-            </para>
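-            <para>The ordering that makes the single scan possible can be sketched with a sorted
-            set standing in for the table.  The fixed-width ids, the string keys, and all names
-            below are assumptions made for illustration; a real table would hold byte-array
-            rowkeys.
-            </para>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class CustomerPlusPlus {
    /** [customer-id][type], with type '1' marking a customer record. */
    static String customerKey(String customerId) {
        return customerId + "1";
    }

    /** [customer-id][type][order], with type '2' marking an order record. */
    static String orderKey(String customerId, String orderId) {
        return customerId + "2" + orderId;
    }

    /** Prefix scan: every key that starts with the customer-id, in sorted order. */
    static List<String> scanCustomer(TreeSet<String> table, String customerId) {
        List<String> rows = new ArrayList<>();
        for (String key : table.tailSet(customerId, true)) {
            if (!key.startsWith(customerId)) {
                break; // past the last row for this customer
            }
            rows.add(key);
        }
        return rows;
    }
}
```

-            <para>Because the type follows the customer-id, the customer record sorts ahead of
-            that customer's orders.  Finding all records of one type across customers, however,
-            would require visiting every customer's rows.
-            </para>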
-        </section>
-        <section xml:id="schema.casestudies.custorder.obj">
-             <title>Order Object Design</title>
-             <para>Now we need to address how to model the Order object.  
Assume that the class structure is as follows:
-<programlisting>
-<filename>Order</filename>
-     <filename>ShippingLocation</filename>     (an Order can have multiple 
ShippingLocations)
-          <filename>LineItem</filename>               (a ShippingLocation can 
have multiple LineItems)
-</programlisting>
-              ... there are multiple options on storing this data.
-             </para>
-             <section xml:id="schema.casestudies.custorder.obj.norm">
-               <title>Completely Normalized</title>
-               <para>With this approach, there would be separate tables for 
ORDER, SHIPPING_LOCATION, and LINE_ITEM.          
-               </para>
-               <para>The ORDER table's rowkey was described above: <xref 
linkend="schema.casestudies.custorder"/>
-               </para>
-               <para>The SHIPPING_LOCATION's composite rowkey would be 
something like this:
-                 <itemizedlist>
-                   <listitem>[order-rowkey]</listitem>
-                   <listitem>[shipping location number] (e.g., 1st location, 
2nd, etc.)</listitem>
-                 </itemizedlist>
-               </para>
-               <para>The LINE_ITEM table's composite rowkey would be something 
like this:
-                 <itemizedlist>
-                   <listitem>[order-rowkey]</listitem>
-                   <listitem>[shipping location number] (e.g., 1st location, 
2nd, etc.)</listitem>
-                   <listitem>[line item number] (e.g., 1st lineitem, 2nd, 
etc.)</listitem>
-                 </itemizedlist>
-               </para>
-               <para>Such a normalized model is likely to be the approach with 
an RDBMS, but that's not your only option with HBase.
-               The downside of such an approach is that to retrieve information 
about any Order, you will need:
-                 <itemizedlist>
-                   <listitem>Get on the ORDER table for the Order</listitem>
-                   <listitem>Scan on the SHIPPING_LOCATION table for that 
order to get the ShippingLocation instances</listitem>
-                   <listitem>Scan on the LINE_ITEM for each 
ShippingLocation</listitem>
-                 </itemizedlist>
-                 ... granted, this is what an RDBMS would do under the covers 
anyway, but since there are no joins in HBase
-                 you're just more aware of this fact.
-               </para>
-             </section>
-             <section xml:id="schema.casestudies.custorder.obj.rectype">
-               <title>Single Table With Record Types</title>
-               <para>With this approach, there would exist a single table 
ORDER that would contain all three record types.
-               </para>
-               <para>The Order rowkey was described above: <xref 
linkend="schema.casestudies.custorder"/>
-                 <itemizedlist>
-                   <listitem>[order-rowkey]</listitem>
-                   <listitem>[ORDER record type]</listitem>
-                 </itemizedlist>
-               </para>
-               <para>The ShippingLocation composite rowkey would be something 
like this:
-                 <itemizedlist>
-                   <listitem>[order-rowkey]</listitem>
-                   <listitem>[SHIPPING record type]</listitem>
-                   <listitem>[shipping location number] (e.g., 1st location, 
2nd, etc.)</listitem>
-                 </itemizedlist>
-               </para>
-               <para>The LineItem composite rowkey would be something like 
this:
-                 <itemizedlist>
-                   <listitem>[order-rowkey]</listitem>
-                   <listitem>[LINE record type]</listitem>
-                   <listitem>[shipping location number] (e.g., 1st location, 
2nd, etc.)</listitem>
-                   <listitem>[line item number] (e.g., 1st lineitem, 2nd, 
etc.)</listitem>
-                 </itemizedlist>
-               </para>
-             </section>
-             <section xml:id="schema.casestudies.custorder.obj.denorm">
-               <title>Denormalized</title>
-               <para>A variant of the Single Table With Record Types approach 
is to denormalize and flatten some of the object 
-               hierarchy, such as collapsing the ShippingLocation attributes 
onto each LineItem instance.
-               </para>
-               <para>The LineItem composite rowkey would be something like 
this:
-                 <itemizedlist>
-                   <listitem>[order-rowkey]</listitem>
-                   <listitem>[LINE record type]</listitem>
-                   <listitem>[line item number] (e.g., 1st lineitem, 2nd, etc. 
- care must be taken that these are unique across the entire order)</listitem>
-                 </itemizedlist>
-               </para>
-               <para>... and the LineItem columns would be something like this:
-                 <itemizedlist>
-                   <listitem>itemNumber</listitem>
-                   <listitem>quantity</listitem>
-                   <listitem>price</listitem>
-                   <listitem>shipToLine1 (denormalized from 
ShippingLocation)</listitem>
-                   <listitem>shipToLine2 (denormalized from 
ShippingLocation)</listitem>
-                   <listitem>shipToCity (denormalized from 
ShippingLocation)</listitem>
-                   <listitem>shipToState (denormalized from 
ShippingLocation)</listitem>
-                   <listitem>shipToZip (denormalized from 
ShippingLocation)</listitem>
-                 </itemizedlist>
-               </para>
-               <para>The pros of this approach include a less complex object 
hierarchy, but one of the cons is that updating gets more 
-               complicated in case any of this information changes.
-               </para>
-             </section>
-             <section xml:id="schema.casestudies.custorder.obj.singleobj">
-               <title>Object BLOB</title>
-               <para>With this approach, the entire Order object graph is 
treated, in one way or another, as a BLOB.  For example, the 
-               ORDER table's rowkey was described above: <xref 
linkend="schema.casestudies.custorder"/>, and a 
-               single column called "order" would contain an object that could 
be deserialized that contained a container Order, 
-               ShippingLocations, and LineItems.
-               </para>
-               <para>There are many options here:  JSON, XML, Java 
Serialization, Avro, Hadoop Writables, etc.  All of them are variants
-               of the same approach:  encode the object graph to a byte-array. 
 Care should be taken with this approach to ensure backward 
-               compatibility in case the object model changes such that older 
persisted structures can still be read back out of HBase.
-               </para>
-               <para>Pros are being able to manage complex object graphs with 
minimal I/O (e.g., a single HBase Get per
-               Order in this example), but the cons include the aforementioned 
warning about backward compatibility of serialization,
-               language dependencies of serialization (e.g., Java 
Serialization only works with Java clients), the fact that
-               you have to deserialize the entire object to get any piece of 
information inside the BLOB, and the difficulty in 
-               getting frameworks like Hive to work with custom objects like 
this.
-               </para>
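-               <para>One way to guard against the backward-compatibility problem is to lead the
-               byte-array with a format version, as in the following sketch.  The hand-rolled
-               encoding, the field layout, and all names are assumptions for illustration; any of
-               the serialization options listed above could take its place.
-               </para>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class OrderBlob {
    // Hypothetical version marker written as the first byte of every blob.
    static final byte FORMAT_VERSION = 1;

    /** Encodes an order number and its line items into one byte array. */
    static byte[] encode(String orderNumber, String[] lineItems) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(FORMAT_VERSION);
            out.writeUTF(orderNumber);
            out.writeInt(lineItems.length);
            for (String item : lineItems) {
                out.writeUTF(item);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes the line items; the whole blob is read even though only part is wanted. */
    static String[] decodeLineItems(byte[] blob) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            byte version = in.readByte();
            if (version != FORMAT_VERSION) {
                throw new IOException("Unsupported format version " + version);
            }
            in.readUTF(); // skip the order number
            String[] items = new String[in.readInt()];
            for (int i = 0; i < items.length; i++) {
                items[i] = in.readUTF();
            }
            return items;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

-               <para>The resulting byte-array would be stored in the single "order" column.  A
-               reader that later meets a different version byte can branch to the matching
-               decoder instead of misreading older structures.
-               </para>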
-             </section>
-           </section>  <!--  cust/order order object -->
-    </section>  <!--  cust/order -->   
-      
-       <section xml:id="schema.smackdown"><title>Case Study - 
"Tall/Wide/Middle" Schema Design Smackdown</title>
-         <para>This section will describe additional schema design questions 
that appear on the dist-list, specifically about
-         tall and wide tables.  These are general guidelines and not laws - 
each application must consider its own needs.
-         </para>
-         <section xml:id="schema.smackdown.rowsversions"><title>Rows vs. 
Versions</title>
-           <para>A common question is whether one should prefer rows or 
HBase's built-in-versioning.  The context is typically where there are
-           "a lot" of versions of a row to be retained (e.g., where it is 
significantly above the HBase default of 3 max versions).  The
-           rows-approach would require storing a timestamp in some portion of 
the rowkey so that successive updates would not overwrite one another.
-           </para>
-           <para>Preference:  Rows (generally speaking).
-           </para>
-         </section>
-         <section xml:id="schema.smackdown.rowscols"><title>Rows vs. 
Columns</title>
-           <para>Another common question is whether one should prefer rows or 
columns.  The context is typically in extreme cases of wide
-           tables, such as having 1 row with 1 million attributes, or 1 
million rows with 1 column apiece.
-           </para>
-           <para>Preference:  Rows (generally speaking).  To be clear, this 
guideline applies in the context of extremely wide cases, not in the
-           standard use-case where one needs to store a few dozen or hundred 
columns.  But there is also a middle path between these two
-           options, and that is "Rows as Columns."
-           </para>
-         </section>
-         <section xml:id="schema.smackdown.rowsascols"><title>Rows as 
Columns</title>
-           <para>The middle path between Rows vs. Columns is packing data that 
would be a separate row into columns, for certain rows.
-           OpenTSDB is the best example of this case where a single row 
represents a defined time-range, and then discrete events are treated as
-           columns.  This approach is often more complex, and may require the 
additional complexity of re-writing your data, but has the
-           advantage of being I/O efficient.  For an overview of this 
approach, see
-           <xref linkend="schema.casestudies.log-steroids"/>.
-           </para>
-         </section>
-       </section>  
-           <!--  note:  the following id is not consistent with the others 
because it was formerly in the Case Studies chapter,
-           but I didn't want to break backward compatibility of the link.  But 
future entries should look like the above case-study
-           links (schema.casestudies. ...)  -->
-       <section xml:id="casestudies.schema.listdata">
-               <title>Case Study - List Data</title>
-               <para>The following is an exchange from the user dist-list 
regarding a fairly common question:  
-               how to handle per-user list data in Apache HBase. 
-               </para>
-               <para>*** QUESTION ***</para>
-               <para>
-               We're looking at how to store a large amount of (per-user) list 
data in
-HBase, and we were trying to figure out what kind of access pattern made
-the most sense.  One option is store the majority of the data in a key, so
-we could have something like:
-               </para>
-
-               <programlisting>
-&lt;FixedWidthUserName&gt;&lt;FixedWidthValueId1&gt;:"" (no value)
-&lt;FixedWidthUserName&gt;&lt;FixedWidthValueId2&gt;:"" (no value)
-&lt;FixedWidthUserName&gt;&lt;FixedWidthValueId3&gt;:"" (no value)
-                       </programlisting>
-
-The other option we had was to do this entirely using:
-               <programlisting>
-&lt;FixedWidthUserName&gt;&lt;FixedWidthPageNum0&gt;:&lt;FixedWidthLength&gt;&lt;FixedIdNextPageNum&gt;&lt;ValueId1&gt;&lt;ValueId2&gt;&lt;ValueId3&gt;...
-&lt;FixedWidthUserName&gt;&lt;FixedWidthPageNum1&gt;:&lt;FixedWidthLength&gt;&lt;FixedIdNextPageNum&gt;&lt;ValueId1&gt;&lt;ValueId2&gt;&lt;ValueId3&gt;...
-               </programlisting>
-                       <para>
-where each row would contain multiple values.
-So in one case reading the first thirty values would be:
-                       </para>
-               <programlisting>
-scan { STARTROW =&gt; 'FixedWidthUsername' LIMIT =&gt; 30}
-               </programlisting>
-And in the second case it would be
-               <programlisting>
-get 'FixedWidthUserName\x00\x00\x00\x00'
-               </programlisting>
-                       <para>
-The general usage pattern would be to read only the first 30 values of
-these lists, with infrequent access reading deeper into the lists.  Some
-users would have &lt;= 30 total values in these lists, and some users would
-have millions (i.e. power-law distribution)
-                       </para>                 
-                       <para>
- The single-value format seems like it would take up more space on HBase,
-but would offer some improved retrieval / pagination flexibility.  Would
-there be any significant performance advantages to be able to paginate via
-gets vs paginating with scans?
-                       </para>
-                       <para>
-  My initial understanding was that doing a scan should be faster if our
-paging size is unknown (and caching is set appropriately), but that gets
-should be faster if we'll always need the same page size.  I've ended up
-hearing different people tell me opposite things about performance.  I
-assume the page sizes would be relatively consistent, so for most use cases
-we could guarantee that we only wanted one page of data in the
-fixed-page-length case.  I would also assume that we would have infrequent
-updates, but may have inserts into the middle of these lists (meaning we'd
-need to update all subsequent rows).
-                       </para>
-                       <para>
-Thanks for help / suggestions / follow-up questions.
-                       </para>
-                       <para>*** ANSWER ***</para>
-                       <para>
-If I understand you correctly, you're ultimately trying to store
-triples in the form "user, valueid, value", right? E.g., something
-like:
-                       </para>
-                       <programlisting>
-"user123, firstname, Paul",
-"user234, lastname, Smith"
-                       </programlisting>
-                       <para>
-(But the usernames are fixed width, and the valueids are fixed width).
-                       </para>
-                       <para>
-And, your access pattern is along the lines of: "for user X, list the
-next 30 values, starting with valueid Y". Is that right? And these
-values should be returned sorted by valueid?
-                       </para>
-                       <para>
-The tl;dr version is that you should probably go with one row per
-user+value, and not build a complicated intra-row pagination scheme on
-your own unless you're really sure it is needed.
-                       </para>
-                       <para>
-Your two options mirror a common question people have when designing
-HBase schemas: should I go "tall" or "wide"? Your first schema is
-"tall": each row represents one value for one user, and so there are
-many rows in the table for each user; the row key is user + valueid,
-and there would be (presumably) a single column qualifier that means
-"the value". This is great if you want to scan over rows in sorted
-order by row key (thus my question above, about whether these ids are
-sorted correctly). You can start a scan at any user+valueid, read the
-next 30, and be done. What you're giving up is the ability to have
-transactional guarantees around all the rows for one user, but it
-doesn't sound like you need that. Doing it this way is generally
-recommended (see
-here <link 
xlink:href="http://hbase.apache.org/book.html#schema.smackdown";>http://hbase.apache.org/book.html#schema.smackdown</link>).
-                       </para>
-                       <para>
-Your second option is "wide": you store a bunch of values in one row,
-using different qualifiers (where the qualifier is the valueid). The
-simple way to do that would be to just store ALL values for one user
-in a single row. I'm guessing you jumped to the "paginated" version
-because you're assuming that storing millions of columns in a single
-row would be bad for performance, which may or may not be true; as
-long as you're not trying to do too much in a single request, or do
-things like scanning over and returning all of the cells in the row,
-it shouldn't be fundamentally worse. The client has methods that allow
-you to get specific slices of columns.
-                       </para>
-                       <para>
-Note that neither case fundamentally uses more disk space than the
-other; you're just "shifting" part of the identifying information for
-a value either to the left (into the row key, in option one) or to the
-right (into the column qualifiers in option 2). Under the covers,
-every key/value still stores the whole row key, and column family
-name. (If this is a bit confusing, take an hour and watch Lars
-George's excellent video about understanding HBase schema design:
-<link 
xlink:href="http://www.youtube.com/watch?v=_HLoH_PgrLk">http://www.youtube.com/watch?v=_HLoH_PgrLk</link>).
-                       </para>
-                       <para>
-A manually paginated version has lots more complexities, as you note,
-like having to keep track of how many things are in each page,
-re-shuffling if new values are inserted, etc. That seems significantly
-more complex. It might have some slight speed advantages (or
-disadvantages!) at extremely high throughput, and the only way to
-really know that would be to try it out. If you don't have time to
-build it both ways and compare, my advice would be to start with the
-simplest option (one row per user+value). Start simple and iterate! :)
-                       </para>
-               </section>  <!--  listdata -->
-
-  </section> <!--  schema design cases -->
-  <section xml:id="schema.ops"><title>Operational and Performance 
Configuration Options</title>
-    <para>See the Performance section <xref linkend="perf.schema"/> for more 
information on operational and performance
-    schema design options, such as Bloom Filters, Table-configured 
regionsizes, compression, and blocksizes.
-    </para>
-  </section>
-
-  </chapter>   <!--  schema design -->
