[Hadoop Wiki] Update of "Hbase/DataModel" by JeanDanielCryans

Apache Wiki Sun, 13 Jul 2008 09:24:34 -0700

The following page has been changed by JeanDanielCryans:
http://wiki.apache.org/hadoop/Hbase/DataModel

------------------------------------------------------------------------------
  [[Anchor(overview)]]
  = Overview =
  
- To put it simply, HBase can be reduced to a Map<byte[], Map<byte[], 
Map<byte[], Map<long, byte[]>>>>. The first Map maps row keys to their ''column 
families''. The second maps column families to their ''column keys''. The third 
one maps column keys to their ''timestamps''. Finally, the last one maps the 
timestamps to a single value. The keys are typically strings, the timestamp is 
a long and the value is an uninterpreted array of bytes. The column key is 
always preceded by its family and is represented like this: ''family:key''. 
Since a family maps to another map, this means that a single column family can 
contain a theoretical infinity of column keys. So, to retrieve a single value, 
the user has to do a ''get'' using three keys:
+ To put it simply, HBase can be reduced to a Map<byte[], Map<byte[], 
Map<byte[], Map<Long, byte[]>>>>. The first Map maps row keys to their ''column 
families''. The second maps column families to their ''column keys''. The third 
one maps column keys to their ''timestamps''. Finally, the last one maps the 
timestamps to a single value. The keys are typically strings, the timestamp is 
a long and the value is an uninterpreted array of bytes. The column key is 
always preceded by its family and is represented like this: ''family:key''. 
Since a family maps to another map, a single column family can contain a 
theoretically unlimited number of column keys. So, to retrieve a single value, 
the user has to do a ''get'' using three keys:
  
  row key+column key+timestamp -> value
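
The nested-map view described above can be sketched in plain Java. This is only an illustration of the conceptual model, not HBase's storage code or client API: the class name `NestedMapSketch` and the methods `put`, `get` and `b` are hypothetical, and Java 10 or later is assumed (for `var` and `Arrays.compare`). Because `byte[]` compares by identity in Java, the sketch uses sorted maps with an explicit comparator rather than hash maps.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class NestedMapSketch {
    // byte[] has identity-based equals/hashCode, so compare contents explicitly.
    private static final Comparator<byte[]> BYTES = (a, b) -> Arrays.compare(a, b);

    // row key -> column family -> column key -> timestamp -> value
    private final NavigableMap<byte[],
            NavigableMap<byte[],
                    NavigableMap<byte[],
                            NavigableMap<Long, byte[]>>>> rows = new TreeMap<>(BYTES);

    // Hypothetical helper: encode a string key as UTF-8 bytes.
    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Store one value, creating the intermediate maps on demand.
    void put(String row, String family, String key, long timestamp, byte[] value) {
        rows.computeIfAbsent(b(row), r -> new TreeMap<>(BYTES))
            .computeIfAbsent(b(family), f -> new TreeMap<>(BYTES))
            .computeIfAbsent(b(key), k -> new TreeMap<>())
            .put(timestamp, value);
    }

    // row key + column key (family:key) + timestamp -> value, or null if absent.
    byte[] get(String row, String family, String key, long timestamp) {
        var families = rows.get(b(row));
        if (families == null) return null;
        var columns = families.get(b(family));
        if (columns == null) return null;
        var versions = columns.get(b(key));
        return versions == null ? null : versions.get(timestamp);
    }

    public static void main(String[] args) {
        NestedMapSketch table = new NestedMapSketch();
        table.put("row1", "entry", "title", 1L, b("Hello HBase"));
        byte[] value = table.get("row1", "entry", "title", 1L);
        System.out.println(new String(value, StandardCharsets.UTF_8));
    }
}
```

A lookup with a timestamp that was never written returns `null` in this sketch; it makes no attempt to model how HBase selects among versions.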
  
@@ -54, +54 @@

  The following attributes can be specified for each family:
  
  Implemented
- 
   * Compression
    * Record: means that each exact value found at a 
rowkey+columnkey+timestamp will be compressed independently.
    * Block: means that blocks in HDFS are compressed. A block may contain 
multiple records if they are shorter than one HDFS block or may only contain 
part of a record if the record is longer than an HDFS block.
@@ -63, +62 @@

    * Time to live: versions older than specified time will be garbage 
collected.
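
As a rough illustration of the time-to-live attribute, the following Java sketch drops every version of a single cell that is older than a given TTL. It is not HBase's garbage-collection code: the class `TtlSketch`, the method `expire`, and the choice of milliseconds as the timestamp unit are all assumptions made for the example.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class TtlSketch {
    /**
     * Removes every version whose timestamp is strictly older than
     * (nowMillis - ttlMillis) and returns how many versions were removed.
     */
    static int expire(NavigableMap<Long, byte[]> versions, long nowMillis, long ttlMillis) {
        // headMap returns a live view, so clearing it removes the entries from `versions`.
        NavigableMap<Long, byte[]> expired = versions.headMap(nowMillis - ttlMillis, false);
        int removed = expired.size();
        expired.clear();
        return removed;
    }

    public static void main(String[] args) {
        NavigableMap<Long, byte[]> versions = new TreeMap<>();
        versions.put(100L, new byte[] {1});
        versions.put(200L, new byte[] {2});
        versions.put(300L, new byte[] {3});
        int removed = expire(versions, 350L, 100L);
        System.out.println(removed + " removed, " + versions.size() + " kept");
    }
}
```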
  
  Still not implemented
- 
   * In memory: all values of that family will be kept in memory.
-  * Length: values written will not be longer than the specified number of 
bytes.
+  * Length: values written will not be longer than the specified number of 
bytes. See [https://issues.apache.org/jira/browse/HBASE-742 HBASE-742]
  
  [[Anchor(example)]]
  = Real Life Example =
+ The following example is the same one given during the HBase ETS 
presentation, available in French on the presentation page.
+ 
+ A blog is a good example for demonstrating the HBase data model because of 
its simple features and domain. Suppose the following mini-SRS:
+  * The blog entries, which consist of a title, a subtitle, a date, an 
author, a type (or tag), a text, and comments, can be created and updated by 
logged-in users.
+  * The users, which consist of a username, a password, and a name, can log in 
and log out.
+  * The comments, which consist of a title, an author, and text, can be 
written anonymously by visitors as long as their identity is verified by a 
captcha.
  
  [[Anchor(relational)]]
  == The Source ERD ==
  
+ http://www.hadoop.ca/img/db_blog.jpg
+ 
  [[Anchor(hbaseschema)]]
  == The HBase Target Schema ==
