The following page has been changed by JeanDanielCryans:
http://wiki.apache.org/hadoop/Hbase/DataModel

The comment on the change is:
Added the HBase schema

------------------------------------------------------------------------------
  [[Anchor(hbaseschema)]]
  == The HBase Target Schema ==
  
+ A first solution could be:
+ 
+ ||Table||Row Key||Family||Attributes||
+ ||blogtable||TTYYYYMMDDHHmmss||info:||Always contains the column keys author, title, under_title. Should be IN-MEMORY and keep 1 version||
+ ||     || ||text:||No column key. 3 versions||
+ ||     || ||comment_title:||Column keys are written like YYYYMMDDHHmmss. Should be IN-MEMORY and keep 1 version||
+ ||     || ||comment_author:||Same keys. 1 version||
+ ||     || ||comment_text:||Same keys. 1 version||
+ ||usertable||login_name||info:||Always contains the column keys password and name. 1 version||
+ 
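As a rough illustration of the table above, one row of each table can be pictured as a nested mapping of family to column key to value. This sketch uses plain Python dicts rather than any HBase client API, and the type code "NW", the user names, and all cell values are made up for the example.

```python
# One blogtable row modelled as {family: {column_key: value}}.
# Plain dicts only: this is NOT the HBase client API, and every value is hypothetical.
blog_row_key = "NW20080215103000"  # 2-letter type "NW" (assumed) + YYYYMMDDHHmmss

blog_row = {
    # info: always holds exactly these three column keys
    "info:": {"author": "alice", "title": "Hello", "under_title": "A first post"},
    # text: has no column key, so the empty string stands in for it here
    "text:": {"": "Body of the entry"},
    # The three comment families share the same column keys (the comment dates)
    "comment_title:": {"20080216090000": "Nice", "20080217141500": "Question"},
    "comment_author:": {"20080216090000": "bob", "20080217141500": "carol"},
    "comment_text:": {"20080216090000": "Great post", "20080217141500": "How?"},
}

# usertable is keyed by login_name and has a single info: family
user_row_key = "alice"
user_row = {"info:": {"password": "secret", "name": "Alice Example"}}
```

Reading one comment then means looking up the same date key in each of the three comment families.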
+ The row key for blogtable is a concatenation of the entry's type (shortened to 2 letters) and its timestamp. This way, rows are grouped first by type and then by date throughout the cluster, which improves the chances of hitting a single region to fetch the needed data. The one-to-many relationship between BLOGENTRY and COMMENT is handled by storing each attribute of a comment as a family in blogtable and using the comment's date as the column key, so all comments are already sorted.
+ 
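The key layout described above can be sketched in a few lines. This is a minimal illustration, assuming row keys are compared as plain ASCII strings; the helper name `blog_row_key` and the type codes "NW" and "TE" are hypothetical and do not come from the page.

```python
from datetime import datetime

def blog_row_key(entry_type, created):
    """Build a key of the form TTYYYYMMDDHHmmss (hypothetical helper)."""
    if len(entry_type) != 2:
        raise ValueError("type must be shortened to 2 letters")
    return entry_type + created.strftime("%Y%m%d%H%M%S")

keys = [
    blog_row_key("TE", datetime(2008, 2, 1, 9, 0, 0)),
    blog_row_key("NW", datetime(2008, 3, 5, 12, 30, 0)),
    blog_row_key("TE", datetime(2008, 1, 15, 8, 0, 0)),
    blog_row_key("NW", datetime(2008, 1, 2, 18, 45, 0)),
]

# Lexicographic order groups the keys by type first, then by date.
sorted_keys = sorted(keys)
```

Because every timestamp field is zero-padded to a fixed width, string order and chronological order agree within one type. The same reasoning applies to the comment column keys.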
+ One advantage of this design is that when you show the "front page" of your blog, you only have to fetch the family "info:" from blogtable. When you show an actual blog entry, you fetch a whole row. Another advantage is that by using timestamps in the row key, your scanner will fetch sequential rows if you want to show, for example, the entries from the last month.
+ 
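To illustrate the sequential-scan point, the sketch below simulates a scanner over sorted row keys with a start key (inclusive) and a stop key (exclusive). It is only a model in plain Python, not the HBase scanner API; `month_range`, `scan`, and the sample keys are hypothetical.

```python
from bisect import bisect_left

def month_range(entry_type, year, month):
    """Return (start, stop) row keys covering one month for one entry type."""
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    start = f"{entry_type}{year:04d}{month:02d}01000000"
    stop = f"{entry_type}{next_year:04d}{next_month:02d}01000000"
    return start, stop

def scan(sorted_keys, start, stop):
    """Return the contiguous slice of keys with start <= key < stop."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

# Hypothetical row keys, kept in sorted order as the table would store them.
row_keys = [
    "NW20071231235959",
    "NW20080102184500",
    "NW20080120070000",
    "NW20080201000000",
    "TE20080110120000",
]

start, stop = month_range("NW", 2008, 1)
january_news = scan(row_keys, start, stop)
```

All matching rows sit next to each other in key order, so the scan reads one contiguous range instead of filtering the whole table.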
