Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by JeanDanielCryans:
http://wiki.apache.org/hadoop/Hbase/DataModel

New page:

 * [#intro Introduction]
 * [#overview Overview]

[[Anchor(intro)]]
= Introduction =

The Bigtable data model, and therefore the HBase data model since HBase is a clone, is particularly well adapted to data-intensive systems. You don't get high scalability out of a relational database simply by adding more machines, because its data model is based on a single-machine architecture. For example, a JOIN between two tables is done in memory and does not take into account the possibility that the data has to go over the wire. Companies that did offer distributed relational databases had a lot of redesign to do, which is why they carry high licensing costs. The other option is to use replication, and when the slaves are overloaded with ''writes'', the last resort is to shard the tables into sub-databases. At that point, data normalization is something you only remember seeing in class, which is why going with the data model presented in this paper shouldn't bother you at all.

[[Anchor(overview)]]
= Overview =

To put it simply, HBase can be reduced to a Map<byte[], Map<byte[], Map<byte[], Map<long, byte[]>>>>. The first Map maps row keys to their ''column families''. The second maps column families to their ''column keys''. The third one maps column keys to their ''timestamps''. Finally, the last one maps the timestamps to a single value. The keys are typically strings, the timestamp is a long and the value is an uninterpreted array of bytes. The row key+column key+timestamp -> value
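The nested-map view above can be sketched in plain Java. This is only an illustration of the "map of maps of maps" structure, not the HBase client API; the class and method names are made up for the example, and String keys stand in for the byte[] keys that HBase actually compares lexicographically.

{{{
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative sketch of the HBase/Bigtable data model as nested sorted maps:
// row key -> column family -> column key -> timestamp -> value.
// Not the real HBase API; names here are hypothetical.
public class NestedMapSketch {

    // String keys are used for simplicity; HBase stores and sorts raw byte[].
    static NavigableMap<String,
           NavigableMap<String,
           NavigableMap<String,
           NavigableMap<Long, byte[]>>>> table = new TreeMap<>();

    static void put(String row, String family, String column,
                    long timestamp, byte[] value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<>())
             .put(timestamp, value);
    }

    static byte[] get(String row, String family, String column, long timestamp) {
        // A real store would return the newest version at or before the
        // requested timestamp; this sketch only does an exact lookup.
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> families = table.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, byte[]> versions = columns.get(column);
        if (versions == null) return null;
        return versions.get(timestamp);
    }

    public static void main(String[] args) {
        put("com.example.www", "contents", "html", 1L, "<html>...</html>".getBytes());
        System.out.println(new String(get("com.example.www", "contents", "html", 1L)));
    }
}
}}}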
