Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by JeanDanielCryans:
http://wiki.apache.org/hadoop/Hbase/DataModel

New page:

 * [#intro Introduction]
 * [#overview Overview]

[[Anchor(intro)]]
= Introduction =

The Bigtable data model, and therefore the HBase data model since HBase is a clone, is particularly well adapted to data-intensive systems. You don't get high scalability out of a relational database simply by adding more machines, because its data model is based on a single-machine architecture. For example, a JOIN between two tables is done in memory and does not take into account the possibility that the data has to go over the wire. Companies that did offer distributed relational databases had a lot of redesign to do, which is why they carry high licensing costs. The other option is to use replication, and when the slaves are overloaded with ''writes'', the last resort is to shard the tables into sub-databases. At that point, data normalization is something you only remember seeing in class, which is why going with the data model presented in this paper shouldn't bother you at all.

[[Anchor(overview)]]
= Overview =

To put it simply, HBase can be reduced to a Map<byte[], Map<byte[], Map<byte[], Map<long, byte[]>>>>. The first Map maps row keys to their ''column families''. The second maps column families to their ''column keys''. The third one maps column keys to their ''timestamps''. Finally, the last one maps the timestamps to a single value. The keys are typically strings, the timestamp is a long and the value is an uninterpreted array of bytes. The row key+column key+timestamp -> value
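The nested-map view above can be sketched in plain Java. This is only an illustration of the "map of maps of maps" structure, not the HBase client API; the class and method names are made up for the example, and String keys stand in for the byte[] keys that HBase actually compares lexicographically.

{{{
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative sketch of the HBase/Bigtable data model as nested sorted maps:
// row key -> column family -> column key -> timestamp -> value.
// Not the real HBase API; names here are hypothetical.
public class NestedMapSketch {

    // String keys are used for simplicity; HBase stores and sorts raw byte[].
    static NavigableMap<String,
           NavigableMap<String,
           NavigableMap<String,
           NavigableMap<Long, byte[]>>>> table = new TreeMap<>();

    static void put(String row, String family, String column,
                    long timestamp, byte[] value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<>())
             .put(timestamp, value);
    }

    static byte[] get(String row, String family, String column, long timestamp) {
        // A real store would return the newest version at or before the
        // requested timestamp; this sketch only does an exact lookup.
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> families = table.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, byte[]> versions = columns.get(column);
        if (versions == null) return null;
        return versions.get(timestamp);
    }

    public static void main(String[] args) {
        put("com.example.www", "contents", "html", 1L, "<html>...</html>".getBytes());
        System.out.println(new String(get("com.example.www", "contents", "html", 1L)));
    }
}
}}}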
