Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
   * [#intro Introduction]
   * [#datamodel Data Model]
+  * [#conceptual Conceptual View]
+  * [#physical Physical Storage View]
   * [#hregion HRegion (Tablet) Server]
   * [#master HBase Master Server]
   * [#metadata META Table]
@@ -57, +59 @@
  can get data by asking for the "most recent value as of a certain time". Or,
  clients can fetch all available versions at once.

+ [[Anchor(conceptual)]]
+ == Conceptual View ==
+
+ Conceptually, a table may be thought of as a collection of rows that
+ are located by a row key (and an optional time stamp). The table is
+ sparse: any column may have no value for a particular row key. The following example is a slightly modified form of the one on page 2 of the [http://labs.google.com/papers/bigtable.html Bigtable Paper].
+
+ [[Anchor(datamodelexample)]]
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents:"'' ||||<:> '''Column''' ''"anchor:"'' ||<:> '''Column''' ''"mime:"'' ||
+ ||<^|5> "com.cnn.www" ||<:> t9 || ||<)> "anchor:cnnsi.com" ||<:> "CNN" || ||
+ ||<:> t8 || ||<)> "anchor:my.look.ca" ||<:> "CNN.com" || ||
+ ||<:> t6 ||<:> `"<html>..."` || || ||<:> "text/html" ||
+ ||<:> t5 ||<:> `"<html>..."` || || || ||
+ ||<:> t3 ||<:> `"<html>..."` || || || ||
+
+ [[Anchor(physical)]]
+ == Physical Storage View ==
+
+ Although, at a conceptual level, tables may be viewed as a sparse set
+ of rows, physically they are stored on a per-column basis. This is an
+ important consideration for schema and application designers to keep
+ in mind.
+
+ Pictorially, the table shown in the [#datamodelexample conceptual view] above would be stored as
+ follows:
+
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents:"'' ||
+ ||<^|3> "com.cnn.www" ||<:> t6 ||<:> `"<html>..."` ||
+ ||<:> t5 ||<:> `"<html>..."` ||
+ ||<:> t3 ||<:> `"<html>..."` ||
+
+ [[BR]]
+
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||||<:> '''Column''' ''"anchor:"'' ||
+ ||<^|2> "com.cnn.www" ||<:> t9 ||<)> "anchor:cnnsi.com" ||<:> "CNN" ||
+ ||<:> t8 ||<)> "anchor:my.look.ca" ||<:> "CNN.com" ||
+
+ [[BR]]
+
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"mime:"'' ||
+ || "com.cnn.www" ||<:> t6 ||<:> "text/html" ||
+
+ [[BR]]
+
+ Note that the empty cells shown in the conceptual view are not stored
+ in the tables above. Thus a request for the value of the
+ ''"contents:"'' column at time stamp ''t8'' would return a null
+ value. Similarly, a request for the ''"anchor:my.look.ca"'' value at
+ time stamp ''t9'' would return a null value.
+
+ However, if no time stamp is supplied, the most recent value for the
+ requested column is returned. Because time stamps are stored in
+ descending order, the most recent value is also the first one found.
+ Consequently, with no time stamp supplied, a request for
+ ''"contents:"'' returns the value at ''t6'', and a request for
+ ''"anchor:my.look.ca"'' returns the value at ''t8''.
+
  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =