Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
better example of how data is physically stored on disk.

------------------------------------------------------------------------------
  === Example ===
+ To show how data is stored on disk, consider the following example:
+ 
+ A program writes rows "row[0-9]", column "anchor:foo"; then writes
+ rows "row[0-9]", column "anchor:bar"; and finally writes rows
+ "row[0-9]", column "anchor:foo". After flushing the memcache and
+ compacting the store, the contents of the !MapFile would look like:
- The current unit test for HBase included in the patch on
- [http://issues.apache.org/jira/browse/HADOOP-1045 Hadoop Jira Issue 1045],
- first writes rows with row id's of the form "row_[0-9]+" where the row
- number goes from 0 to 999. It writes to two column families:
- "contents:basic" and "anchor:anchornum-[0-9]+" (again the range of
- numbers for the anchornum family goes from 0 to 999). It then writes
- rows with row id's of "row_vals_nnn" where nnn is a three digit,
- leading zero filled number from 000 to 999. Two column families are
- written: "contents:firstcol" and "anchor:secondcol". After a
- compaction, dumping the
- !MapFile which contains the "anchor:" family we see that the keys,
- displayed as column-family(row-key)/timestamp are ordered as follows:
  
  {{{
+ row=row0, column=anchor:bar, timestamp=1174184619081
+ row=row0, column=anchor:foo, timestamp=1174184620720
+ row=row0, column=anchor:foo, timestamp=1174184617161
+ row=row1, column=anchor:bar, timestamp=1174184619081
+ row=row1, column=anchor:foo, timestamp=1174184620721
+ row=row1, column=anchor:foo, timestamp=1174184617167
+ row=row2, column=anchor:bar, timestamp=1174184619081
+ row=row2, column=anchor:foo, timestamp=1174184620724
+ row=row2, column=anchor:foo, timestamp=1174184617167
+ row=row3, column=anchor:bar, timestamp=1174184619081
+ row=row3, column=anchor:foo, timestamp=1174184620724
+ row=row3, column=anchor:foo, timestamp=1174184617168
+ row=row4, column=anchor:bar, timestamp=1174184619081
+ row=row4, column=anchor:foo, timestamp=1174184620724
+ row=row4, column=anchor:foo, timestamp=1174184617168
+ row=row5, column=anchor:bar, timestamp=1174184619082
+ row=row5, column=anchor:foo, timestamp=1174184620725
+ row=row5, column=anchor:foo, timestamp=1174184617168
+ row=row6, column=anchor:bar, timestamp=1174184619082
+ row=row6, column=anchor:foo, timestamp=1174184620725
+ row=row6, column=anchor:foo, timestamp=1174184617168
+ row=row7, column=anchor:bar, timestamp=1174184619082
+ row=row7, column=anchor:foo, timestamp=1174184620725
+ row=row7, column=anchor:foo, timestamp=1174184617168
+ row=row8, column=anchor:bar, timestamp=1174184619082
+ row=row8, column=anchor:foo, timestamp=1174184620725
+ row=row8, column=anchor:foo, timestamp=1174184617169
+ row=row9, column=anchor:bar, timestamp=1174184619083
+ row=row9, column=anchor:foo, timestamp=1174184620725
+ row=row9, column=anchor:foo, timestamp=1174184617169
- anchor:anchornum-0(row_0)/1174176403717
- anchor:anchornum-1(row_1)/1174176403723
- anchor:anchornum-10(row_10)/1174176403726
- anchor:anchornum-100(row_100)/1174176403769
- anchor:anchornum-101(row_101)/1174176403770
- anchor:anchornum-102(row_102)/1174176403771
- anchor:anchornum-103(row_103)/1174176403771
- anchor:anchornum-104(row_104)/1174176403772
- anchor:anchornum-105(row_105)/1174176403772
- anchor:anchornum-106(row_106)/1174176403773
- anchor:anchornum-107(row_107)/1174176403773
- anchor:anchornum-108(row_108)/1174176403774
- anchor:anchornum-109(row_109)/1174176403774
- anchor:anchornum-11(row_11)/1174176403727
- ...
- anchor:anchornum-99(row_99)/1174176403769
- anchor:anchornum-990(row_990)/1174176403966
- anchor:anchornum-991(row_991)/1174176403966
- anchor:anchornum-992(row_992)/1174176403966
- anchor:anchornum-993(row_993)/1174176403966
- anchor:anchornum-994(row_994)/1174176403966
- anchor:anchornum-995(row_995)/1174176403966
- anchor:anchornum-996(row_996)/1174176403966
- anchor:anchornum-997(row_997)/1174176403966
- anchor:anchornum-998(row_998)/1174176403966
- anchor:anchornum-999(row_999)/1174176403966
- anchor:secondcol(row_vals1_000)/1174176435765
- anchor:secondcol(row_vals1_001)/1174176435766
- anchor:secondcol(row_vals1_002)/1174176435767
- anchor:secondcol(row_vals1_003)/1174176435767
- anchor:secondcol(row_vals1_004)/1174176435767
- anchor:secondcol(row_vals1_005)/1174176435767
- anchor:secondcol(row_vals1_006)/1174176435768
- anchor:secondcol(row_vals1_007)/1174176435768
- anchor:secondcol(row_vals1_008)/1174176435769
- anchor:secondcol(row_vals1_009)/1174176435769
- anchor:secondcol(row_vals1_010)/1174176435770
- ...
  }}}
  
+ Note that column "anchor:foo" is stored twice (because the timestamp
+ differs) and that the most recent timestamp is the first of the two
+ entries.
- If the row keys had had the same format (say row_nnn), dumping the
- !MapFile we would see:
- 
- {{{
- anchor:anchornum-0(row_000)/1174176403717
- anchor:secondcol(row_000)/1174176435765
- anchor:anchornum-1(row_001)/1174176403723
- anchor:secondcol(row_001)/1174176435766
- ...
- }}}
  
  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =
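
An aside on the Example section above: the new dump appears to be ordered by row, then by column, then by timestamp with the most recent entry first. The following is a minimal Java sketch of such an ordering, not the actual HBase implementation. The `Key` class, the `ORDER` comparator, and the `sorted` helper are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class StoreKeyOrder {
    /** Hypothetical key type for illustration; not the real HBase key class. */
    static final class Key {
        final String row;
        final String column;
        final long timestamp;

        Key(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return "row=" + row + ", column=" + column + ", timestamp=" + timestamp;
        }
    }

    // Assumed ordering: row ascending, then column ascending,
    // then timestamp descending so the newest version comes first.
    static final Comparator<Key> ORDER =
        Comparator.comparing((Key k) -> k.row)
            .thenComparing(Comparator.comparing((Key k) -> k.column))
            .thenComparing(Comparator.comparingLong((Key k) -> k.timestamp).reversed());

    /** Returns a sorted copy; the input list is left unchanged. */
    static List<Key> sorted(List<Key> keys) {
        List<Key> copy = new ArrayList<>(keys);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // The three writes to row0 from the example, in write order.
        List<Key> written = Arrays.asList(
            new Key("row0", "anchor:foo", 1174184617161L),
            new Key("row0", "anchor:bar", 1174184619081L),
            new Key("row0", "anchor:foo", 1174184620720L));
        for (Key k : sorted(written)) {
            System.out.println(k);
        }
    }
}
```

Run against the three row0 writes, the sketch prints them in the same order as the first three lines of the dump: "anchor:bar" first, then the two "anchor:foo" entries with the newer timestamp ahead of the older one.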