Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by EvgenyRyabitskiy:
http://wiki.apache.org/hadoop/Hbase/DesignOverview

------------------------------------------------------------------------------
  '''This page was created on 06.03.09 and now is in progress of construction....'''
- = Table of Contents =
  * [#intro Introduction]
  * [#datamodel Data Model]
  * [#conceptual Conceptual View]
  * [#physical Physical Storage View]
- * [#arch Architecture and Implementation]
+ * [#regions Regions(Rowranges)]
+ * [#design Architecture Design]
  * [#master HBaseMaster]
  * [#hregionserv HRegionServer]
  * [#client HBase Client]
@@ -27, +27 @@
  Applications store data rows in labeled tables. A data row has a sortable row key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have widely varying numbers of columns.
- HBase is three dimensional sorted map. It maps from Cartesian product of row key, column key and a timestamp to cell value:
+ HBase is three dimensional sorted map. It maps from Cartesian product of row key, column key and timestamp to cell value:
  (row:byte[] x column:byte[] x timestamp:Long) -> byte[]
@@ -82, +82 @@
  However, if no timestamp is supplied, the most recent value for a particular column would be returned and would also be the first one found since timestamps are stored in descending order. Thus a request for the values of all columns in the row "com.cnn.www" if no timestamp is specified would be: the value of ''"contents:"'' from time stamp t6, the value of ''"anchor:cnnsi.com"'' from time stamp t9, the value of ''"anchor:my.look.ca"'' from time stamp t8 and the value of ''"mime:"'' from time stamp t6.
-
- === Row Ranges: Regions ===
+ [[Anchor(regions)]]
+ === Regions (Row Ranges) ===
  To an application, a table appears to be a list of tuples sorted by row key ascending, column name ascending and timestamp descending. Physically, tables are broken up into row ranges called ''regions''.
  Each row range contains rows from start-key (inclusive) to end-key (exclusive). A set of regions, sorted appropriately, forms an entire table. Row range identified by the table name and start-key.
@@ -92, +92 @@
  * !StoreFiles maintain the sparse index in a separate file
  * HBase extends !StoreFiles so that a bloom filter can be employed to enhance negative lookup performance. The hash function employed is one developed by Bob Jenkins.
- [[Anchor(arch)]]
- = Architecture and Implementation =
+ [[Anchor(design)]]
+ = Architecture Design =
  There are three major components of the HBase architecture:
  1. The HMaster (HBase master server)
@@ -105, +105 @@
  [[Anchor(master)]]
  == HMaster ==
- There is one master HMaster per one cluster.
+ There is only one HMaster for a single HBase deployment.
  HMaster duties:
- * Assigning regions to H!RegionServers
+ * Cluster initialization
+ * Assigning/unassigning regions to/from H!RegionServers (unassigning is for load balance)
  * Monitor the health of each H!RegionServer
  * Changes to the table schema and handling table administrative functions
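
For readers of this change, the data model and region layout quoted in the hunks above can be illustrated with a small sketch. It is a toy in-memory model, not HBase code: the names `SparseTable` and `find_region`, and the sample keys and values, are assumptions made up for illustration. It shows the mapping (row x column x timestamp) -> value with timestamps kept newest first, and how a row key falls into a region whose range runs from start-key (inclusive) to end-key (exclusive).

```python
import bisect


class SparseTable:
    """Toy model of (row, column, timestamp) -> value. Hypothetical, not the HBase API."""

    def __init__(self):
        # (row, column) -> list of (-timestamp, value); negating the
        # timestamp keeps each list sorted with the newest version first.
        self._cells = {}

    def put(self, row, column, timestamp, value):
        versions = self._cells.setdefault((row, column), [])
        bisect.insort(versions, (-timestamp, value))

    def get(self, row, column):
        """Return the most recent value, or None if the cell is absent."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        # The first entry is the newest because timestamps sort descending.
        return versions[0][1]

    def scan(self):
        """Yield cells by row ascending, column ascending, timestamp descending."""
        for row, column in sorted(self._cells):
            for neg_ts, value in self._cells[(row, column)]:
                yield row, column, -neg_ts, value


def find_region(start_keys, row):
    """Return the start key of the region that holds `row`.

    `start_keys` is the sorted list of region start keys for one table and
    is assumed to begin with b"" so that every row key has a region.
    Each region covers [start_key, next_start_key).
    """
    index = bisect.bisect_right(start_keys, row) - 1
    return start_keys[index]
```

A row key equal to a region's start key lands in that region, while a key equal to its end key lands in the next one, which matches the inclusive/exclusive bounds described above.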
