[Hadoop Wiki] Trivial Update of "Hbase/DesignOverview" by EvgenyRyabitskiy

Apache Wiki Thu, 02 Apr 2009 08:18:22 -0700


The following page has been changed by EvgenyRyabitskiy:
http://wiki.apache.org/hadoop/Hbase/DesignOverview

------------------------------------------------------------------------------
  '''This page was created on 06.03.09 and is still under 
construction.'''
- 
  = Table of Contents =
  
   * [#intro Introduction]
   * [#datamodel Data Model]
    * [#conceptual Conceptual View]
    * [#physical Physical Storage View]
-  * [#arch Architecture and Implementation]
+    * [#regions Regions(Rowranges)]
+  * [#design Architecture Design]
    * [#master HBaseMaster]
    * [#hregionserv HRegionServer]
    * [#client HBase Client]
@@ -27, +27 @@

  
  Applications store data rows in labeled tables. A data row has a sortable row 
key and an arbitrary number of columns. The table is stored sparsely, so that 
rows in the same table can have widely varying numbers of columns.
  
- HBase is three dimensional sorted map. It maps from Cartesian product of row 
key, column key and a timestamp to cell value:
+ HBase is a three-dimensional sorted map. It maps from the Cartesian product of 
row key, column key and timestamp to a cell value:
  
  (row:byte[] x column:byte[] x timestamp:Long) -> byte[]
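
The mapping above can be pictured as an ordinary dictionary keyed by (row, column, timestamp) triples. The following is only a minimal sketch, not HBase code: the names `CellKey`, `sort_key` and `sorted_cells`, and the sample rows and columns, are hypothetical. It assumes the ordering described on this page (row key ascending, column name ascending, timestamp descending).

```python
from typing import Dict, List, Tuple

# Hypothetical cell key (row, column, timestamp); not an HBase type.
CellKey = Tuple[bytes, bytes, int]


def sort_key(key: CellKey) -> Tuple[bytes, bytes, int]:
    """Order by row ascending, column ascending, timestamp descending."""
    row, column, timestamp = key
    return (row, column, -timestamp)


def sorted_cells(cells: Dict[CellKey, bytes]) -> List[Tuple[CellKey, bytes]]:
    """Return the cells of a sparse table in the map's sort order."""
    return sorted(cells.items(), key=lambda item: sort_key(item[0]))


# Illustrative data only. Rows may carry different sets of columns (sparse).
cells = {
    (b"row-b", b"fam:x", 1): b"b-x1",
    (b"row-a", b"fam:y", 2): b"a-y2",
    (b"row-a", b"fam:x", 1): b"a-x1",
    (b"row-a", b"fam:x", 3): b"a-x3",
}
ordered_keys = [key for key, _ in sorted_cells(cells)]
```

Within one row and column, the newest version (largest timestamp) sorts first.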
  
@@ -82, +82 @@

  
  However, if no timestamp is supplied, the most recent value for a particular 
column is returned; it is also the first one found, since timestamps are stored 
in descending order. Thus a request for the values of all columns in the row 
"com.cnn.www" with no timestamp specified returns: the value of 
''"contents:"'' from time stamp t6, the value of ''"anchor:cnnsi.com"'' from 
time stamp t9, the value of ''"anchor:my.look.ca"'' from time stamp t8 and the 
value of ''"mime:"'' from time stamp t6.
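
As a rough illustration of the "first one found is the newest" rule, the sketch below scans one row's cells with timestamps in descending order and keeps the first value seen per column. It is not HBase code: `latest_values` is a hypothetical helper, the timestamps t3..t9 are modelled as the integers 3..9, the cell values are placeholders, and the older ''contents:'' versions at t5 and t3 are assumed here purely to make the example non-trivial.

```python
from typing import Dict, List, Tuple

# Hypothetical cell record for a single row: (column, timestamp, value).
Cell = Tuple[bytes, int, bytes]


def latest_values(cells: List[Cell]) -> Dict[bytes, Tuple[int, bytes]]:
    """Return the newest (timestamp, value) per column for one row."""
    newest: Dict[bytes, Tuple[int, bytes]] = {}
    # Column ascending, timestamp descending: the first hit per column wins.
    for column, timestamp, value in sorted(cells, key=lambda c: (c[0], -c[1])):
        newest.setdefault(column, (timestamp, value))
    return newest


# Cells of row "com.cnn.www"; values are placeholders, not real data.
row_cells = [
    (b"contents:", 3, b"contents@t3"),  # assumed older version
    (b"contents:", 5, b"contents@t5"),  # assumed older version
    (b"contents:", 6, b"contents@t6"),
    (b"anchor:cnnsi.com", 9, b"anchor@t9"),
    (b"anchor:my.look.ca", 8, b"anchor@t8"),
    (b"mime:", 6, b"mime@t6"),
]
latest = latest_values(row_cells)
```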
  
- 
- === Row Ranges: Regions ===
+ [[Anchor(regions)]]
+ === Regions (Row Ranges) ===
  
  To an application, a table appears to be a list of tuples sorted by row key 
ascending, column name ascending and timestamp descending.  Physically, tables 
are broken up into row ranges called ''regions''. Each row range contains rows 
from start-key (inclusive) to end-key (exclusive). A set of regions, sorted 
appropriately, forms an entire table. A row range is identified by the table 
name and start-key.
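
One way to picture the row-range lookup is a binary search over region start keys. This is a sketch under stated assumptions, not the HBase implementation: `find_region`, the table name `webtable` and the split keys are hypothetical, an empty start key is assumed to mean "beginning of the table", and an end key of `None` is assumed to mean "no upper bound".

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical region descriptor: (start_key inclusive, end_key exclusive).
Region = Tuple[bytes, Optional[bytes]]


def find_region(
    table: str, regions: List[Region], row: bytes
) -> Optional[Tuple[str, bytes]]:
    """Return the (table name, start-key) identifier of the region holding row.

    `regions` must be sorted by start key.
    """
    start_keys = [start for start, _ in regions]
    # Rightmost region whose start key is <= row (start key is inclusive).
    index = bisect.bisect_right(start_keys, row) - 1
    if index < 0:
        return None
    start, end = regions[index]
    if end is not None and row >= end:  # end key is exclusive
        return None
    return (table, start)


# Illustrative split of a table into three regions.
regions = [(b"", b"g"), (b"g", b"p"), (b"p", None)]
first = find_region("webtable", regions, b"apple")
boundary = find_region("webtable", regions, b"g")
last = find_region("webtable", regions, b"zebra")
```

A row equal to a split key (here `b"g"`) falls into the region that starts at that key, because the start key is inclusive and the end key is exclusive.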
  
@@ -92, +92 @@

   * !StoreFiles maintain the sparse index in a separate file
   * HBase extends !StoreFiles so that a bloom filter can be employed to 
enhance negative lookup performance. The hash function employed is one 
developed by Bob Jenkins.
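
To show why a bloom filter helps negative lookups, here is a minimal sketch. It is not the HBase bloom filter: the class `BloomFilter` and its parameters are hypothetical, and SHA-256 from the Python standard library is swapped in for the Bob Jenkins hash mentioned above. A "no" answer is definite, so the store file need not be read; a "maybe" answer can be a false positive.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy bloom filter; sizes and hash choice are illustrative assumptions."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


empty_filter = BloomFilter()
row_filter = BloomFilter()
row_filter.add(b"com.cnn.www")
```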
  
- [[Anchor(arch)]]
- = Architecture and Implementation =
+ [[Anchor(design)]]
+ = Architecture Design =
  
  There are three major components of the HBase architecture:
   1. The HMaster (HBase master server)
@@ -105, +105 @@

  [[Anchor(master)]]
  == HMaster ==
  
- There is one master HMaster  per one cluster.
+ There is only one HMaster for a single HBase deployment.
  
  HMaster duties:
  
-  * Assigning regions to H!RegionServers
+  * Cluster initialization
+  * Assigning/unassigning regions to/from H!RegionServers (unassigning is for 
load balancing)
   * Monitor the health of each H!RegionServer
   * Changes to the table schema and handling table administrative functions
