Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The following page has been changed by EvgenyRyabitskiy:
http://wiki.apache.org/hadoop/Hbase/DesignOverview

------------------------------------------------------------------------------
    * [#physical Physical Storage View]
   * [#arch Architecture and Implementation]
    * [#master HBaseMaster]
-   * [#hregion HRegionServer]
+   * [#hregionserv HRegionServer]
    * [#client HBase Client]
  
  [[Anchor(intro)]]
@@ -83, +83 @@

  
  To an application, a table appears to be a list of tuples sorted by row key 
ascending, column name ascending and timestamp descending.  Physically, tables 
are broken up into row ranges called ''regions''. Each row range contains rows 
from start-key (inclusive) to end-key (exclusive). A set of regions, sorted 
appropriately, forms an entire table. A row range is identified by the table name and its start-key.
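
To make the mapping concrete, the following is a minimal illustrative sketch in Java (not HBase code; the class name, the map and the region names are invented for this example). It resolves a row key to the region whose range contains it by keeping the regions of a table in a map sorted by start-key:

{{{
import java.util.TreeMap;

public class RegionLookupSketch {
    // Regions of one table, keyed by start-key; the value here is just a region name.
    // A real system would keep richer metadata (end-key, serving server, etc.).
    static final TreeMap<String, String> regionsByStartKey = new TreeMap<String, String>();

    public static void main(String[] args) {
        regionsByStartKey.put("", "region-1");        // the first region starts at the empty key
        regionsByStartKey.put("row500", "region-2");
        regionsByStartKey.put("row900", "region-3");

        // The region serving a row is the one with the greatest start-key <= row key.
        System.out.println(regionFor("row042"));      // region-1
        System.out.println(regionFor("row650"));      // region-2
        System.out.println(regionFor("row999"));      // region-3
    }

    static String regionFor(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }
}
}}}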
  
- Each column family in a region is managed by an ''Store''. Each ''Store'' may 
have one or more ''!StoreFiles'' (a Hadoop HDFS file type). !StoreFilesare 
immutable once closed. !StoreFilesare stored in the Hadoop HDFS. Other details 
are the same, except:
+ Each column family in a region is managed by a ''Store''. Each ''Store'' may have one or more ''!StoreFiles'' (a Hadoop HDFS file type). !StoreFiles are immutable once closed. !StoreFiles are stored in the Hadoop HDFS. Other details:
   * !StoreFiles cannot currently be mapped into memory.
-  * !StoreFiles maintain the sparse index in a separate file rather than at 
the end of the file as SSTable does.
+  * !StoreFiles maintain the sparse index in a separate file.
   * HBase extends !StoreFiles so that a bloom filter can be employed to 
enhance negative lookup performance. The hash function employed is one 
developed by Bob Jenkins.
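
To illustrate how a bloom filter speeds up negative lookups, here is a generic toy sketch (not the Bob Jenkins hash and not the actual HBase filter): a per-!StoreFile filter can answer "definitely not here" from memory, so the !StoreFile itself never has to be read for that key:

{{{
import java.util.BitSet;

public class BloomSketch {
    // A toy bloom filter: k probes into a bit array. False positives are possible,
    // false negatives are not, so a miss means the StoreFile can be skipped entirely.
    private final BitSet bits = new BitSet(1 << 16);

    void add(String key) {
        for (int i = 0; i < 3; i++) bits.set(probe(key, i));
    }

    boolean mightContain(String key) {
        for (int i = 0; i < 3; i++) {
            if (!bits.get(probe(key, i))) return false;  // definitely not in this StoreFile
        }
        return true;                                     // maybe present, so read the file
    }

    private int probe(String key, int seed) {
        return ((key + "#" + seed).hashCode() & 0x7fffffff) % (1 << 16);
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch();
        filter.add("row100");
        System.out.println(filter.mightContain("row100"));  // true
        System.out.println(filter.mightContain("row999"));  // almost certainly false
    }
}
}}}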
  
  [[Anchor(arch)]]
@@ -109, +109 @@

   * Monitor the health of each H!RegionServer
   * Changes to the table schema and handling table administrative functions
  
- === Assigning regions to H!RegionServers ===
+ === Assigning regions to HRegionServers ===
  
  The first region to be assigned is the ''ROOT region'' which locates all the 
META regions to be assigned. Each ''META region'' maps a number of user regions 
which comprise the multiple tables that a particular HBase instance serves. 
Once all the META regions have been assigned, the master will then assign user 
regions to the H!RegionServers, attempting to balance the number of regions 
served by each H!RegionServer.
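
The assignment order described above can be sketched as follows (purely illustrative Java; the server names, region names and the simple round-robin balancing are assumptions for this example, not the actual HMaster logic):

{{{
import java.util.Arrays;
import java.util.List;

public class AssignmentOrderSketch {
    public static void main(String[] args) {
        List<String> servers = Arrays.asList("regionserver-1", "regionserver-2", "regionserver-3");
        int next = 0;

        // 1. The ROOT region goes first, so the META regions can be located.
        next = assign("ROOT", servers, next);

        // 2. Then every META region, so the user regions can be located.
        for (String meta : Arrays.asList("META,1", "META,2")) {
            next = assign(meta, servers, next);
        }

        // 3. Finally the user regions, spread out (here: round-robin) to balance load.
        for (String region : Arrays.asList("table1,row0", "table1,row500", "table2,row0", "table2,row900")) {
            next = assign(region, servers, next);
        }
    }

    static int assign(String region, List<String> servers, int next) {
        System.out.println(region + " -> " + servers.get(next % servers.size()));
        return next + 1;
    }
}
}}}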
  
- Location of ''ROOT region'' is stored in !ZooKeeper. 
+ ==== The META Table ====
  
+ The META table stores information about every user region in HBase. Each entry includes a H!RegionInfo object (containing information such as the HRegion id, the start and end keys, a reference to this HRegion's table descriptor, etc.) and the address of the H!RegionServer that is currently serving the region. The META table can grow as the number of user regions grows.
+ 
+ ==== The ROOT Table ====
+ 
+ The ROOT table is confined to a single region and maps all the regions in the 
META table. Like the META table, it contains a H!RegionInfo object for each 
META region and the location of the H!RegionServer that is serving that META 
region.
+ 
+ Each row in the ROOT and META tables is approximately 1KB in size. At the default region size of 256MB, this means that the ROOT region can map 2.6 x 10^5^ META regions, which in turn map a total of 6.9 x 10^10^ user regions, which corresponds to approximately 1.8 x 10^19^ (2^64^) bytes of user data.
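
These figures follow directly from the sizes quoted above; the back-of-the-envelope arithmetic can be checked with a few lines of Java:

{{{
public class MetaSizingSketch {
    public static void main(String[] args) {
        long regionSize = 256L * 1024 * 1024;            // default region size: 256MB
        long rowSize = 1024;                             // ~1KB per ROOT/META row

        long rowsPerRegion = regionSize / rowSize;       // 262,144, i.e. ~2.6 x 10^5
        long metaRegions = rowsPerRegion;                // META regions the single ROOT region can map
        long userRegions = metaRegions * rowsPerRegion;  // ~6.9 x 10^10 user regions
        double userBytes = (double) userRegions * regionSize;

        System.out.println(rowsPerRegion);               // 262144
        System.out.println(userRegions);                 // 68719476736
        System.out.println(userBytes);                   // 1.8446744073709552E19
        System.out.println(Math.pow(2, 64));             // the same value, 2^64
    }
}
}}}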
+ 
+ Every server (master or region) can get ''ROOT region'' location from 
!ZooKeeper. 
+ 
- === Monitor the health of each H!RegionServer ===
+ === Monitor the health of each HRegionServer ===
  
  If HMaster detects a H!RegionServer is no longer reachable, it will split the 
H!RegionServer's write-ahead log so that there is now one write-ahead log for 
each region that the H!RegionServer was serving. After it has accomplished 
this, it will reassign the regions that were being served by the unreachable 
H!RegionServer.
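
The log-splitting step can be pictured with a small illustrative sketch (the real HMaster operates on HLog files in HDFS, not on in-memory lists): entries of the shared write-ahead log are grouped by region so that each region gets its own log to replay:

{{{
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LogSplitSketch {
    public static void main(String[] args) {
        // The shared log of an unreachable region server: each entry is {region, edit}.
        List<String[]> sharedLog = Arrays.asList(
                new String[] { "regionA", "put row1" },
                new String[] { "regionB", "put row7" },
                new String[] { "regionA", "delete row2" });

        // Split: build one per-region log, preserving the original order of edits.
        Map<String, List<String>> perRegionLogs = new LinkedHashMap<String, List<String>>();
        for (String[] entry : sharedLog) {
            if (!perRegionLogs.containsKey(entry[0])) {
                perRegionLogs.put(entry[0], new ArrayList<String>());
            }
            perRegionLogs.get(entry[0]).add(entry[1]);
        }

        // Each per-region log can now be replayed by whichever server is reassigned that region.
        System.out.println(perRegionLogs);  // {regionA=[put row1, delete row2], regionB=[put row7]}
    }
}
}}}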
  
- == Changes to the table schema and handling table administrative functions ==
+ === Changes to the table schema and handling table administrative functions 
===
  
  The table schema is the set of tables and their column families. The HMaster can add and remove column families and turn tables on or off.
  
  If the HMaster dies, the cluster will shut down; this will change soon, once integration with !ZooKeeper is complete. See [:Hbase/ZookeeperIntegration: ZooKeeper Integration]
  
- === The META Table ===
  
- The META table stores information about every user region in HBase which 
includes a H!RegionInfo object containing information such as the start and end 
row keys, whether the region is on-line or off-line, etc. and the address of 
the H!RegionServer that is currently serving the region. The META table can 
grow as the number of user regions grows.
- 
- === The ROOT Table ===
- 
- The ROOT table is confined to a single region and maps all the regions in the 
META table. Like the META table, it contains a H!RegionInfo object for each 
META region and the location of the H!RegionServer that is serving that META 
region.
- 
- Each row in the ROOT and META tables is approximately 1KB in size. At the 
default region size of 256MB, this means that the ROOT region can map 2.6 x 
10^5^ META regions, which in turn map a total 6.9 x 10^10^ user regions, 
meaning that approximately 1.8 x 10^19^ (2^64^) bytes of user data.
- 
- Every server (master or region) can get ''ROOT region'' location from 
!ZooKeeper. 
- 
- [[Anchor(hregion)]]
+ [[Anchor(hregionserv)]]
- == H!RegionServer ==
+ == HRegionServer ==
  
  H!RegionServer duties:
  
@@ -146, +145 @@

   * Handling client read and write requests
   * Flushing cache to HDFS
   * Keeping HLog
-  * Region Compactions and Splits
+  * Compactions
+  * Region Splits
+ 
+ === Serving HRegions assigned to HRegionServer ===
+ 
+ Each HRegion is served by only one H!RegionServer. When a H!RegionServer starts serving a HRegion, it reads the HLog and all !StoreFiles for that HRegion from HDFS. While serving HRegions, the H!RegionServer manages persistent storage of all changes to HDFS.
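
A rough illustration of opening a region (invented file names and in-memory structures; not the actual H!RegionServer code): the server notes the !StoreFiles that back the region and replays the not-yet-flushed edits from the HLog into its in-memory cache:

{{{
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class RegionOpenSketch {
    public static void main(String[] args) {
        // On-disk state of one region in HDFS (paths and edits are invented for the example).
        List<String> storeFilePaths = Arrays.asList(
                "/hbase/table1/region1/family1/storefile1",
                "/hbase/table1/region1/family1/storefile2");
        List<String[]> hlogEdits = Arrays.asList(new String[] { "row5", "value2" });

        // Opening the region: remember which StoreFiles back it and replay the logged
        // edits that have not been flushed yet into the in-memory cache.
        List<String> openFiles = new ArrayList<String>(storeFilePaths);
        TreeMap<String, String> memcache = new TreeMap<String, String>();
        for (String[] edit : hlogEdits) memcache.put(edit[0], edit[1]);

        System.out.println("serving with " + openFiles + " and memcache " + memcache);
    }
}
}}}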
  
  === Handling client read and write requests ===
  
- Client communicates with the HMaster to get a list of HRegions to serve and 
to tell the master that it is alive. Region assignments and other instructions 
from the master "piggy back" on the heart beat messages.
+ The client communicates with the HMaster to get the list of HRegions and the H!RegionServers serving them. The client then sends read and write requests directly to those H!RegionServers.
  
- === Write Requests ===
+ ==== Write Requests ====
  
- When a write request is received, it is first written to a write-ahead log 
called a ''HLog''. All write requests for every region the region server is 
serving are written to the same log. Once the request has been written to the 
HLog, it is stored in an in-memory cache called the ''Memcache''. There is one 
Memcache for each Store.
+ When a write request is received, it is first written to a write-ahead log called a ''HLog''. All write requests for every region the region server is serving are written to the same ''HLog''. Once the request has been written to the ''HLog'', the change is applied to an in-memory cache called the ''Memcache''. There is one Memcache for each Store.
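
The ordering matters: the edit is made durable in the ''HLog'' before the ''Memcache'' is updated. A minimal sketch with invented names (not the actual H!RegionServer code):

{{{
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WritePathSketch {
    // One shared write-ahead log per region server (here just an in-memory list).
    static final List<String> hlog = new ArrayList<String>();
    // One Memcache per Store (here a single sorted map stands in for one Store).
    static final TreeMap<String, String> memcache = new TreeMap<String, String>();

    static void write(String region, String row, String value) {
        hlog.add(region + "/" + row + "=" + value);  // 1. append the edit to the write-ahead log
        memcache.put(row, value);                    // 2. only then apply it to the in-memory cache
    }

    public static void main(String[] args) {
        write("regionA", "row1", "value1");
        write("regionB", "row9", "value3");          // edits of all served regions share one log
        System.out.println(hlog);
        System.out.println(memcache);
    }
}
}}}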
  
- === Read Requests ===
+ ==== Read Requests ====
  
- Reads are handled by first checking the Memcache and if the requested data is 
not found, the !MapFiles are searched for results.
+ Reads are handled by first checking the Memcache and if the requested data is 
not found, the !StoreFiles are searched for results.
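
A small sketch of that lookup order (invented in-memory structures; real !StoreFiles live in HDFS and are index-backed, this only shows the order in which places are checked):

{{{
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadPathSketch {
    public static void main(String[] args) {
        // Newest data lives in the Memcache; older data lives in the StoreFiles (newest first).
        Map<String, String> memcache = new HashMap<String, String>();
        memcache.put("row1", "new-value");

        Map<String, String> newerFile = new HashMap<String, String>();
        newerFile.put("row2", "flushed-value");
        Map<String, String> olderFile = new HashMap<String, String>();
        olderFile.put("row1", "old-value");
        olderFile.put("row3", "older-value");
        List<Map<String, String>> storeFiles = new ArrayList<Map<String, String>>();
        storeFiles.add(newerFile);
        storeFiles.add(olderFile);

        System.out.println(read("row1", memcache, storeFiles));  // new-value (found in the Memcache)
        System.out.println(read("row3", memcache, storeFiles));  // older-value (found in a StoreFile)
    }

    static String read(String row, Map<String, String> memcache, List<Map<String, String>> storeFiles) {
        if (memcache.containsKey(row)) return memcache.get(row);  // 1. check the Memcache first
        for (Map<String, String> file : storeFiles) {             // 2. then search the StoreFiles
            if (file.containsKey(row)) return file.get(row);
        }
        return null;                                              // not found anywhere
    }
}
}}}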
  
  === Cache Flushes ===
  
- When the Memcache reaches a configurable size, it is flushed to disk, 
creating a new !MapFile and a marker is written to the HLog, so that when it is 
replayed, log entries before the last flush can be skipped. A flush may also be 
triggered to relieve memory pressure on the region server.
+ When the Memcache reaches a configurable size, it is flushed to HDFS, creating a new !StoreFile, and a marker is written to the HLog so that, when the log is replayed, entries written before the last flush can be skipped. A flush may also be triggered to relieve memory pressure on the region server.
  
- Cache flushes happen concurrently with the region server processing read and 
write requests. Just before the new !MapFile is moved into place, reads and 
writes are suspended until the !MapFile has been added to the list of active 
!MapFiles for the HStore.
+ Cache flushes happen concurrently with the region server processing read and write requests. Just before the new !StoreFile is moved into place, reads and writes are suspended until the !StoreFile has been added to the list of active !StoreFiles for the HStore.
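
Putting the two paragraphs together, a flush roughly amounts to the following (illustrative only; invented in-memory structures stand in for the HLog and the !StoreFiles):

{{{
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class CacheFlushSketch {
    public static void main(String[] args) {
        TreeMap<String, String> memcache = new TreeMap<String, String>();
        List<TreeMap<String, String>> storeFiles = new ArrayList<TreeMap<String, String>>();
        List<String> hlog = new ArrayList<String>();

        memcache.put("row1", "value1");
        memcache.put("row2", "value2");
        hlog.add("put row1=value1");
        hlog.add("put row2=value2");

        // Flush: persist the sorted Memcache contents as a new StoreFile, write a marker
        // to the HLog so that replay can skip everything before this point, and start
        // over with an empty Memcache.
        storeFiles.add(new TreeMap<String, String>(memcache));
        hlog.add("FLUSH-MARKER sequence=" + hlog.size());
        memcache.clear();

        System.out.println(storeFiles);  // [{row1=value1, row2=value2}]
        System.out.println(hlog);        // [put row1=value1, put row2=value2, FLUSH-MARKER sequence=2]
    }
}
}}}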
+ 
+ === Keeping HLog ===
+ 
+ There is only one ''HLog'' per H!RegionServer. It is the write-ahead log for all changes to the HRegions served by that server.
+ 
+ There are two processes that restrict the ''HLog'' size:
+  * Rolling process: when the current ''HLog'' file reaches a configurable size, the ''HLog'' closes it and starts writing to a new file.
+  * Flushing process: when the ''HLog'' reaches a configurable size, it is flushed to HDFS.
  
  === Compactions ===
  
- When the number of !MapFiles exceeds a configurable threshold, a minor 
compaction is performed which consolidates the most recently written !MapFiles. 
A major compaction is performed periodically which consolidates all the 
!MapFiles into a single !MapFile. The reason for not always performing a major 
compaction is that the oldest !MapFile can be quite large and reading and 
merging it with the latest !MapFiles, which are much smaller, can be very time 
consuming due to the amount of I/O involved in reading merging and writing the 
contents of the largest !MapFile.
+ When the number of !StoreFiles exceeds a configurable threshold, a minor compaction is performed which consolidates the most recently written !StoreFiles. A major compaction is performed periodically which consolidates all the !StoreFiles into a single !StoreFile. The reason for not always performing a major compaction is that the oldest !StoreFile can be quite large, and reading and merging it with the latest !StoreFiles, which are much smaller, can be very time consuming due to the amount of I/O involved in reading, merging, and writing the contents of the largest !StoreFile.
  
- Compactions happen concurrently with the region server processing read and 
write requests. Just before the new !MapFile is moved into place, reads and 
writes are suspended until the !MapFile has been added to the list of active 
!MapFiles for the HStore and the !MapFiles that were merged to create the new 
!MapFile have been removed.
+ Compactions happen concurrently with the region server processing read and 
write requests. Just before the new !StoreFile is moved into place, reads and 
writes are suspended until the !StoreFile has been added to the list of active 
!StoreFiles for the HStore and the !StoreFiles that were merged to create the 
new !StoreFile have been removed.
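
The distinction between the two kinds of compaction can be sketched as follows (the file sizes and the simple selection rule are invented for this example; the actual thresholds are configurable in HBase):

{{{
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompactionSketch {
    public static void main(String[] args) {
        // StoreFile sizes in MB, oldest first; the oldest file is usually by far the largest.
        List<Integer> fileSizes = new ArrayList<Integer>(Arrays.asList(900, 64, 8, 8, 8));
        int threshold = 3;  // configurable number of StoreFiles that triggers a compaction

        if (fileSizes.size() > threshold) {
            // Minor compaction: merge only the most recently written (small) files and
            // leave the large old file alone, so it is not rewritten on every compaction.
            List<Integer> newest = fileSizes.subList(1, fileSizes.size());
            int merged = sum(newest);
            newest.clear();             // drop the merged inputs from the list...
            fileSizes.add(merged);      // ...and add the single merged output
        }
        System.out.println(fileSizes);  // [900, 88]

        // Major compaction (run periodically): merge everything into one StoreFile.
        int all = sum(fileSizes);
        fileSizes.clear();
        fileSizes.add(all);
        System.out.println(fileSizes);  // [988]
    }

    static int sum(List<Integer> sizes) {
        int total = 0;
        for (int size : sizes) total += size;
        return total;
    }
}
}}}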
  
  === Region Splits ===
  
