Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hbase/MasterRewrite" page has been changed by stack.
http://wiki.apache.org/hadoop/Hbase/MasterRewrite?action=diff&rev1=11&rev2=12

--------------------------------------------------

    * Distributes out administered close, flush, compact messages
   * Watches ZK for its own lease and for regionservers so it knows when to run recovery
  
+ After implementation of this design, the master will do all of the above except manage schema and distribute the close, flush, etc. messages.  Any client can do the latter by manipulating zk (we can add acl checks later).  The remaining master tasks will be less prone to error and will run snappier because they will no longer be based on messaging carried atop periodic heartbeats from regionservers.
+ 
  <<Anchor(problems)>>
  
  == Problems with current Master ==
@@ -44, +46 @@

    1. Each regionserver carries 100 regions of 1G each (100k regions =~ 100TB)
  
  <<Anchor(design)>>
+ == Design ==
  
- == Design ==
  <<Anchor(moveall)>>
+ === Move all state, state transitions, and schema to go via zookeeper ===
+ Currently state transitions are done inside the master by shuffling regions between Maps, triggered by messages carried on the back of regionserver heartbeats.  Move all of this to zookeeper.
  
- === Move all state, state transitions, and schema to go via zookeeper ===
+ <<Anchor(tablestate)>>
+ ==== Table State ====
- Tables are offlined, onlined, made read-only, and dropped (Add freeze of 
flushes and compactions state to facilitate snapshotting).  Currently HBase 
Master does this by messaging regionservers.  Instead move state to zookeeper.  
Let regionservers watch for changes and react.  Allow that a cluster may have 
up to 100 tables.  Tables are made of regions.  There may be thousands of 
regions per table.  A regionserver could be carrying a region from each of the 
100 tables.  TODO: Should regionserver have a table watcher or a watcher per 
region?
+ Tables are offlined, onlined, made read-only, and dropped (add a freeze-flushes-and-compactions state to facilitate snapshotting).  Currently the HBase Master does this by messaging regionservers.  Instead, move the state to zookeeper and let regionservers watch for changes and react.  Allow that a cluster may have up to 100 tables.  Tables are made of regions; there may be thousands of regions per table, and a regionserver could be carrying a region from each of the 100 tables.
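To illustrate the watch-and-react pattern described above, here is a minimal sketch of how a regionserver might reconcile the regions it carries against table state freshly read from zookeeper.  This is not the actual HBase implementation; the class, enum values, and defaulting rule are invented for illustration (only differences from the default would live in zk, so tables absent from the map are treated as ONLINE):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: when the watch on the tables znode fires, a
// regionserver re-reads table states and reconciles each region it
// carries against them.
public class TableStateReconciler {

    // Table states a znode could carry; names are illustrative only.
    public enum TableState { ONLINE, OFFLINE, READ_ONLY, NO_FLUSH }

    /**
     * Given the freshly-read table states (table name -> state) and the
     * regions this server carries (region name -> table name), return
     * the target state per region.  Tables absent from the znode map
     * carry no diff from default, so they resolve to ONLINE.
     */
    public static Map<String, TableState> reconcile(
            Map<String, TableState> tableStates,
            Map<String, String> carriedRegions) {
        Map<String, TableState> actions = new HashMap<>();
        for (Map.Entry<String, String> e : carriedRegions.entrySet()) {
            TableState s =
                tableStates.getOrDefault(e.getValue(), TableState.ONLINE);
            actions.put(e.getKey(), s);
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, TableState> states = new HashMap<>();
        states.put("users", TableState.READ_ONLY); // only the diff is in zk
        Map<String, String> regions = new HashMap<>();
        regions.put("users,aaa,111", "users");
        regions.put("logs,bbb,222", "logs");
        Map<String, TableState> actions = reconcile(states, regions);
        System.out.println(actions.get("users,aaa,111")); // READ_ONLY
        System.out.println(actions.get("logs,bbb,222"));  // ONLINE
    }
}
```

One watcher on the single tables znode (rather than one per region) keeps the zk watch count independent of region count, at the cost of every regionserver re-scanning its region list on any table change.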
  
- Tables have schema.  Tables are made of column families.  Column families 
have schema/attributes.  Column families can be added and removed.  Currently 
the schema is written into a column in the .META. catalog family.  Move all 
schema to zookeeper.   Regionservers would have watchers on schema and would 
react to changes.  TODO: A watcher per column family or a watcher per table or 
a watcher on the parent directory for schema?
+ Tables have schema.  Tables are made of column families.  Column families 
have schema/attributes.  Column families can be added and removed.  Currently 
the schema is written into a column in the .META. catalog family.  Move all 
schema to zookeeper.   Regionservers would have watchers on schema and would 
react to changes.
+ 
+ In a tables znode up in zk, keep a file that, for each table on the cluster, lists that table's current state attributes -- read-only, no-flush -- and its schema, all in JSON.  Only the differences from the defaults are kept up in zk.  All regionservers keep a watch on this znode and react if it changes, spinning through their list of regions and reconciling each against the current content of the tables znode.
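The per-table JSON in the tables znode might look something like the following sketch.  The field names and attribute values here are illustrative assumptions, not a fixed format; the only properties taken from the design are that it is an array of table objects, each with state and schema, holding differences from defaults only:

```json
[
  { "name": "users",
    "state": ["read-only", "no-flush"],
    "schema": { "families": { "info": { "maxVersions": 1 } } } },
  { "name": "logs",
    "state": [],
    "schema": {} }
]
```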
+ 
+ <<Anchor(regionstate)>>
+ ==== Region State ====
  
  Run region state transitions -- i.e. opening, closing -- by changing state in 
zookeeper rather than in Master maps as is currently done.
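A minimal sketch of the opening/closing lifecycle that would live in zookeeper rather than in Master maps might look like the following.  The state names and legality rules here are assumptions for illustration, not the design's final transition set:

```java
// Hypothetical sketch of a region lifecycle kept as znode state rather
// than in Master maps.  Each region's current state would be recorded
// in zk; participants check a move is legal before making it.
public class RegionStateMachine {

    public enum State { UNASSIGNED, OPENING, OPEN, CLOSING, CLOSED }

    /** Return true if the move from one state to the next is legal. */
    public static boolean isLegal(State from, State to) {
        switch (from) {
            case UNASSIGNED: return to == State.OPENING;
            // An open attempt can succeed or fall back to unassigned.
            case OPENING:    return to == State.OPEN
                                 || to == State.UNASSIGNED;
            case OPEN:       return to == State.CLOSING;
            case CLOSING:    return to == State.CLOSED;
            // A closed region can be reassigned elsewhere.
            case CLOSED:     return to == State.UNASSIGNED;
            default:         return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLegal(State.UNASSIGNED, State.OPENING)); // true
        System.out.println(isLegal(State.OPEN, State.OPENING));       // false
    }
}
```

Keeping the transition in zk means any watcher (master or regionserver) can observe the move and act on it, with no heartbeat round-trip in the loop.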
  
@@ -74, +84 @@

  # Is STARTCODE a timestamp or a random id?
  /hbase/rs/STARTCODE/load/
  /hbase/rs/STARTCODE/regions/opening/
+ /hbase/tables/TABLENAME {JSON array of table objects.  Each table object 
would have state and schema objects, etc.  State is read-only, offline, etc.  
Schema has differences from default only}
- /hbase/tables/TABLENAME/schema/attributes serialized as JSON # These are 
table attributes.  Distinct from state flags such as read-only.
- /hbase/tables/TABLENAME/schema/families/FAMILYNAME/attributes serialized as 
JSON
- /hbase/tables/TABLENAME/state/attribute # Can have only one attribute at a 
time?  E.g. Read-only implies online and no flush/compaction.  Allow support 
for multiple.
  }}}
  
  <<Anchor(clean)>>
