Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hbase/MasterRewrite" page has been changed by stack.
http://wiki.apache.org/hadoop/Hbase/MasterRewrite?action=diff&rev1=14&rev2=15

--------------------------------------------------

  Current thinking is to keep region lifecycle all up in zookeeper, but that 
probably won't scale.  Postulate 100k regions -- 100TB at 1G regions -- each 
with two or three possible states, each with watchers for state change.  My 
guess is that this is too much to put in zk (Mahadev+Patrick say it is not, if 
the data is small).  TODO: how to manage the transition from zk to .META.?  
Also, can't do getClosest up in zk, only in .META.
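
As a rough sizing of the postulate above, here is the arithmetic as a small Python sketch. It assumes decimal units (1TB = 1000G) and reads "two or three possible states" as two to three watched state entries per region; both are assumptions, not measurements.

```python
# Back-of-envelope sizing; all inputs are the postulated figures above.
TOTAL_DATA_GB = 100 * 1000   # 100TB, assuming 1TB = 1000G
REGION_SIZE_GB = 1           # 1G regions


def estimate(states_per_region):
    """Return (region count, watched znode count) for the postulated cluster."""
    regions = TOTAL_DATA_GB // REGION_SIZE_GB
    # Assumption: one znode and one watcher per (region, state) pair.
    return regions, regions * states_per_region


for states in (2, 3):
    regions, watched = estimate(states)
    print(f"{regions} regions x {states} states = {watched} watched znodes")
```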
  
  ===== Design =====
- Here is 
[[http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases#case2|Patrick's 
suggestion]].  We already keep a znode per regionserver, though it's named for 
the regionserver's startcode.  On evaporation of the regionserver ephemeral 
node, master would run a reconciliation (or on assumption of master role, new 
master would check state in zk, making sure there is a regionserver per 
region), adding unassigned regions back to the unassigned pool.
+ Here is 
[[http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases#case2|Patrick's 
suggestion]].  We already keep a znode per regionserver, though it's named for 
the regionserver's startcode -- see the 'rs' directory in 0.20.x zookeepers.  
On evaporation of the regionserver ephemeral node, master would run a 
reconciliation (or on assumption of master role, new master would check state 
in zk, making sure there is a regionserver per region, etc.).
  
- All regions would be listed in .META. table always.  Whether they are online, 
splitting or closing, etc., would be up in zk.
+ All regions would be listed in .META. table always.  Whether they are online, 
splitting or closing, etc., would be up in zk.  So figuring out whether 
something is unassigned would be a case of a .META. table scan: anything not 
managed by zk needs to be added in there (assigned).
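
A minimal sketch of that scan in Python, using plain in-memory collections in place of a real .META. scanner and zk client. The names `find_unassigned`, `meta_regions` and `zk_managed`, and the sample region names, are hypothetical and only here for illustration.

```python
def find_unassigned(meta_regions, zk_managed):
    """Return regions listed in .META. that zk is not managing, in scan order."""
    managed = set(zk_managed)
    return [region for region in meta_regions if region not in managed]


# Stand-ins: region names as a .META. scan might return them, and the set of
# regions that currently have a znode under some regionserver in zk.
meta_regions = ["t1,,1", "t1,m,2", "t2,,3"]
zk_managed = {"t1,,1", "t2,,3"}

for region in find_unassigned(meta_regions, zk_managed):
    print("needs assignment:", region)
```

Keeping .META. as the full list and zk as the "managed" subset means the unassigned pool never has to be stored anywhere; it can be recomputed as a difference.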
+ 
+ ====== zk layout ======
+ Here is some cleanup of 
[[http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases#case2|Patrick's 
suggestion]]
+ 
+ {{{
+ # First, redo the current 'rs' directory slightly:
+ /hbase/regionservers # master watches /regionservers for any child changes
+ /hbase/regionserver/<host:port:startcode> = <status> # As each region server 
becomes available to do work (or to track state if up but not available) it 
creates an ephemeral node and writes its state (up/down).
+ # Master watches all /regionserver/<host:port:startcode> and cleans up if RS 
goes away or changes status
+ 
+ # Now, for regions
+ /hbase/regions/<regionserver by host:port:startcode> # Gets created when 
master notices new region server
+ # RS host:port watches this node for any child changes 
+ 
+ /hbase/regions/<regionserver by host:port:startcode>/<regionXYZ> # znode for 
each region assigned to RS host:port.
+ # RS host:port watches this node in case reassigned by master, or region 
changes state 
+ 
+ #
+ /tables/<regionserver by host:port:startcode>/<regionXYZ>/<state>-<seq#> # 
znode created by master
+ # seq ensures order seen by RS
+ # RS deletes old state znodes as it transitions out; the oldest entry is the 
current state, so there is always at least one znode here
+ }}}
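
To illustrate how a regionserver might read the `<state>-<seq#>` znodes in the layout above, here is a small Python sketch over a plain list of child names. It assumes the sequence number is the zero-padded suffix zk appends to sequential znodes, and it treats entries newer than the current state as transitions still queued by the master; that reading, the function names, and the sample states are assumptions.

```python
def parse(child):
    """Split a '<state>-<seq#>' znode name into (seq, state)."""
    state, seq = child.rsplit("-", 1)
    return int(seq), state


def current_and_pending(children):
    """Oldest entry is the current state; newer entries are assumed queued."""
    if not children:
        raise ValueError("expected at least one state znode")
    ordered = [state for _, state in sorted(parse(c) for c in children)]
    return ordered[0], ordered[1:]


# zk does not return children in sorted order, so sort by sequence number.
children = ["closing-0000000007", "online-0000000005", "splitting-0000000006"]
current, pending = current_and_pending(children)
print("current:", current, "pending:", pending)
```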
+ 
+ ====== Questions ======
+ 
+ Should the region znode have state?  E.g. no flush, no compaction so we could 
do a backup by copying a region at a time?
  
  <<Anchor(clean)>>
  
