This feature may have been discussed before. We currently don't keep the full 
transition history of every region, i.e. { fromState, toState, the machine that 
initiated the transition }. The transition state is removed from zookeeper 
(/hbase/unassigned) once the region is assigned, and the ".META." table has a 
max version of 10, so it only tracks the last 10 RS assignments of a given region.

One idea is to store this transition history in zookeeper. The problem is that 
it could exhaust zookeeper memory if we have a large number of regions and the 
cluster runs for a long time. I would like to get your feedback on different 
approaches to address this. One assumption is that region assignment doesn't 
happen frequently, so the overhead introduced won't have much impact on 
system performance.
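
To make the memory concern concrete, here is a rough back-of-envelope sketch. 
Every input (region count, transition rate, bytes per record) is a made-up 
placeholder for illustration, not a measurement from any cluster.

```python
def staged_bytes(regions, transitions_per_region_per_day, days, bytes_per_record):
    """Bytes of history accumulated if nothing is ever purged.

    All parameters are hypothetical inputs chosen by the caller.
    """
    return regions * transitions_per_region_per_day * days * bytes_per_record


# Hypothetical numbers: 100,000 regions, 1 transition per region per day,
# one year of uptime, 200 bytes per stored record.
total = staged_bytes(100_000, 1, 365, 200)
print(f"{total / 1e9:.1f} GB")
```

Growth is linear in both region count and uptime, so without some form of 
purging the data set has no upper bound.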

Approach 1:

Zookeeper knows the history of how /hbase/unassigned is modified. If we can 
somehow get at zookeeper's logs (Bookkeeper?), we can reconstruct the history 
of region transitions.

Approach 2:

1.      HBase logs the extra region transition data to zookeeper. This could 
be one zookeeper node per transaction.
2.      A separate thread on the Master moves the data out of zookeeper and 
appends it to HDFS. That keeps the zookeeper size in check.
3.      A tool or web UI shows the history of a given region by looking at 
both zookeeper and HDFS.
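
The three steps of Approach 2 can be sketched as below. This is only an 
in-memory illustration and makes several assumptions: a dict stands in for the 
zookeeper staging nodes, a Python list of JSON lines stands in for the 
append-only HDFS file, and the names RegionTransition, StagingStore, drain and 
history are all hypothetical, not existing HBase or zookeeper APIs. The state 
names in the usage lines are illustrative as well.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RegionTransition:
    region: str
    from_state: str
    to_state: str
    initiator: str  # machine that initiated the transition
    seq: int        # monotonically increasing sequence number


class StagingStore:
    """In-memory stand-in for zookeeper: one entry per recorded transition."""

    def __init__(self):
        self._nodes = {}
        self._next_seq = 0

    def record(self, region, from_state, to_state, initiator):
        seq = self._next_seq
        self._next_seq += 1
        self._nodes[seq] = RegionTransition(region, from_state, to_state, initiator, seq)
        return seq

    def pending(self):
        # Entries not yet moved to the archive, oldest first.
        return [self._nodes[s] for s in sorted(self._nodes)]

    def delete(self, seq):
        del self._nodes[seq]

    def __len__(self):
        return len(self._nodes)


def drain(staging, archive):
    """Step 2: append each staged entry to the archive, then delete it."""
    moved = 0
    for t in staging.pending():
        archive.append(json.dumps(asdict(t)))  # append first
        staging.delete(t.seq)                  # delete only after the append
        moved += 1
    return moved


def history(region, staging, archive):
    """Step 3: merge archived and still-staged entries for one region."""
    archived = [RegionTransition(**json.loads(line)) for line in archive]
    combined = archived + staging.pending()
    return sorted((t for t in combined if t.region == region), key=lambda t: t.seq)


staging, archive = StagingStore(), []
staging.record("region-a", "OFFLINE", "OPENING", "rs1")
drain(staging, archive)
staging.record("region-a", "OPENING", "OPEN", "rs1")
for t in history("region-a", staging, archive):
    print(t.seq, t.from_state, "->", t.to_state, "by", t.initiator)
```

Appending before deleting means a crash between the two steps can leave a 
duplicate in the archive but never loses a transition; a real implementation 
would need to de-duplicate on the sequence number when reading.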

Ming
