[jira] [Created] (HBASE-4354) track region history

Ming Ma (JIRA) Thu, 08 Sep 2011 11:34:36 -0700

track region history
--------------------

                 Key: HBASE-4354
                 URL: https://issues.apache.org/jira/browse/HBASE-4354
             Project: HBase
          Issue Type: New Feature
          Components: master, metrics, regionserver
            Reporter: Ming Ma
            Assignee: Ming Ma



For debugging and analysis purposes, it would be useful to understand a
region's lifecycle: how it is created (from which parent region, for example),
how it is split, assigned, etc. Some of this information is in the logs, the
HBase .META. table, ZooKeeper, and metrics. Certain history data is lost; for
example, the state is removed from ZooKeeper /hbase/unassigned once the region
is assigned; also, the .META. table has a max version of 10 and thus only
tracks the last 10 RS assignments of a given region. It would be nice to put
this history in a central place. It can provide:

1. How applications use HBase. For example, an application might create a
large number of regions in a short period of time and drop the table later.
2. How HBase internally manages regions, such as how regions are split,
assigned, taken offline, etc.

Things to track
1. How the region is created, including the parent region in the case of a
split.
2. The region transition process, such as region state changes and region
server changes.
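
As a rough illustration of the two items above, a history entry could carry
the region, its parent (if any), the new state, and the region server
involved. The sketch below is only an assumption about what such a record
might look like; RegionEvent, RegionHistory, and the state strings are
hypothetical names, not existing HBase classes or enum values.

```java
import java.util.ArrayList;
import java.util.List;

/** One entry in a region's history. Hypothetical type, not an HBase class. */
final class RegionEvent {
    final long timestampMs;     // when the transition was observed
    final String regionName;    // region the event is about
    final String parentRegion;  // parent in the case of a split, else null
    final String state;         // illustrative string such as "SPLIT" or "OPENED"
    final String server;        // region server involved, or null if none

    RegionEvent(long timestampMs, String regionName, String parentRegion,
                String state, String server) {
        this.timestampMs = timestampMs;
        this.regionName = regionName;
        this.parentRegion = parentRegion;
        this.state = state;
        this.server = server;
    }

    /** Tab-separated form, convenient for appending to a log file. */
    String toLine() {
        return timestampMs + "\t" + regionName + "\t" + orDash(parentRegion)
            + "\t" + state + "\t" + orDash(server);
    }

    private static String orDash(String s) {
        return s == null ? "-" : s;
    }
}

/** Append-only, in-memory history; a stand-in for the central place. */
final class RegionHistory {
    private final List<RegionEvent> events = new ArrayList<>();

    void record(RegionEvent e) {
        events.add(e);
    }

    /** Returns the events of one region in the order they were recorded. */
    List<RegionEvent> historyOf(String regionName) {
        List<RegionEvent> out = new ArrayList<>();
        for (RegionEvent e : events) {
            if (e.regionName.equals(regionName)) {
                out.add(e);
            }
        }
        return out;
    }
}
```

Because nothing is overwritten, every assignment of a region stays visible,
unlike a store that keeps only a bounded number of versions.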


One idea is to put such transition history data in ZooKeeper. One issue is
that it could blow up ZooKeeper memory if we have a large number of regions
and the cluster runs for a long time. I would like to get your feedback on
different approaches to address the issue. One assumption is that region
assignment doesn't happen with high frequency, and thus the overhead
introduced won't have much impact on system performance.


Approach 1:

ZooKeeper knows the history of how /hbase/unassigned is modified. If we can
get ZooKeeper's logs (BookKeeper?) somehow, we know the history of region
transitions.

Approach 2:

1. HBase logs extra region transition data to ZooKeeper. It could be one
ZooKeeper node per transaction.
2. Have a separate thread on the Master move data out of ZooKeeper and
append it to HDFS. That will keep the ZooKeeper size in check.
3. Have some tool or web UI show the history of a given region by looking
at both ZooKeeper and HDFS.
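
Step 2 of Approach 2 could be sketched roughly as follows. This is a minimal
sketch and not a proposed implementation: TransitionStore is an in-memory
stand-in for a ZooKeeper parent node with sequentially numbered children, the
Consumer<String> sink stands in for an HDFS append, and all class and method
names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** In-memory stand-in for ZooKeeper nodes, one node per logged entry. */
final class TransitionStore {
    private final ConcurrentSkipListMap<Long, String> nodes =
        new ConcurrentSkipListMap<>();
    private final AtomicLong nextSeq = new AtomicLong();

    /** Stores one entry under the next sequence number and returns it. */
    long log(String transition) {
        long seq = nextSeq.getAndIncrement();
        nodes.put(seq, transition);
        return seq;
    }

    /** Copy of the current entries, ordered by sequence number. */
    List<Map.Entry<Long, String>> snapshot() {
        return new ArrayList<Map.Entry<Long, String>>(nodes.entrySet());
    }

    void delete(long seq) {
        nodes.remove(seq);
    }

    int size() {
        return nodes.size();
    }
}

/** Moves entries from the store to an append-only sink. */
final class HistoryArchiver {
    private final TransitionStore store;
    private final Consumer<String> sink; // stand-in for an HDFS append

    HistoryArchiver(TransitionStore store, Consumer<String> sink) {
        this.store = store;
        this.sink = sink;
    }

    /**
     * One pass of the mover thread: append each entry to the sink, then
     * delete it from the store. Deleting only after the append means a
     * failed append leaves the entry in place for the next pass.
     */
    int drainOnce() {
        int moved = 0;
        for (Map.Entry<Long, String> e : store.snapshot()) {
            sink.accept(e.getKey() + "\t" + e.getValue());
            store.delete(e.getKey());
            moved++;
        }
        return moved;
    }
}
```

A Master-side thread would call drainOnce() periodically. The store then
holds only the entries logged since the last pass, which is what keeps its
size bounded, while a tool or web UI would read both the store and the
archive to show a region's full history.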


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
