[ 
https://issues.apache.org/jira/browse/HBASE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101659#comment-13101659
 ] 

Ming Ma commented on HBASE-4354:
--------------------------------

Thanks, Todd. Yes, an interface is a good way to abstract the various implementations.

I was about to open a separate JIRA, "dynamic metrics logging", for a more 
general structured data logging infrastructure, something useful for collecting 
HBase/MapReduce/HDFS dynamic metrics that aren't predefined and could change 
over time. It seems like "region transaction history" could be an application of 
that system.

> track region history
> --------------------
>
>                 Key: HBASE-4354
>                 URL: https://issues.apache.org/jira/browse/HBASE-4354
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, metrics, regionserver
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>
> For debugging and analysis purposes, it would be useful to understand a region's 
> lifecycle: how it is created (from which parent region, for example), how it 
> is split, assigned, etc. Some of this information is in the logs, the HBase .META. 
> table, ZooKeeper, and metrics. Certain history data is lost; for example, the 
> states are removed from ZooKeeper's /hbase/unassigned once the region is 
> assigned; also, the .META. table has a max version of 10 and thus only tracks the last 
> 10 RS assignments of a given region. It would be nice to put this history in a central 
> place. It can show:
> 1. How applications use HBase. For example, an application might create a large number of 
> regions in a short period of time and drop the table later.
> 2. How HBase internally manages regions, such as how regions are split, 
> assigned, taken offline, etc.
> Things to track:
> 1. How the region is created, including the parent region in the case of a split.
> 2. The region transition process, such as region state changes and region server 
> changes.
> One idea is to put such transition history data in ZooKeeper. One issue is that it 
> could blow up ZooKeeper memory if we have a large number of regions and the 
> cluster runs for a long time. I would like to get your feedback on different 
> approaches to address the issue. One assumption is that region assignment doesn't 
> happen with high frequency, and thus the overhead introduced won't have much 
> impact on system performance.
> Approach 1:
> ZooKeeper knows the history of how /hbase/unassigned is modified; if we can 
> get ZooKeeper's logs (BookKeeper?) somehow, we know the history of region 
> transitions.
> Approach 2:
> 1.    HBase logs extra region transition data to ZooKeeper. It could be one 
> ZooKeeper node per transaction.
> 2.    Have a separate thread on the Master move data from ZooKeeper and 
> append it to HDFS. That will keep the ZooKeeper size in check.
> 3.    Have some tool or web UI show the history of a given region by 
> looking at ZooKeeper and HDFS.
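
The three steps of Approach 2 could be sketched roughly as below. This is only an illustration under stated assumptions, not HBase code: a plain dict stands in for the ZooKeeper nodes, a local file stands in for the HDFS append log, and every name here (RegionTransition, HistoryStore, the /hbase/region-history path) is hypothetical.

```python
import json
import os
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional


@dataclass
class RegionTransition:
    """One region transition event (hypothetical record layout)."""
    region: str
    state: str                     # e.g. "OFFLINE", "OPENING", "OPENED"
    server: Optional[str]          # region server involved, if any
    timestamp_ms: int
    parent: Optional[str] = None   # parent region when created by a split


class HistoryStore:
    """In-memory stand-in for ZooKeeper plus a file stand-in for HDFS."""

    def __init__(self, archive_path: str) -> None:
        self.znodes: Dict[str, str] = {}   # path -> JSON payload
        self.archive_path = archive_path
        self._seq = 0

    def log_transition(self, event: RegionTransition) -> str:
        # Step 1: one node per logged transition, named by a sequence number.
        path = f"/hbase/region-history/{self._seq:010d}"
        self._seq += 1
        self.znodes[path] = json.dumps(asdict(event))
        return path

    def archive(self) -> int:
        # Step 2: append pending nodes to the archive in log order, then
        # delete them so the in-memory (ZooKeeper) footprint stays bounded.
        paths = sorted(self.znodes)
        with open(self.archive_path, "a", encoding="utf-8") as out:
            for path in paths:
                out.write(self.znodes[path] + "\n")
        for path in paths:
            del self.znodes[path]
        return len(paths)

    def history(self, region: str) -> List[RegionTransition]:
        # Step 3: merge archived events with not-yet-archived ones.
        payloads: List[str] = []
        if os.path.exists(self.archive_path):
            with open(self.archive_path, encoding="utf-8") as f:
                payloads.extend(line for line in f if line.strip())
        payloads.extend(self.znodes[p] for p in sorted(self.znodes))
        events = [RegionTransition(**json.loads(p)) for p in payloads]
        return [e for e in events if e.region == region]
```

In a real deployment the archive step would run on a background thread on the Master, and a crash between the append and the delete would need handling (for example by deduplicating on the sequence number); the sketch ignores both.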

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
