[ 
https://issues.apache.org/jira/browse/HBASE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595132#comment-13595132
 ] 

Jesse Yates commented on HBASE-7958:
------------------------------------

So it looks like there is a desire for a pretty large range of possible 
statistics. I'd rather we not get bogged down in which specific statistics we 
want, but push more towards a design discussion around enabling people to 
capture these statistics. We know we want them; the question is how :)

Once we have the mechanisms in place to read/write a stats table for an 
individual stat, we can much more easily expand that support to stats at 
different tie-in places. The 'at compaction time histogram' seemed like an easy 
enough starting place for _one type of stat_, but that should not necessarily 
limit the possible stats that can be collected; it's an immediate use-case for 
a general statistics table.
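
As a rough illustration of the 'at compaction time histogram' idea (this is 
not code from the attached patch, and it is written in Python rather than 
against any HBase API), the sketch below builds an equal-depth histogram from 
keys that arrive in sorted order, as they would during a compaction. The 
function name key_histogram and the bucket_size parameter are hypothetical.

```python
from typing import Iterable, List, Tuple


def key_histogram(sorted_keys: Iterable[bytes],
                  bucket_size: int) -> List[Tuple[bytes, int]]:
    """Return (first_key_in_bucket, key_count) pairs.

    Hypothetical helper: assumes keys arrive already sorted, which is
    how a compaction would see them. Each bucket holds at most
    bucket_size keys; only the last bucket may hold fewer.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")

    buckets: List[Tuple[bytes, int]] = []
    start = None  # first key of the bucket currently being filled
    count = 0     # number of keys seen in the current bucket

    for key in sorted_keys:
        if count == 0:
            start = key
        count += 1
        if count == bucket_size:
            buckets.append((start, count))
            count = 0

    # Flush a trailing, partially filled bucket.
    if count:
        buckets.append((start, count))
    return buckets
```

The single pass over the keys and the constant per-bucket state are what make 
this kind of stat cheap to gather while the data is already being rewritten.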

Stepping back, it seems to me that we can have a basic set of statistics that 
you can enable for a table at creation time (or even turn on later). We 
then also need a mechanism to let people add their own statistics easily 
(thinking a CP hook here). From there, we just need to have a mechanism to 
make it easy to access each statistic.
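
To make the three pieces concrete (built-in stats, a hook for custom stats, 
and easy read access), here is a minimal sketch in Python. It is only an 
illustration under assumed names: StatRegistry and InMemoryStatsTable are 
hypothetical, the dictionary stands in for a real stats table, and nothing 
here reflects the StatisticsTable class in the patch or the coprocessor API.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A statistic is any function from the keys seen to a value to store.
StatFn = Callable[[List[bytes]], object]
# Row identity: (table, region, column family, stat name).
StatKey = Tuple[str, str, str, str]


class InMemoryStatsTable:
    """Hypothetical stand-in for a stats table, backed by a dict."""

    def __init__(self) -> None:
        self._rows: Dict[StatKey, object] = {}

    def write(self, table: str, region: str, family: str,
              stat: str, value: object) -> None:
        self._rows[(table, region, family, stat)] = value

    def read(self, table: str, region: str, family: str,
             stat: str) -> Optional[object]:
        # Returns None when the stat has not been collected.
        return self._rows.get((table, region, family, stat))


class StatRegistry:
    """Hypothetical hook point: built-in and custom stats register here."""

    def __init__(self) -> None:
        self._stats: Dict[str, StatFn] = {}

    def register(self, name: str, fn: StatFn) -> None:
        self._stats[name] = fn

    def collect(self, stats_table: InMemoryStatsTable, table: str,
                region: str, family: str, keys: Iterable[bytes]) -> None:
        # Materialize once so every registered stat sees the same keys.
        seen = list(keys)
        for name, fn in self._stats.items():
            stats_table.write(table, region, family, name, fn(seen))
```

A custom statistic is then just another register() call, and a reader only 
needs the table, region, family, and stat name to look a value up.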

I don't think any of the above proposals really changes my proposed 
outline-patch, besides making it easier to hook in custom stat 
implementations, adding a clean dynamic loading mechanism (from the various 
//TODOs for CP hooks), and adding a little more utility to the StatisticsTable 
class to make it easy to read a stat.

Sound reasonable?
                
> Statistics per-column family per-region
> ---------------------------------------
>
>                 Key: HBASE-7958
>                 URL: https://issues.apache.org/jira/browse/HBASE-7958
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.96.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: hbase-7958_rough-cut-v0.patch
>
>
> Originating from this discussion on the dev list: 
> http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain
> Essentially, we should have built-in statistics gathering for HBase tables. 
> This allows clients to have a better understanding of the distribution of 
> keys within a table and a given region. We could also surface this 
> information via the UI.
> There are a couple of different proposals from the email; the overview is this:
> We add in something on compactions that gathers stats about the keys that are 
> written and then we surface them to a table.
> The possible proposals include:
> *How to implement it?*
> # Coprocessors - 
> ** advantage - it easily plugs in and people could pretty easily add their 
> own statistics. 
> ** disadvantage - UI elements would also require this, we get into dependent 
> loading, which leads down the OSGi path. Also, these CPs need to be installed 
> _after_ all the other CPs on compaction to ensure they see exactly what gets 
> written (doable, but a pain)
> # Built into HBase as a custom scanner
> ** advantage - always goes in the right place and no need to muck about with 
> loading CPs etc.
> ** disadvantage - less pluggable, at least for the initial cut
> *Where do we store data?*
> # .META.
> ** advantage - it's an existing table, so we can jam it into another CF there
> ** disadvantage - this would make META much larger, possibly leading to 
> splits AND will make it much harder for other processes to read the info
> # A new stats table
> ** advantage - cleanly separates out the information from META
> ** disadvantage - should use a 'system table' idea to prevent accidental 
> deletion, manipulation by arbitrary clients, but still allow clients to read 
> it.
> Once we have this framework, we can then move to an actual implementation of 
> various statistics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
