[jira] [Created] (HBASE-7958) Statistics per-column family per-region

Jesse Yates (JIRA) Wed, 27 Feb 2013 17:52:13 -0800

Jesse Yates created HBASE-7958:
----------------------------------

             Summary: Statistics per-column family per-region
                 Key: HBASE-7958
                 URL: https://issues.apache.org/jira/browse/HBASE-7958
             Project: HBase
          Issue Type: New Feature
    Affects Versions: 0.96.0
            Reporter: Jesse Yates
             Fix For: 0.96.0



Originating from this discussion on the dev list: 
http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain

Essentially, we should have built-in statistics gathering for HBase tables. 
This allows clients to have a better understanding of the distribution of keys 
within a table and a given region. We could also surface this information via 
the UI.

The email thread contains a couple of different proposals, but the overview is 
this: we add something to compactions that gathers stats about the keys that 
are written, and then we surface them in a table.
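
As a rough illustration of what "gathering stats about the keys on compaction" 
could look like, here is a minimal sketch of an equal-depth histogram 
collector. It is not an HBase API: the class name KeyDistributionCollector, its 
methods, and the use of String keys (real row keys are byte arrays) are all 
assumptions made for the example. It relies only on the fact that a compaction 
writes keys in sorted order.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not part of HBase: records every Nth key seen during
 * a compaction so the recorded keys split the data into equal-sized buckets.
 */
final class KeyDistributionCollector {
    private final long keysPerBucket;
    private final List<String> bucketBoundaries = new ArrayList<>();
    private long keysSeen = 0;

    KeyDistributionCollector(long keysPerBucket) {
        if (keysPerBucket <= 0) {
            throw new IllegalArgumentException("keysPerBucket must be positive");
        }
        this.keysPerBucket = keysPerBucket;
    }

    /** Call once per key, in the sorted order in which the compaction writes them. */
    void observe(String rowKey) {
        // The first key of each bucket is kept as that bucket's lower boundary.
        if (keysSeen % keysPerBucket == 0) {
            bucketBoundaries.add(rowKey);
        }
        keysSeen++;
    }

    long getKeysSeen() {
        return keysSeen;
    }

    /** Returns a copy of the boundaries, ready to be written to wherever stats are stored. */
    List<String> getBucketBoundaries() {
        return new ArrayList<>(bucketBoundaries);
    }
}
```

Either implementation route below (a coprocessor or a built-in scanner) would 
feed keys into something of this shape and then write the boundaries out once 
the compaction completes.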

The possible proposals include:

*How to implement it?*
# Coprocessors - 
** advantage - it easily plugs in and people could pretty easily add their own 
statistics. 
** disadvantage - UI elements would also require this, so we get into dependent 
loading, which leads down the OSGi path. Also, these CPs need to be installed 
_after_ all the other CPs on compaction to ensure they see exactly what gets 
written (doable, but a pain).
# Built into HBase as a custom scanner
** advantage - always goes in the right place and no need to muck about with 
loading CPs etc.
** disadvantage - less pluggable, at least for the initial cut


*Where do we store data?*
# .META.
** advantage - it's an existing table, so we can jam it into another CF there
** disadvantage - this would make META much larger, possibly leading to splits, 
and would make it much harder for other processes to read the info
# A new stats table
** advantage - cleanly separates out the information from META
** disadvantage - needs a 'system table' concept to prevent accidental deletion 
or manipulation by arbitrary clients while still allowing clients to read it.
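
For the separate stats table option, one possible row key layout (purely an 
assumption, nothing has been decided) is table, region, column family, so that 
all stats for a region sort together. The class StatsRowKey, its of method, and 
the NUL separator below are hypothetical names chosen for the sketch.

```java
/**
 * Hypothetical sketch, not part of HBase: builds a row key for a stats table
 * keyed by table, then region, then column family.
 */
final class StatsRowKey {
    // Assumed separator: a NUL character, chosen because it sorts before any
    // printable character. A real schema would operate on byte arrays.
    static final String SEPARATOR = "\u0000";

    private StatsRowKey() {
    }

    static String of(String table, String region, String family) {
        for (String part : new String[] {table, region, family}) {
            if (part.contains(SEPARATOR)) {
                throw new IllegalArgumentException("key part contains the separator");
            }
        }
        return String.join(SEPARATOR, table, region, family);
    }
}
```

With this layout, a client could read every column family's stats for a region 
with a single prefix scan over the table and region parts of the key.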

Once we have this framework, we can then move to an actual implementation of 
various statistics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
