[ https://issues.apache.org/jira/browse/HBASE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594307#comment-13594307 ]
Andrew Purtell edited comment on HBASE-7958 at 3/6/13 3:40 AM:
---------------------------------------------------------------

I'd like to see a histogram of operations taken on the CF, for subsequent autotuning for read-mostly, mixed, or write-mostly workloads.

was (Author: apurtell):
I'd like to see a histogram of operations taken on the region, for subsequent autotuning for read-mostly, mixed, or write-mostly workloads.

> Statistics per-column family per-region
> ---------------------------------------
>
>                 Key: HBASE-7958
>                 URL: https://issues.apache.org/jira/browse/HBASE-7958
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.96.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: hbase-7958_rough-cut-v0.patch
>
>
> Originating from this discussion on the dev list:
> http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain
> Essentially, we should have built-in statistics gathering for HBase tables.
> This allows clients to have a better understanding of the distribution of
> keys within a table and a given region. We could also surface this
> information via the UI.
> There are a couple of different proposals from the email. The overview is this:
> we add something on compactions that gathers stats about the keys that are
> written, and then we surface them to a table.
> The possible proposals include:
> *How to implement it?*
> # Coprocessors
> ** advantage - it plugs in easily, and people could pretty easily add their
> own statistics.
> ** disadvantage - UI elements would also require this, so we get into dependent
> loading, which leads down the OSGi path. Also, these CPs need to be installed
> _after_ all the other CPs on compaction to ensure they see exactly what gets
> written (doable, but a pain).
> # Built into HBase as a custom scanner
> ** advantage - always goes in the right place, and no need to muck about with
> loading CPs etc.
> ** disadvantage - less pluggable, at least for the initial cut.
> *Where do we store data?*
> # .META.
> ** advantage - it's an existing table, so we can jam it into another CF there.
> ** disadvantage - this would make META much larger, possibly leading to
> splits, AND will make it much harder for other processes to read the info.
> # A new stats table
> ** advantage - cleanly separates out the information from META.
> ** disadvantage - should use a 'system table' idea to prevent accidental
> deletion or manipulation by arbitrary clients, but still allow clients to read
> it.
> Once we have this framework, we can then move to an actual implementation of
> various statistics.
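
For illustration only, the per-CF operation histogram suggested in the comment could be sketched roughly as below. Nothing here comes from the attached patch or from the HBase API: the class name CfOpHistogram, the four operation buckets, and the 80% cutoff for "mostly" are all assumptions chosen to make the idea concrete, and the sketch says nothing about where the counts would be hooked in or stored.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch; not part of HBase or of the attached patch. */
final class CfOpHistogram {
    /** Assumed operation buckets; a real implementation might track more. */
    enum Op { GET, SCAN, PUT, DELETE }

    enum Workload { READ_MOSTLY, MIXED, WRITE_MOSTLY, UNKNOWN }

    // Column family name -> one counter per operation type.
    private final Map<String, Map<Op, LongAdder>> counts = new ConcurrentHashMap<>();

    /** Records one operation against the given column family. */
    void record(String family, Op op) {
        counts.computeIfAbsent(family, f -> newBuckets()).get(op).increment();
    }

    /** Returns how many operations of this type were recorded for the family. */
    long count(String family, Op op) {
        Map<Op, LongAdder> buckets = counts.get(family);
        return buckets == null ? 0L : buckets.get(op).sum();
    }

    /**
     * Classifies the family's workload. Assumption: "mostly" means at least
     * 80% of recorded operations; integer math avoids rounding at the cutoff.
     */
    Workload classify(String family) {
        long reads = count(family, Op.GET) + count(family, Op.SCAN);
        long writes = count(family, Op.PUT) + count(family, Op.DELETE);
        long total = reads + writes;
        if (total == 0) {
            return Workload.UNKNOWN;
        }
        if (reads * 5 >= total * 4) {
            return Workload.READ_MOSTLY;
        }
        if (writes * 5 >= total * 4) {
            return Workload.WRITE_MOSTLY;
        }
        return Workload.MIXED;
    }

    // Every bucket is created up front so record() never mutates the EnumMap.
    private static Map<Op, LongAdder> newBuckets() {
        Map<Op, LongAdder> buckets = new EnumMap<>(Op.class);
        for (Op op : Op.values()) {
            buckets.put(op, new LongAdder());
        }
        return buckets;
    }
}
```

A region would hold one such object, call record() on each operation, and an autotuner could poll classify() per CF. Keeping the counts per family rather than per region is what lets two CFs in the same region be tuned differently.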