[jira] [Commented] (HBASE-7958) Statistics per-column family per-region

Jesse Yates (JIRA) Fri, 15 Mar 2013 15:32:16 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603933#comment-13603933
 ]


Jesse Yates commented on HBASE-7958:
------------------------------------

Here's the beauty of the recently posted approach - you could write your own 
stats, tag each table with them, and you're off to the races (rather than 
relying on things like JMX and tcollectors, in the case of OTSDB).

{quote}
For example the region balancer could find the hottest regions (ones with the 
more requests per second) and automatically balance them across different 
region servers. A region could be split because it is too hot to reduce the 
number of requests rather than only splitting on size.
{quote}

That would be extremely cool (only a little pun intended).
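
To make the quoted idea a bit more concrete, here is a rough sketch in plain 
Java. Everything in it is an assumption for illustration: RegionLoad, 
splitCandidates and assign are hypothetical names, not HBase classes or the 
balancer's actual logic. It flags regions above a request-rate threshold as 
split candidates and greedily places the hottest regions on the least-loaded 
servers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are HBase classes. */
final class HotRegionSketch {

    /** Request rate observed for one region (illustrative only). */
    static final class RegionLoad {
        final String regionName;
        final double requestsPerSecond;

        RegionLoad(String regionName, double requestsPerSecond) {
            this.regionName = regionName;
            this.requestsPerSecond = requestsPerSecond;
        }
    }

    /** Regions above the threshold: candidates for a load-based split. */
    static List<String> splitCandidates(List<RegionLoad> loads, double maxRequestsPerSecond) {
        List<String> hot = new ArrayList<>();
        for (RegionLoad load : loads) {
            if (load.requestsPerSecond > maxRequestsPerSecond) {
                hot.add(load.regionName);
            }
        }
        return hot;
    }

    /** Greedy plan: hottest region first, each onto the server with the lowest total rate so far. */
    static Map<String, List<String>> assign(List<RegionLoad> loads, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        Map<String, Double> serverRate = new HashMap<>();
        for (String server : servers) {
            plan.put(server, new ArrayList<>());
            serverRate.put(server, 0.0);
        }
        List<RegionLoad> hottestFirst = new ArrayList<>(loads);
        hottestFirst.sort(
            Comparator.comparingDouble((RegionLoad l) -> l.requestsPerSecond).reversed());
        for (RegionLoad load : hottestFirst) {
            String coolest = servers.get(0);
            for (String server : servers) {
                if (serverRate.get(server) < serverRate.get(coolest)) {
                    coolest = server;
                }
            }
            plan.get(coolest).add(load.regionName);
            serverRate.put(coolest, serverRate.get(coolest) + load.requestsPerSecond);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<RegionLoad> loads = Arrays.asList(
            new RegionLoad("r1", 900), new RegionLoad("r2", 100),
            new RegionLoad("r3", 800), new RegionLoad("r4", 50));
        System.out.println("split candidates: " + splitCandidates(loads, 500));
        System.out.println("plan: " + assign(loads, Arrays.asList("rs1", "rs2")));
    }
}
```

A real balancer would also have to weigh region size, locality and the cost of 
moving a region, which this sketch ignores.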

I'd question how OTSDB performs at your scale though - it can collect a whole 
heck of a lot of stats, and since it stores them in a very cleanly distributed 
way in HBase, I would be surprised if it wasn't scaling.

My concern is making sure we don't fill up HDFS with logged stats that are 
2-3x the size of the actual data, which wouldn't be too far-fetched. We just 
need to be careful not to keep too much history.
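
One way to bound the history, sketched in plain Java under my own assumptions: 
Snapshot and retain are hypothetical names, and the age and byte limits are 
illustrative parameters rather than anything in the patch. The idea is to keep 
the newest stats snapshots until either an age cutoff or a size budget is hit. 
In practice a TTL on the stats column family might cover the age half of this 
without any custom code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a retention policy for stored statistics. */
final class StatsRetentionSketch {

    /** One stored batch of statistics (illustrative only). */
    static final class Snapshot {
        final long timestampMillis;
        final long sizeBytes;

        Snapshot(long timestampMillis, long sizeBytes) {
            this.timestampMillis = timestampMillis;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Keeps the newest snapshots, stopping at the first one that is older than
     * maxAgeMillis or that would push the total past maxTotalBytes.
     */
    static List<Snapshot> retain(List<Snapshot> snapshots, long nowMillis,
                                 long maxAgeMillis, long maxTotalBytes) {
        List<Snapshot> newestFirst = new ArrayList<>(snapshots);
        newestFirst.sort(
            Comparator.comparingLong((Snapshot s) -> s.timestampMillis).reversed());
        List<Snapshot> kept = new ArrayList<>();
        long usedBytes = 0;
        for (Snapshot s : newestFirst) {
            boolean tooOld = nowMillis - s.timestampMillis > maxAgeMillis;
            boolean overBudget = usedBytes + s.sizeBytes > maxTotalBytes;
            if (tooOld || overBudget) {
                break; // everything after this one is older still
            }
            kept.add(s);
            usedBytes += s.sizeBytes;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Snapshot> history = Arrays.asList(
            new Snapshot(100, 40), new Snapshot(1000, 40),
            new Snapshot(800, 40), new Snapshot(900, 40));
        // Budget of 100 bytes, maximum age of 500 ms, evaluated at t=1000.
        System.out.println("kept: " + retain(history, 1000, 500, 100).size());
    }
}
```

The byte budget could be set as a fraction of the table's data size, so the 
stats can never outgrow the data they describe.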

bq. Finally we could have some pretty graphs on the HMaster similar to Accumulo

Yeah, they are certainly pretty, but IMO pretty useless for anyone but the most 
novice user.
                
> Statistics per-column family per-region
> ---------------------------------------
>
>                 Key: HBASE-7958
>                 URL: https://issues.apache.org/jira/browse/HBASE-7958
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.96.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: hbase-7958_rough-cut-v0.patch, 
> hbase-7958-v0-parent.patch, hbase-7958-v0.patch
>
>
> Originating from this discussion on the dev list: 
> http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain
> Essentially, we should have built-in statistics gathering for HBase tables. 
> This allows clients to have a better understanding of the distribution of 
> keys within a table and a given region. We could also surface this 
> information via the UI.
> There are a couple of different proposals from the email; the overview is this:
> We add in something on compactions that gathers stats about the keys that are 
> written and then we surface them to a table.
> The possible proposals include:
> *How to implement it?*
> # Coprocessors - 
> ** advantage - it easily plugs in and people could pretty easily add their 
> own statistics. 
> ** disadvantage - UI elements would also require this, so we get into 
> dependent loading, which leads down the OSGi path. Also, these CPs need to be 
> installed _after_ all the other CPs on compaction to ensure they see exactly 
> what gets written (doable, but a pain)
> # Built into HBase as a custom scanner
> ** advantage - always goes in the right place and no need to muck about with 
> loading CPs etc.
> ** disadvantage - less pluggable, at least for the initial cut
> *Where do we store data?*
> # .META.
> ** advantage - it's an existing table, so we can jam it into another CF there
> ** disadvantage - this would make META much larger, possibly leading to 
> splits AND will make it much harder for other processes to read the info
> # A new stats table
> ** advantage - cleanly separates out the information from META
> ** disadvantage - should use a 'system table' idea to prevent accidental 
> deletion, manipulation by arbitrary clients, but still allow clients to read 
> it.
> Once we have this framework, we can then move to an actual implementation of 
> various statistics.
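
As a rough illustration of what a pluggable statistic could look like, here is 
a sketch in plain Java. KeyStatistic and MinMaxCount are hypothetical names 
and not part of the attached patches; the assumption is only that something on 
the compaction path hands each written row key to the statistic, which is true 
of both the coprocessor and the built-in scanner proposals. Keys are compared 
as unsigned bytes in lexicographic order, matching how HBase sorts row keys.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a statistic fed with the keys a compaction writes. */
final class CompactionStatsSketch {

    /** Pluggable statistic: sees every row key written, then reports a summary. */
    interface KeyStatistic {
        void observe(byte[] rowKey);
        String summary();
    }

    /** Tracks the smallest key, the largest key and the number of keys seen. */
    static final class MinMaxCount implements KeyStatistic {
        private byte[] min;
        private byte[] max;
        private long count;

        @Override
        public void observe(byte[] rowKey) {
            if (min == null || compare(rowKey, min) < 0) {
                min = rowKey.clone();
            }
            if (max == null || compare(rowKey, max) > 0) {
                max = rowKey.clone();
            }
            count++;
        }

        @Override
        public String summary() {
            if (count == 0) {
                return "count=0";
            }
            return "count=" + count
                + " min=" + new String(min, StandardCharsets.UTF_8)
                + " max=" + new String(max, StandardCharsets.UTF_8);
        }
    }

    /** Unsigned lexicographic byte comparison; a shorter prefix sorts first. */
    static int compare(byte[] a, byte[] b) {
        int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        KeyStatistic stat = new MinMaxCount();
        for (String key : new String[] {"b", "a", "c"}) {
            stat.observe(key.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(stat.summary());
    }
}
```

One instance per column family per region would give the granularity in the 
issue title, with each summary written to whichever store is chosen above.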

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
