[ 
https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082620#comment-13082620
 ] 

Gary Helmling commented on HBASE-4147:
--------------------------------------

I don't want to hijack this issue, but if we're talking about broader tracing 
and monitoring support, I think another inspiration worth looking at is Dapper:
http://research.google.com/pubs/pub36356.html

Some of the situations we're currently trying to help application teams with 
are things like:  call #1 took 10msec to process, call #2 took 300000msec to 
process... Why?  We don't have a whole lot at the moment to help in answering 
this.  Better load stats help see what's going on in the cluster from one 
direction.  But it still requires a lot of inferring to see how it ties 
together from the client end.

@Todd, I like the dtrace-like approach.  I think we could start simply with 
something like this and spread and evolve it as we go, possibly even growing 
it into distributed tracing.  That seems like a broader need, so maybe we 
should move that discussion into another issue.

In general, for additional stats that we export, I would like to gently 
encourage a hard look at whether or not there's a way to incorporate them into 
the metrics framework.  Not everything will fit with this -- it's particularly 
weak at dynamically named stats (like stats based on store filenames).  
But it does give us an existing framework for collecting, aggregating and 
reporting those stats, with a variety of tools that integrate nicely.  Just 
writing stats to log files requires a whole lot more work to actually make the 
output useful.  I acknowledge that sometimes it may be the best or only option, 
though (I'm currently patching up the RPC logging to make it a little more 
useful).  I really need to look at metricsv2 and see how much more flexibility 
it gives us.

@Doug, thanks for the STATSPACK link.  I'll read up on that as well.

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf, 
> hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come 
> by.
> What would be useful is to have a periodic StoreFile query report.  
> Specifically, this could run on a configured interval (e.g., every 30 
> seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with 
> the Path we would also know region, CF, and table), # of times the StoreFile 
> was accessed, the size of the StoreFile, and the total time (ms) spent 
> processing that StoreFile.
> Even this level of summary would be useful to detect which tables & CFs are 
> being accessed the most, and including the StoreFile would provide insight 
> into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  
> I'm assuming that users will slice and dice this data on their own so I think 
> we should skip any kind of admin view for now (i.e., new JSPs, new APIs to 
> expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes 
> it will.  However, flying a plane without any instrumentation isn't fun.  :-) 
>  
>  
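
For illustration only, the periodic report described above could be sketched 
roughly as follows.  This is a minimal sketch, not HBase code: the class 
StoreFileAccessReport, its methods record() and snapshotAndReset(), and the 
log line format are all hypothetical names made up for this example.  It 
assumes the read path can report the StoreFile path, file size, and elapsed 
time for each access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of HBase: accumulates per-StoreFile read
// stats and emits one report line per file for each reporting period.
final class StoreFileAccessReport {
    // Counters for a single StoreFile within the current period.
    private static final class Stats {
        long accessCount;
        long totalTimeMs;
        long sizeBytes;
    }

    // Keyed by StoreFile path; TreeMap keeps the report sorted by path.
    private Map<String, Stats> current = new TreeMap<>();

    // Record one access to the StoreFile at the given path.
    synchronized void record(String path, long sizeBytes, long elapsedMs) {
        Stats s = current.computeIfAbsent(path, p -> new Stats());
        s.accessCount++;
        s.totalTimeMs += elapsedMs;
        s.sizeBytes = sizeBytes; // keep the most recently observed size
    }

    // Return one line per StoreFile accessed in this period, then start
    // a fresh period with empty counters.
    synchronized List<String> snapshotAndReset() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Stats> e : current.entrySet()) {
            Stats s = e.getValue();
            lines.add(String.format(
                "storefile=%s accesses=%d sizeBytes=%d totalTimeMs=%d",
                e.getKey(), s.accessCount, s.sizeBytes, s.totalTimeMs));
        }
        current = new TreeMap<>();
        return lines;
    }
}
```

A scheduled task (for example, one run by a ScheduledExecutorService at the 
configured 30 or 60 second interval) could call snapshotAndReset() and write 
each returned line to the log.  Since the path is the key, region, CF, and 
table can be derived from it when slicing the output.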

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
