[
https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082620#comment-13082620
]
Gary Helmling commented on HBASE-4147:
--------------------------------------
I don't want to hijack this issue, but if we're talking about broader tracing
and monitoring support, I think another inspiration worth looking at is Dapper:
http://research.google.com/pubs/pub36356.html
Some of the situations we're currently trying to help application teams with
are things like: call #1 took 10 ms to process, call #2 took 300,000 ms to
process... Why? We don't have much at the moment to help answer this. Better
load stats help us see what's going on in the cluster from one direction, but
it still takes a lot of inference to see how it all ties together from the
client end.
@Todd, I like the dtrace-like approach. I think we could start simply with
something like this, then spread and evolve it as we go, possibly even growing
it into distributed tracing. It seems like that's a broader need so maybe we
should move that discussion into another issue.
In general, for additional stats that we export, I would like to gently
encourage a hard look at whether or not there's a way to incorporate them into
the metrics framework. Not everything will fit with this -- it's particularly
not so great at dynamically named stats (like stats based on store filenames).
But it does give us an existing framework for collecting, aggregating and
reporting those stats, with a variety of tools that integrate nicely. Just
writing stats to log files requires a whole lot more work to actually make the
output useful. Acknowledging that sometimes it may be the best/only option, though
(I'm currently patching up the RPC logging to make it a little more useful). I
really need to look at metricsv2 and see how much more flexibility it gives us.
@Doug, thanks for the STATSPACK link. I'll read up on that as well.
> StoreFile query usage report
> ----------------------------
>
> Key: HBASE-4147
> URL: https://issues.apache.org/jira/browse/HBASE-4147
> Project: HBase
> Issue Type: Improvement
> Reporter: Doug Meil
> Priority: Minor
> Attachments: hbase_4147_storefilereport.pdf,
> hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come
> by.
> What would be useful is to have a periodic StoreFile query report.
> Specifically, this could run on a configured interval (e.g., every 30
> seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with
> the Path we would also know region, CF, and table), # of times the StoreFile
> was accessed, the size of the StoreFile, and the total time (ms) spent
> processing that StoreFile.
> Even this level of summary would be useful to detect which tables & CFs are
> being accessed the most, and including the StoreFile would provide insight
> into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.
> I'm assuming that users will slice and dice this data on their own so I think
> we should skip any kind of admin view for now (i.e., new JSPs, new APIs to
> expose this data). Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact? Yes. Hopefully small, but yes
> it will. However, flying a plane without any instrumentation isn't fun. :-)
>
>
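For illustration only, the periodic report described in the quoted issue could
be sketched roughly as below. This is not HBase code: the class name
StoreFileUsageReport, its methods record/drain/start, the log line format, and
the use of System.out in place of a real logger are all assumptions made for
the sketch. It only shows the idea of counting accesses and elapsed time per
StoreFile path and dumping the totals on a configured interval.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of a periodic StoreFile usage report; not part of HBase. */
class StoreFileUsageReport {

    /** Counters for one StoreFile during the current reporting period. */
    static final class Usage {
        final long sizeBytes;
        final LongAdder accesses = new LongAdder();
        final LongAdder totalMillis = new LongAdder();

        Usage(long sizeBytes) {
            this.sizeBytes = sizeBytes;
        }
    }

    private final ConcurrentHashMap<String, Usage> usageByPath = new ConcurrentHashMap<>();

    /** Assumed hook: called by the read path each time a StoreFile is consulted. */
    void record(String storeFilePath, long sizeBytes, long elapsedMillis) {
        Usage usage = usageByPath.computeIfAbsent(storeFilePath, p -> new Usage(sizeBytes));
        usage.accesses.increment();
        usage.totalMillis.add(elapsedMillis);
    }

    /**
     * Removes the counters collected so far and returns one line per StoreFile,
     * sorted by path. A record() racing with drain() may be lost; this sketch
     * accepts that because the report is only a summary.
     */
    List<String> drain() {
        List<String> lines = new ArrayList<>();
        for (String path : new ArrayList<>(usageByPath.keySet())) {
            Usage usage = usageByPath.remove(path);
            if (usage == null) {
                continue;
            }
            lines.add(String.format("storefile=%s accesses=%d sizeBytes=%d totalMs=%d",
                    path, usage.accesses.sum(), usage.sizeBytes, usage.totalMillis.sum()));
        }
        Collections.sort(lines);
        return lines;
    }

    /** Dumps the report every intervalSeconds; the caller shuts the executor down. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> drain().forEach(System.out::println), // stand-in for a real logger
                intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

A caller would invoke record(path, size, elapsedMs) wherever a StoreFile read
is timed, and start(30) or start(60) to get the interval-based dump. Keying on
the full path keeps the region, CF, and table recoverable from each line, and
the number of distinct lines per CF gives the rough "uncompaction" signal the
issue mentions.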