[ 
https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073606#comment-13073606
 ] 

Todd Lipcon commented on HBASE-4147:
------------------------------------

Maybe I'm wandering out of scope, but I've been thinking a bit about statistics 
and monitoring as well recently, and I think it might integrate well with this 
work.

The idea is to define various probe points (to use the dtrace terminology) 
throughout the code. Each probe point would have a name and some predefined set 
of arguments. For example, in the HFile code you might have:
{code}
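HFile() {
  // Look up the probe point once, at construction time.
  this.readTrace = Tracer.get("hfile.read.complete");
}

read() {
  ...
  // Only emit the trace event when the probe is enabled.
  if (readTrace != null && readTrace.isEnabled()) {
    readTrace.trace(millisSpent, thisHFilePath, blockIdx, ...);
  }
}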
{code}

Different consumers interested in this tracing data can then subscribe to the 
trace point. In this case a subscriber would collect aggregate statistics for 
each 30 second period, though other applications would be useful as well (e.g., 
dynamically attaching a listener to sample some percentage of requests).
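
To make the subscription idea a bit more concrete, here is a minimal sketch of 
what the probe registry and one aggregating subscriber might look like. None of 
this exists in HBase: Tracer, TraceListener, and ReadStats are hypothetical 
names, and the sketch assumes the first two probe arguments are the elapsed 
milliseconds and the file path. It also assumes a probe counts as enabled only 
while at least one listener is attached.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical callback interface; receives the arguments of one probe firing. */
interface TraceListener {
  void onTrace(String probeName, Object... args);
}

/** Hypothetical named probe point; not an existing HBase class. */
final class Tracer {
  private static final Map<String, Tracer> REGISTRY = new ConcurrentHashMap<>();

  private final String name;
  private final List<TraceListener> listeners = new CopyOnWriteArrayList<>();

  private Tracer(String name) {
    this.name = name;
  }

  /** Returns the shared Tracer for a probe name, creating it on first use. */
  static Tracer get(String name) {
    return REGISTRY.computeIfAbsent(name, Tracer::new);
  }

  /** Assumption: a probe is enabled only while a listener is attached. */
  boolean isEnabled() {
    return !listeners.isEmpty();
  }

  void subscribe(TraceListener listener) {
    listeners.add(listener);
  }

  void unsubscribe(TraceListener listener) {
    listeners.remove(listener);
  }

  /** Delivers one event to every attached listener. */
  void trace(Object... args) {
    for (TraceListener listener : listeners) {
      listener.onTrace(name, args);
    }
  }
}

/** Aggregates access count and total time per path; assumes args = (millis, path, ...). */
final class ReadStats implements TraceListener {
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> millis = new ConcurrentHashMap<>();

  @Override
  public void onTrace(String probeName, Object... args) {
    long spent = ((Number) args[0]).longValue();
    String path = String.valueOf(args[1]);
    counts.computeIfAbsent(path, p -> new LongAdder()).increment();
    millis.computeIfAbsent(path, p -> new LongAdder()).add(spent);
  }

  long count(String path) {
    LongAdder adder = counts.get(path);
    return adder == null ? 0L : adder.sum();
  }

  long totalMillis(String path) {
    LongAdder adder = millis.get(path);
    return adder == null ? 0L : adder.sum();
  }
}
```

A periodic reporter could then call Tracer.get("hfile.read.complete").subscribe(stats) 
once and log and reset the ReadStats maps on each reporting interval. A sampling 
listener would be another TraceListener that forwards only some fraction of 
events.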

The advantage of the above design is that it's flexible, and if it's off by 
default it should have no performance impact, since the disabled check will 
basically be JITted away.

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come 
> by.
> What would be useful is to have a periodic StoreFile query report.  
> Specifically, this could run on a configured interval (e.g., every 30 
> seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with 
> the Path we would also know region, CF, and table), # of times the StoreFile 
> was accessed, the size of the StoreFile, and the total time (ms) spent 
> processing that StoreFile.
> Even this level of summary would be useful to detect which tables & CFs are 
> being accessed the most, and including the StoreFile would provide insight 
> into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  
> I'm assuming that users will slice and dice this data on their own so I think 
> we should skip any kind of admin view for now (i.e., new JSPs, new APIs to 
> expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes 
> it will.  However, flying a plane without any instrumentation isn't fun.  :-) 
>  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
