[
https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081793#comment-13081793
]
Nicolas Spiegelberg commented on HBASE-4147:
--------------------------------------------
@Doug: what is your goal for this JIRA? Collecting stats on StoreFile usage is
really good from a core developer perspective, but it sounds like you want
better DBA tools. For example:
1) Get sampling. You just want a way to log one in every 1k database commands
and have some collector that displays high-level information on get vs. put
rate, with basic filtering capabilities.
2) Note that we're developing a version of "show processlist" for HBase that
might also provide the visibility you want (HBASE-4057).
3) Another option is exporting per-CF metrics in addition to our existing
per-server metrics. We have this sorta hacked up for 89 and could give you the
diffs if you want to finish it off for 92.
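
To make option 1 concrete, here is a minimal sketch of 1-in-N command sampling. It is an illustration only: OpSampler, Op, record() and summary() are hypothetical names and not part of HBase, and it assumes that counting every Nth command is an acceptable form of sampling.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sampler for illustration; this is not an HBase class. */
final class OpSampler {
    enum Op { GET, PUT }

    private final int sampleEvery;  // e.g. 1000 to sample one in every 1k commands
    private final AtomicLong seen = new AtomicLong();
    private final AtomicLong sampledGets = new AtomicLong();
    private final AtomicLong sampledPuts = new AtomicLong();

    OpSampler(int sampleEvery) {
        if (sampleEvery <= 0) {
            throw new IllegalArgumentException("sampleEvery must be positive");
        }
        this.sampleEvery = sampleEvery;
    }

    /** Counts the command only if it is the Nth one seen; returns true when sampled. */
    boolean record(Op op) {
        if (seen.incrementAndGet() % sampleEvery != 0) {
            return false;
        }
        (op == Op.GET ? sampledGets : sampledPuts).incrementAndGet();
        return true;
    }

    /** One summary line a collector could log: sampled get vs. put counts. */
    String summary() {
        return "sampledGets=" + sampledGets.get() + " sampledPuts=" + sampledPuts.get();
    }
}
```

Taking every Nth command keeps the hot path to a single atomic increment, but it can be biased by periodic workloads; random sampling would be an alternative.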
> StoreFile query usage report
> ----------------------------
>
> Key: HBASE-4147
> URL: https://issues.apache.org/jira/browse/HBASE-4147
> Project: HBase
> Issue Type: Improvement
> Reporter: Doug Meil
> Priority: Minor
> Attachments: hbase_4147_storefilereport.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come
> by.
> What would be useful is to have a periodic StoreFile query report.
> Specifically, this could run on a configured interval (e.g., every 30
> seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with
> the Path we would also know region, CF, and table), # of times the StoreFile
> was accessed, the size of the StoreFile, and the total time (ms) spent
> processing that StoreFile.
> Even this level of summary would be useful to detect which tables & CFs are
> being accessed the most, and including the StoreFile would provide insight
> into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.
> I'm assuming that users will slice and dice this data on their own so I think
> we should skip any kind of admin view for now (i.e., new JSPs, new APIs to
> expose this data). Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact? Yes. Hopefully small, but yes
> it will. However, flying a plane without any instrumentation isn't fun. :-)
>
>
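
The report described above could be sketched roughly as follows. This is not the proposed patch: StoreFileUsageReport, recordAccess() and drainReport() are hypothetical names, and the sketch assumes the read path can supply the StoreFile path, its size, and the elapsed time per access.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-StoreFile read accounting; none of these names exist in HBase. */
final class StoreFileUsageReport {
    private static final class Stats {
        final LongAdder accesses = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
        volatile long sizeBytes;
    }

    private volatile ConcurrentHashMap<String, Stats> current = new ConcurrentHashMap<>();

    /** Assumed to be called on the read path after a StoreFile has been consulted. */
    void recordAccess(String storeFilePath, long sizeBytes, long elapsedMillis) {
        Stats s = current.computeIfAbsent(storeFilePath, p -> new Stats());
        s.accesses.increment();
        s.totalMillis.add(elapsedMillis);
        s.sizeBytes = sizeBytes;
    }

    /**
     * Swaps in a fresh map and formats one line per StoreFile touched in the
     * period. Best effort: an access racing with the swap may be dropped.
     */
    String drainReport() {
        Map<String, Stats> period = current;
        current = new ConcurrentHashMap<>();
        Map<String, Stats> sorted = new TreeMap<>(period);  // stable, path-ordered output
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Stats> e : sorted.entrySet()) {
            Stats s = e.getValue();
            sb.append(e.getKey())
              .append(" accesses=").append(s.accesses.sum())
              .append(" sizeBytes=").append(s.sizeBytes)
              .append(" totalMs=").append(s.totalMillis.sum())
              .append('\n');
        }
        return sb.toString();
    }
}
```

A periodic task (for example one scheduled with ScheduledExecutorService.scheduleAtFixedRate at the configured interval) could call drainReport() and write the result to the log. Since the path identifies region, CF, and table, no extra fields are needed for slicing the output later.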