[ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082656#comment-13082656 ]
Doug Meil commented on HBASE-4147:
----------------------------------
User example #1 in my updated writeup came up on the dist-list recently. The
user still had major compactions scheduled daily, and his cluster "got really
slow" every day. If he had been able to summarize 5-minute slices that
differentiate system and user activity, he could have told which
table/region/CF/StoreFile was getting system (compaction) activity, etc.
I doubt this answers your particular case, but I think this is a fairly common
occurrence with users.
Another example is #3, the "degree of uncompaction" stats (e.g., how much time
is being spent reading each StoreFile). This kind of information would also
help with the "HBase as queue" issue that was on the dist-list recently. They
weren't doing major compactions and apparently had a lot of StoreFiles.
#2 is the "hybrid activity" use case, e.g., an MR job running on one table
while there are also random reads on another table. At least seeing the
activity during the reporting slice would let you know that something else is
happening on the cluster and what is getting accessed.
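A rough idea of how the 5-minute summary from example #1 could be produced from
the report output is sketched below. This is only an illustration: the report
does not exist yet, so the tab-separated record layout (timestamp in ms,
StoreFile path, a user/system flag, time spent in ms) and the function name
summarize_slices are assumptions, not an HBase format or API.

```python
from collections import defaultdict

SLICE_MS = 5 * 60 * 1000  # 5-minute reporting slices


def summarize_slices(lines, slice_ms=SLICE_MS):
    """Sum time spent per (slice start, source, StoreFile path).

    Each line is a HYPOTHETICAL tab-separated record:
    timestamp_ms, storefile_path, source ('user' or 'system'), time_ms
    """
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ts, path, source, time_ms = line.split("\t")
        # Round the timestamp down to the start of its slice.
        slice_start = int(ts) // slice_ms * slice_ms
        totals[(slice_start, source, path)] += int(time_ms)
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        "0\t/hbase/t1/r1/cf1/sf1\tuser\t40",
        "1000\t/hbase/t1/r1/cf1/sf1\tsystem\t900",
        "299999\t/hbase/t1/r1/cf1/sf1\tsystem\t100",
        "300000\t/hbase/t2/r9/cf1/sf7\tuser\t25",
    ]
    for key, ms in sorted(summarize_slices(sample).items()):
        print(key, ms)
```

With a summary like this, a slice dominated by "system" time on one StoreFile
would point at compaction activity rather than user reads.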
> StoreFile query usage report
> ----------------------------
>
> Key: HBASE-4147
> URL: https://issues.apache.org/jira/browse/HBASE-4147
> Project: HBase
> Issue Type: Improvement
> Reporter: Doug Meil
> Priority: Minor
> Attachments: hbase_4147_storefilereport.pdf,
> hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come
> by.
> What would be useful is to have a periodic StoreFile query report.
> Specifically, this could run on a configured interval (e.g., every 30
> seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with
> the Path we would also know region, CF, and table), # of times the StoreFile
> was accessed, the size of the StoreFile, and the total time (ms) spent
> processing that StoreFile.
> Even this level of summary would be useful to detect which tables & CFs are
> being accessed the most, and including the StoreFile would provide insight
> into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.
> I'm assuming that users will slice and dice this data on their own so I think
> we should skip any kind of admin view for now (i.e., new JSPs, new APIs to
> expose this data). Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact? Yes. Hopefully small, but yes
> it will. However, flying a plane without any instrumentation isn't fun. :-)
>
>
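As a rough illustration of the slicing and dicing described in the issue, the
sketch below breaks a StoreFile path into table, region, and CF and derives two
summaries: total accesses per table/CF, and the number of distinct StoreFiles
touched per region/CF (a crude "uncompaction" signal). It assumes the last four
path components are table, region, CF, and StoreFile name. The (path, access
count) records and both function names are hypothetical.

```python
from collections import defaultdict


def split_storefile_path(path):
    """Return (table, region, cf, storefile).

    ASSUMPTION: the last four path components are
    <table>/<region>/<cf>/<storefile>; adjust if your layout differs.
    """
    parts = path.rstrip("/").split("/")
    if len(parts) < 4 or not all(parts[-4:]):
        raise ValueError("unexpected StoreFile path: %r" % path)
    table, region, cf, storefile = parts[-4:]
    return table, region, cf, storefile


def summarize_storefiles(records):
    """records: iterable of HYPOTHETICAL (path, access_count) pairs."""
    accesses = defaultdict(int)  # (table, cf) -> total accesses
    files = defaultdict(set)     # (table, region, cf) -> StoreFiles seen
    for path, count in records:
        table, region, cf, storefile = split_storefile_path(path)
        accesses[(table, cf)] += count
        files[(table, region, cf)].add(storefile)
    # Many distinct StoreFiles under one region/CF suggests it needs compacting.
    file_counts = {key: len(names) for key, names in files.items()}
    return dict(accesses), file_counts
```

Sorting the second result by count would list the regions and CFs with the most
StoreFiles being read during the reporting period.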
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira