[
https://issues.apache.org/jira/browse/HBASE-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083671#comment-13083671
]
Ming Ma commented on HBASE-4089:
--------------------------------
Useful doc, Doug.
It seems that this issue, https://issues.apache.org/jira/browse/HBASE-4147, and
https://issues.apache.org/jira/browse/HBASE-4145 all need some common
infrastructure to log and analyze structured data.
1. The RS Web UI is useful, but it only provides the most recent value.
2. As you mentioned in the doc, we can create a static metric for each
combination of table and CF. That could produce a very large number of
metrics, which might not be ideal.
3. How we plan to analyze the data is an important factor for the design.
a. Is there a latency requirement? In a production system, it is better to
get these reports sooner rather than later.
b. Is it easy to query and analyze the data (e.g., aggregate, max)?
4. Some ideas along the lines of custom output:
a. Can the log data be asynchronously uploaded to a special table in HBase?
It might be a bit strange to upload data back to HBase. However, for
performance, we can partition the special table into regions so that each
region is colocated on the same RS where the log is generated, with no
automatic compaction, splitting, or load balancing on the table.
b. Upload the log to HDFS periodically and run MapReduce jobs to mine the
data with a customized InputFormat. This might be OK if there is no strong
latency requirement.
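
To make point 3b concrete, here is a rough sketch of the kind of aggregation
(count, sum, max, average per table/CF) the structured data would need to
support, whichever storage option is chosen. It is not taken from the attached
patch and uses no HBase APIs. The CachedBlock record shape, the Stats fields,
and the class and method names are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: aggregates hypothetical structured log records per table/CF.
class BlockCacheSummary {

    /** One cached block as it might appear in a structured log (assumed shape). */
    static final class CachedBlock {
        final String table;
        final String cf;
        final long sizeBytes;
        final long millisInCache;

        CachedBlock(String table, String cf, long sizeBytes, long millisInCache) {
            this.table = table;
            this.cf = cf;
            this.sizeBytes = sizeBytes;
            this.millisInCache = millisInCache;
        }
    }

    /** Running totals for one table/CF pair. */
    static final class Stats {
        long blocks;
        long totalBytes;
        long maxBytes;
        long totalMillis;

        void add(CachedBlock b) {
            blocks++;
            totalBytes += b.sizeBytes;
            maxBytes = Math.max(maxBytes, b.sizeBytes);
            totalMillis += b.millisInCache;
        }

        double averageHoursInCache() {
            return blocks == 0 ? 0.0 : (totalMillis / (double) blocks) / 3_600_000.0;
        }
    }

    /** Groups records by table, then by CF. TreeMap keeps the output order stable. */
    static Map<String, Map<String, Stats>> summarize(List<CachedBlock> records) {
        Map<String, Map<String, Stats>> byTable = new TreeMap<>();
        for (CachedBlock b : records) {
            byTable.computeIfAbsent(b.table, t -> new TreeMap<>())
                   .computeIfAbsent(b.cf, c -> new Stats())
                   .add(b);
        }
        return byTable;
    }

    /** Renders the summary in roughly the layout proposed in the issue description. */
    static String format(Map<String, Map<String, Stats>> summary) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<String, Stats>> t : summary.entrySet()) {
            sb.append(t.getKey()).append('\n');
            for (Map.Entry<String, Stats> cf : t.getValue().entrySet()) {
                Stats s = cf.getValue();
                sb.append(String.format(Locale.ROOT,
                        "  %s %d blocks, totalBytes=%d, maxBytes=%d, averageTimeInCache=%.2f hours\n",
                        cf.getKey(), s.blocks, s.totalBytes, s.maxBytes, s.averageHoursInCache()));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample records, not real cache contents.
        List<CachedBlock> records = Arrays.asList(
                new CachedBlock("table1", "cf1", 65536L, 7_200_000L),
                new CachedBlock("table1", "cf1", 32768L, 14_400_000L),
                new CachedBlock("table2", "cf1", 65536L, 3_600_000L));
        System.out.print(format(summarize(records)));
    }
}
```

With option 4b, the same grouping would be the reduce step of a MapReduce job
keyed on table/CF. With option 4a, it would be a scan over the special table.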
> blockCache contents report
> --------------------------
>
> Key: HBASE-4089
> URL: https://issues.apache.org/jira/browse/HBASE-4089
> Project: HBase
> Issue Type: New Feature
> Reporter: Doug Meil
> Assignee: Doug Meil
> Attachments: hbase_4089_blockcachereport.pdf,
> java_blockcache_checkpoint_2011_08_11.patch
>
>
> Summarized block-cache report for a RegionServer would be helpful. For
> example ...
> table1
> cf1 100 blocks, totalBytes=yyyyy, averageTimeInCache=XXXX hours
> cf2 200 blocks, totalBytes=zzzzz, averageTimeInCache=XXXX hours
> table2
> cf1 75 blocks, totalBytes=yyyyy, averageTimeInCache=XXXX hours
> cf2 150 blocks, totalBytes=zzzzz, averageTimeInCache=XXXX hours
> ... Etc.
> The current metrics list blockCacheSize and blockCacheFree, but there is no
> way to know what's in there. Any single block isn't really important, but
> the patterns of what CF/Table they came from, how big are they, and how long
> (on average) they've been in the cache, are important.
> No such interface exists in HRegionInterface. But I think it would be
> helpful from an operational perspective.
> Updated (7-29): Removing suggestion for UI. I would be happy just to get
> this report on a configured interval dumped to a log file.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira