[jira] [Commented] (HBASE-4089) blockCache contents report

Ming Ma (JIRA) Thu, 11 Aug 2011 14:46:54 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083671#comment-13083671
 ]


Ming Ma commented on HBASE-4089:
--------------------------------

Useful doc, Doug.

It seems like this issue, https://issues.apache.org/jira/browse/HBASE-4147, 
and https://issues.apache.org/jira/browse/HBASE-4145 all need some common 
infrastructure to log and analyze structured data.

1. The RS Web UI is useful, but it only shows the most recent value.

2. As you mentioned in the doc, we could create a static metric for each 
combination of table and CF. However, the number of metrics grows with every 
table and CF added, so we could end up with a very large number of them, which 
might not be ideal.

3. How we plan to analyze the data is an important factor in the design.
   a. Is there a latency requirement? In a production system, it is better to 
get these reports sooner rather than later.
   b. Is it easy to query and analyze the data, e.g., aggregate, max, etc.?

4. Some ideas along the lines of custom output:
   a. Can the log data be uploaded asynchronously to a special table in HBase? 
It might seem a bit strange to upload data back into HBase. However, for 
performance, we could partition the special table into regions so that each 
region is colocated on the same RS where its log is generated, with no 
automatic compaction, splitting, or load balancing on that table.
   b. Upload the logs to HDFS periodically and run MapReduce jobs with a 
customized InputFormat to mine the data. This might be OK if there is no strong 
latency requirement.
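
To make 3b and 4b a bit more concrete, here is a minimal sketch, assuming each report entry is logged as a single `key=value` line. The field names (`server`, `table`, `cf`, `blocks`, `totalBytes`) and the `BlockCacheLogAnalysis` class are hypothetical; they are not an existing HBase log format or API. `parse` does roughly what a custom InputFormat/RecordReader would do in a MapReduce job, and the two queries stand in for the map (group by key) and reduce (sum or max) steps, run locally with Java streams instead of Hadoop.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class BlockCacheLogAnalysis {

    // One parsed line of a hypothetical structured block-cache log.
    record CfStats(String server, String table, String cf, long blocks, long totalBytes) {}

    // Parses a line such as "server=rs1 table=t1 cf=cf1 blocks=100 totalBytes=6400".
    // In MapReduce this is roughly the job of a custom InputFormat/RecordReader.
    static CfStats parse(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed token: " + token);
            }
            fields.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return new CfStats(
                require(fields, "server"),
                require(fields, "table"),
                require(fields, "cf"),
                Long.parseLong(require(fields, "blocks")),
                Long.parseLong(require(fields, "totalBytes")));
    }

    // Rejects lines that are missing a field instead of producing partial records.
    private static String require(Map<String, String> fields, String key) {
        String value = fields.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing field: " + key);
        }
        return value;
    }

    // Aggregate: total cached bytes per table across all servers
    // (grouping by table is the "map" side, summing is the "reduce" side).
    static Map<String, Long> totalBytesPerTable(List<CfStats> records) {
        return records.stream().collect(Collectors.groupingBy(
                CfStats::table, TreeMap::new, Collectors.summingLong(CfStats::totalBytes)));
    }

    // Max: the single table/CF entry holding the most cached bytes on one server.
    static Optional<CfStats> largestEntry(List<CfStats> records) {
        return records.stream().max(Comparator.comparingLong(CfStats::totalBytes));
    }

    public static void main(String[] args) {
        // Illustrative input only; these values are not real measurements.
        List<String> lines = List.of(
                "server=rs1 table=table1 cf=cf1 blocks=100 totalBytes=6400",
                "server=rs1 table=table1 cf=cf2 blocks=200 totalBytes=12800",
                "server=rs2 table=table2 cf=cf1 blocks=75 totalBytes=4800");
        List<CfStats> records = lines.stream().map(BlockCacheLogAnalysis::parse).toList();
        System.out.println(totalBytesPerTable(records)); // prints {table1=19200, table2=4800}
        System.out.println(largestEntry(records).orElseThrow());
    }
}
```

Keeping one self-describing record per line means the same lines could be read by a MapReduce job over HDFS (4b) or loaded into a table (4a) without changing the logging side.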


> blockCache contents report
> --------------------------
>
>                 Key: HBASE-4089
>                 URL: https://issues.apache.org/jira/browse/HBASE-4089
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>         Attachments: hbase_4089_blockcachereport.pdf, 
> java_blockcache_checkpoint_2011_08_11.patch
>
>
> A summarized block-cache report for a RegionServer would be helpful.  For 
> example:
> table1
>   cf1   100 blocks, totalBytes=yyyyy, averageTimeInCache=XXXX hours
>   cf2   200 blocks, totalBytes=zzzzz, averageTimeInCache=XXXX hours
> table2
>   cf1  75 blocks, totalBytes=yyyyy, averageTimeInCache=XXXX hours
>   cf2 150 blocks, totalBytes=zzzzz, averageTimeInCache=XXXX hours
> ... Etc.
> The current metrics list blockCacheSize and blockCacheFree, but there is no 
> way to know what's in there.  Any single block isn't really important, but 
> the patterns of which CF/Table the blocks came from, how big they are, and how 
> long (on average) they've been in the cache are important.
> No such interface exists in HRegionInterface.  But I think it would be 
> helpful from an operational perspective.
> Updated (7-29):  Removing suggestion for UI.  I would be happy just to get 
> this report on a configured interval dumped to a log file.
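
A minimal sketch of the summarized report described in the issue, assuming the cache could hand out, for each cached block, its table, column family, size, and the time it was cached. `CachedBlockInfo` and the other names here are hypothetical stand-ins, not HBase's actual block cache API. The sketch groups blocks by table and then CF, computes the block count, total bytes, and average time in cache, and formats lines in the shape proposed above. A `ScheduledExecutorService` prints the report on a configured interval; a real RegionServer would write it to its log instead.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class BlockCacheReport {

    // Hypothetical per-block snapshot; not the shape of HBase's real cache entries.
    record CachedBlockInfo(String table, String cf, long sizeBytes, Instant cachedAt) {}

    // Running totals for one table/CF pair.
    static final class CfSummary {
        long blocks;
        long totalBytes;
        long totalAgeMillis;

        double averageHoursInCache() {
            return blocks == 0 ? 0.0 : totalAgeMillis / (double) blocks / 3_600_000.0;
        }
    }

    // Groups blocks by table, then by CF; TreeMaps keep the report sorted by name.
    static Map<String, Map<String, CfSummary>> summarize(List<CachedBlockInfo> blocks, Instant now) {
        Map<String, Map<String, CfSummary>> byTable = new TreeMap<>();
        for (CachedBlockInfo b : blocks) {
            CfSummary s = byTable
                    .computeIfAbsent(b.table(), t -> new TreeMap<>())
                    .computeIfAbsent(b.cf(), c -> new CfSummary());
            s.blocks++;
            s.totalBytes += b.sizeBytes();
            s.totalAgeMillis += Duration.between(b.cachedAt(), now).toMillis();
        }
        return byTable;
    }

    // Renders the summary in the layout sketched in the issue description.
    static String format(Map<String, Map<String, CfSummary>> byTable) {
        StringBuilder sb = new StringBuilder();
        byTable.forEach((table, cfs) -> {
            sb.append(table).append('\n');
            cfs.forEach((cf, s) -> sb.append(String.format(Locale.ROOT,
                    "  %s %d blocks, totalBytes=%d, averageTimeInCache=%.1f hours\n",
                    cf, s.blocks, s.totalBytes, s.averageHoursInCache())));
        });
        return sb.toString();
    }

    // Dumps the report every intervalSeconds; the caller shuts the executor down.
    static ScheduledExecutorService scheduleReport(
            Supplier<List<CachedBlockInfo>> snapshot, long intervalSeconds) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(
                () -> System.out.print(format(summarize(snapshot.get(), Instant.now()))),
                intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return exec;
    }

    public static void main(String[] args) {
        // Illustrative blocks only, not measured data.
        Instant now = Instant.now();
        List<CachedBlockInfo> blocks = List.of(
                new CachedBlockInfo("table1", "cf1", 65_536, now.minus(Duration.ofHours(2))),
                new CachedBlockInfo("table1", "cf1", 65_536, now.minus(Duration.ofHours(4))),
                new CachedBlockInfo("table2", "cf2", 32_768, now.minus(Duration.ofHours(1))));
        System.out.print(format(summarize(blocks, now)));
    }
}
```

One caveat with `scheduleAtFixedRate`: if a run throws, later runs are silently suppressed, so a production version would catch and log exceptions inside the task.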

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
