[jira] [Commented] (HBASE-15160) Put back HFile's HDFS op latency sampling code and add metrics for monitoring

Yu Li (JIRA) Thu, 21 Apr 2016 20:02:00 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253243#comment-15253243
 ]


Yu Li commented on HBASE-15160:
-------------------------------

Thanks for the efforts [~enis]! Some review comments below:

1. The latest patch adds keys for read/write count. Could you clarify why 
these are needed when the histogram already tracks the count as num_ops?
{noformat}
  "beans" : [ {
    "name" : "Hadoop:service=HBase,name=RegionServer,sub=IO",
    "modelerType" : "RegionServer,sub=IO",
    "tag.Context" : "regionserver",
    "tag.Hostname" : "hadoop0166.su18.tbsite.net",
    "FsWriteTime_num_ops" : 12049,
    "FsWriteTime_min" : 46284,
    "FsWriteTime_max" : 133946271,
{noformat}

2. Since the read op happens inside a lock in {{HFileReaderImpl#getMetaBlock}}, 
the cost of updating the histogram hurts; I've confirmed it to be the root 
cause of the ~3% performance regression in my test. I'd suggest timing the 
whole {{readBlockData}} call instead of timing inside {{readAtOffset}}, and 
updating the histogram outside the synchronized block. This recovers the 
performance at the cost of slightly less accurate metrics. Code excerpt below:
{code:title=HFileReaderImpl#getMetaBlock|borderStyle=solid}
    // Per meta key from any given file, synchronize reads for said block. This
    // is OK to do for meta blocks because the meta block index is always
    // single-level.
    synchronized (metaBlockIndexReader.getRootBlockKey(block)) {
      ...
      HFileBlock metaBlock =
          fsBlockReader.readBlockData(metaBlockOffset, blockSize, true)
              .unpack(hfileContext, fsBlockReader);
      ...
      return metaBlock;
    }
{code}
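
To make the suggestion in point 2 concrete, here is a minimal, self-contained sketch of the pattern: time the whole read while holding the lock, but update the histogram only after the lock is released. This is not the HBase code. {{LatencyHistogram}}, {{MetaBlockReadSketch}}, {{blockLock}} and the stubbed {{readBlockData}} are hypothetical stand-ins for illustration, and the real {{getMetaBlock}} also does cache lookups that are omitted here.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the metrics histogram; NOT the HBase class.
final class LatencyHistogram {
  private final LongAdder count = new LongAdder();
  private final LongAdder totalNanos = new LongAdder();

  // Records one sample; safe to call concurrently from many threads.
  void update(long nanos) {
    count.increment();
    totalNanos.add(nanos);
  }

  long numOps() { return count.sum(); }
  long totalNanos() { return totalNanos.sum(); }
}

// Hypothetical reader illustrating "measure inside, record outside".
final class MetaBlockReadSketch {
  private final Object blockLock = new Object();
  private final LatencyHistogram fsReadTime = new LatencyHistogram();

  // Assumed stub replacing fsBlockReader.readBlockData(...).unpack(...).
  private byte[] readBlockData(long offset, int size) {
    return new byte[size];
  }

  byte[] getMetaBlock(long offset, int size) {
    byte[] block;
    long elapsedNanos;
    synchronized (blockLock) {
      // Only the cheap timestamp reads happen while the lock is held.
      long start = System.nanoTime();
      block = readBlockData(offset, size);
      elapsedNanos = System.nanoTime() - start;
    }
    // The histogram update runs after the lock is released, so it no
    // longer lengthens the critical section.
    fsReadTime.update(elapsedNanos);
    return block;
  }

  LatencyHistogram fsReadTime() { return fsReadTime; }
}
```

The trade-off is the one noted above: the recorded time covers the whole {{readBlockData}} call rather than just the filesystem read, so the metric is coarser, but waiting threads are not held up by metrics bookkeeping.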

> Put back HFile's HDFS op latency sampling code and add metrics for monitoring
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15160
>                 URL: https://issues.apache.org/jira/browse/HBASE-15160
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0, 1.1.2
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-15160.patch, HBASE-15160_v2.patch, 
> HBASE-15160_v3.patch, hbase-15160_v4.patch, hbase-15160_v5.patch
>
>
> In HBASE-11586 all HDFS op latency sampling code, including fsReadLatency, 
> fsPreadLatency and fsWriteLatency, was removed. There was some discussion 
> about putting it back in a new JIRA, but that never happened. In our 
> experience, these metrics are useful for judging whether the issue lies in 
> HDFS when slow requests occur, so we propose to put them back in this JIRA 
> and add the metrics for monitoring as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
