[jira] [Commented] (HBASE-15160) Put back HFile's HDFS op latency sampling code and add metrics for monitoring

Yu Li (JIRA) Sat, 27 May 2017 01:13:28 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027352#comment-16027352
 ]


Yu Li commented on HBASE-15160:
-------------------------------

I have checked the patch and have some comments:

bq. the callers of readBlock() do not know whether the returned block is read 
from disk, or comes from cache
I can see there's an {{if (cacheConf.isBlockCacheEnabled())}} check in 
{{HFileReaderImpl#readBlock}} where the cached block is returned on a hit, 
so could we simply update the metrics outside the if check? In the same 
way we could also record the IO time of {{getMetaBlock}} in the finally 
clause (on a cache miss). WDYT?
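
To illustrate the idea, here is a minimal sketch under stated assumptions, not the actual patch. {{ReadLatencySketch}}, its fields and the {{diskReader}} callback are hypothetical stand-ins for the reader, the block cache and the HDFS read; none of them are HBase APIs. The point is only that a cache hit returns early, so the timing in the finally clause covers real IO alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongFunction;

/** Hypothetical sketch: record read latency only when the cache is missed. */
class ReadLatencySketch {
    // Stand-in for the block cache, keyed by block offset (assumption).
    final Map<Long, byte[]> cache = new HashMap<>();
    // Stand-ins for the metrics: number of filesystem reads and their total time.
    final AtomicLong fsReadCount = new AtomicLong();
    final AtomicLong fsReadNanos = new AtomicLong();
    final boolean cacheEnabled;
    // Stand-in for the actual filesystem read (assumption).
    final LongFunction<byte[]> diskReader;

    ReadLatencySketch(boolean cacheEnabled, LongFunction<byte[]> diskReader) {
        this.cacheEnabled = cacheEnabled;
        this.diskReader = diskReader;
    }

    byte[] readBlock(long offset) {
        if (cacheEnabled) {
            byte[] cached = cache.get(offset);
            if (cached != null) {
                // Cache hit: no filesystem IO happened, so nothing is recorded.
                return cached;
            }
        }
        // Only cache misses (or reads with the cache disabled) reach this point.
        long start = System.nanoTime();
        try {
            byte[] block = diskReader.apply(offset);
            if (cacheEnabled) {
                cache.put(offset, block);
            }
            return block;
        } finally {
            // Recorded even if the read throws, as a finally clause would do.
            fsReadNanos.addAndGet(System.nanoTime() - start);
            fsReadCount.incrementAndGet();
        }
    }
}
```

With this shape the callers of readBlock() still do not need to know where the block came from; the metric is updated inside the method, on the miss path only.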

The earlier concern about {{readAtOffset}} made complete sense, but 
HBASE-17917 has removed the stream lock, so there is no longer any stream 
read when {{pread}} is true, which makes it possible to move the metrics 
update up to the caller (smile).

> Put back HFile's HDFS op latency sampling code and add metrics for monitoring
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15160
>                 URL: https://issues.apache.org/jira/browse/HBASE-15160
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0, 1.1.2
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Critical
>         Attachments: HBASE-15160.patch, HBASE-15160_v2.patch, 
> HBASE-15160_v3.patch, hbase-15160_v4.patch, hbase-15160_v5.patch, 
> hbase-15160_v6.patch
>
>
> In HBASE-11586 all HDFS op latency sampling code, including fsReadLatency, 
> fsPreadLatency and fsWriteLatency, was removed. There was some 
> discussion about putting it back in a new JIRA, but that never happened. 
> In our experience, these metrics are useful for judging whether the issue 
> lies in HDFS when slow requests occur, so we propose to put them back in 
> this JIRA and add the metrics for monitoring as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
