[jira] [Commented] (HBASE-15160) Put back HFile's HDFS op latency sampling code and add metrics for monitoring

Enis Soztutar (JIRA) Fri, 22 Apr 2016 11:17:18 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254361#comment-15254361
 ]


Enis Soztutar commented on HBASE-15160:
---------------------------------------

bq. 1. From the latest patch, we're adding keys for read/write count, could you 
clarify the reason for this when we already have the num_ops counting in 
histogram?
What I have noticed is that the num_ops values coming from the histograms are 
reset every time the histograms are reset. We are relying on these counts at the 
regionserver level as well (like get_numOps, etc), but I think it is wrong and 
very hard to interpret because 
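
The difference between a histogram's op count and a separate running counter can be sketched roughly as below. This is only an illustration, not the patch: {{OpCountSketch}}, {{ResettableHistogram}}, {{recordRead}} and {{readCount}} are hypothetical names, and the histogram is reduced to a count and a sum.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not HBase code: all names here are made up for illustration.
class OpCountSketch {

    // Stand-in for a latency histogram whose state is cleared on reset.
    static final class ResettableHistogram {
        private long count;
        private long sum;

        synchronized void update(long value) {
            count++;
            sum += value;
        }

        // Number of updates since the last reset, not since startup.
        synchronized long numOps() {
            return count;
        }

        synchronized void reset() {
            count = 0;
            sum = 0;
        }
    }

    final ResettableHistogram readLatency = new ResettableHistogram();

    // Separate counter that is never reset, so it keeps growing across histogram resets.
    final LongAdder readCount = new LongAdder();

    void recordRead(long latencyNanos) {
        readLatency.update(latencyNanos);
        readCount.increment();
    }

    public static void main(String[] args) {
        OpCountSketch metrics = new OpCountSketch();
        for (int i = 0; i < 3; i++) {
            metrics.recordRead(1_000L);
        }
        metrics.readLatency.reset();
        for (int i = 0; i < 2; i++) {
            metrics.recordRead(2_000L);
        }
        System.out.println("histogram numOps = " + metrics.readLatency.numOps());
        System.out.println("readCount        = " + metrics.readCount.sum());
    }
}
```

In this sketch the histogram's count only covers the interval since the last reset, while the separate counter keeps the running total.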

bq. 2. Since the read op happens inside a lock in HFileReaderImpl#getMetaBlock, 
cost of update histogram hurts, and confirmed to be the root cause of the ~3% 
performance regression in my test
Thanks [~carp84] for the perf test. I have not run this patch against YCSB 
yet. Did you try with the block cache disabled? The metrics only get updated 
when an actual read happens, so I was thinking of doing the test with the 
block cache turned off. Let me try your suggestion.

I originally moved the histogram update into 
HFileBlock.readAtOffset() rather than keeping it at the HFileReader level, 
because even when the argument is {{pread=false}}, we might end up doing a 
{{pread}} if we cannot get the lock. Otherwise the reporting for pread vs. read 
will be slightly wrong if there is contention for the input stream lock. 
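
A rough sketch of why the update point matters, assuming the read path tries the stream lock and falls back to a positional read when the lock is busy. This is not the patch itself: {{ReadPathSketch}}, the counters and {{contendedReadUsesPread}} are hypothetical, and the actual I/O is left out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not HBase code: the real I/O is omitted and all names are made up.
class ReadPathSketch {
    final ReentrantLock streamLock = new ReentrantLock();

    // Counters stand in for the read and pread latency histograms.
    final LongAdder seekReadOps = new LongAdder();
    final LongAdder preadOps = new LongAdder();

    /** Returns true if the positional-read path was taken. */
    boolean readAtOffset(boolean pread) {
        if (!pread && streamLock.tryLock()) {
            try {
                // seek + read on the shared stream would happen here
            } finally {
                streamLock.unlock();
            }
            seekReadOps.increment();
            return false;
        }
        // Either pread was requested, or the stream lock was held by another thread.
        // A positional read would happen here.
        preadOps.increment();
        return true;
    }

    /** Holds the lock on a helper thread and reports which path a pread=false call takes. */
    static boolean contendedReadUsesPread() throws InterruptedException {
        ReadPathSketch reader = new ReadPathSketch();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            reader.streamLock.lock();
            try {
                lockHeld.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                reader.streamLock.unlock();
            }
        });
        holder.start();
        lockHeld.await();
        boolean usedPread = reader.readAtOffset(false);
        release.countDown();
        holder.join();
        return usedPread;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("contended pread=false used pread: " + contendedReadUsesPread());
    }
}
```

Because the path is only known at this level, a caller that records latency based on the {{pread}} argument alone would attribute the contended call to the wrong metric.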

> Put back HFile's HDFS op latency sampling code and add metrics for monitoring
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15160
>                 URL: https://issues.apache.org/jira/browse/HBASE-15160
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0, 1.1.2
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-15160.patch, HBASE-15160_v2.patch, 
> HBASE-15160_v3.patch, hbase-15160_v4.patch, hbase-15160_v5.patch
>
>
> In HBASE-11586 all HDFS op latency sampling code, including fsReadLatency, 
> fsPreadLatency and fsWriteLatency, was removed. There was some 
> discussion about putting it back in a new JIRA, but that never happened. 
> In our experience, these metrics are useful for judging whether the issue 
> lies in HDFS when a slow request occurs, so we propose to put them back in 
> this JIRA, and add the metrics for monitoring as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
