[jira] [Commented] (HBASE-15160) Put back HFile's HDFS op latency sampling code and add metrics for monitoring

Enis Soztutar (JIRA) Wed, 02 Mar 2016 15:37:47 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176718#comment-15176718
 ]


Enis Soztutar commented on HBASE-15160:
---------------------------------------

bq. Yes, already made the change in the latest patch. 
OK. I was looking at the following and wondering why we are not using a 
histogram for this: 
{code}
+  private static final BlockingQueue<Long> fsReadLatenciesNanos =
+      new ArrayBlockingQueue<Long>(LATENCY_BUFFER_SIZE);
+  private static final BlockingQueue<Long> fsWriteLatenciesNanos =
+      new ArrayBlockingQueue<Long>(LATENCY_BUFFER_SIZE);
{code}

For every RPC and every operation (get, etc.), we already update counters or 
histograms directly inline, rather than buffering individual data points as 
this patch does and bulk-updating the histograms periodically. Since the 
number of gets should, in theory, exceed the number of fs operations, updating 
the fs latency histograms inline should not be a perf regression. This should 
of course be verified if possible. 
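
To make the inline approach concrete, here is a minimal sketch of recording a 
latency directly into a histogram on the calling thread. The class and method 
names are hypothetical, and it uses plain java.util.concurrent.atomic.LongAdder 
buckets as a stand-in; it is not HBase's FastLongHistogram. 

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical inline latency histogram with power-of-two buckets. */
final class InlineLatencyHistogram {
  // Bucket i counts samples in [2^i, 2^(i+1)) nanos; bucket 0 also takes values <= 1.
  private static final int NUM_BUCKETS = 64;
  private final LongAdder[] buckets = new LongAdder[NUM_BUCKETS];
  private final LongAdder count = new LongAdder();

  InlineLatencyHistogram() {
    for (int i = 0; i < NUM_BUCKETS; i++) {
      buckets[i] = new LongAdder();
    }
  }

  /** Maps a latency to floor(log2(latency)), clamping non-positive values to 0. */
  static int bucketFor(long latencyNanos) {
    return latencyNanos <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(latencyNanos);
  }

  /** Called inline by the thread that performed the fs read or write. */
  void update(long latencyNanos) {
    buckets[bucketFor(latencyNanos)].increment();
    count.increment();
  }

  long count() {
    return count.sum();
  }

  long countInBucket(int bucket) {
    return buckets[bucket].sum();
  }
}
```

A caller would time the fs operation with System.nanoTime() and pass the 
difference to update(); no intermediate buffer or background drain is needed. 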

The other thing is that instead of updating the histogram inline (it is based 
on FastLongHistogram / Counters, which are high-perf counters), the patch uses 
a BlockingQueue. An ArrayBlockingQueue guards every insert and removal with a 
single ReentrantLock, which is in theory more costly. So this indirect approach 
may be even worse than doing inline updates. 
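
For comparison, the buffered approach can be sketched as below. This is only 
an illustration of the pattern, not the code from the patch: the class name, 
the capacity parameter and the drain() helper are hypothetical. Each offer() 
boxes the long and acquires the queue's lock, and a sample is dropped when the 
buffer is full. 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sampler that buffers latencies for a later bulk update. */
final class BufferedLatencySampler {
  private final BlockingQueue<Long> latenciesNanos;

  BufferedLatencySampler(int capacity) {
    this.latenciesNanos = new ArrayBlockingQueue<>(capacity);
  }

  /**
   * Hot path: autoboxes the value and takes the queue lock.
   * Returns false when the buffer is full and the sample was dropped.
   */
  boolean offer(long latencyNanos) {
    return latenciesNanos.offer(latencyNanos);
  }

  /** Background path: removes all buffered samples so they can be bulk-applied. */
  List<Long> drain() {
    List<Long> out = new ArrayList<>();
    latenciesNanos.drainTo(out);
    return out;
  }
}
```

With a capacity of 2, a third offer() before the next drain() returns false, 
so the histogram built from the drained samples undercounts under load. 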

> Put back HFile's HDFS op latency sampling code and add metrics for monitoring
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15160
>                 URL: https://issues.apache.org/jira/browse/HBASE-15160
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0, 1.1.2
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-15160.patch, HBASE-15160_v2.patch, 
> HBASE-15160_v3.patch
>
>
> In HBASE-11586 all HDFS op latency sampling code, including fsReadLatency, 
> fsPreadLatency and fsWriteLatency, was removed. There was some discussion 
> about putting it back in a new JIRA, but that never happened. In our 
> experience, these metrics are useful for judging whether the issue lies in 
> HDFS when a slow request occurs, so we propose to put them back in this 
> JIRA, and to add the metrics for monitoring as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
