[
https://issues.apache.org/jira/browse/HBASE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yu Li updated HBASE-15160:
--------------------------
Attachment: hbase-15160_v7.patch
Confirmed that with {{System#currentTimeMillis}} the performance regression
disappeared.
|| Case || Throughput (ops/s) || AverageLatency (us) ||
| w/o patch | 122079.26 | 26019.93 |
| w/ patch v7 | 121693.28 | 26688.72 |
Although this might only happen with a fast disk such as a PCIe SSD, I think we
should still make the change. What's more, millisecond granularity should be
enough to monitor spikes. Below is the metrics data from the test with PCIe SSD:
{noformat}
"FsPReadTime_num_ops" : 21828053,
"FsPReadTime_min" : 0,
"FsPReadTime_max" : 103,
"FsPReadTime_mean" : 3,
"FsPReadTime_25th_percentile" : 0,
"FsPReadTime_median" : 0,
"FsPReadTime_75th_percentile" : 5,
"FsPReadTime_90th_percentile" : 7,
"FsPReadTime_95th_percentile" : 9,
"FsPReadTime_98th_percentile" : 17,
"FsPReadTime_99th_percentile" : 91,
"FsPReadTime_99.9th_percentile" : 98,
"FsPReadTime_TimeRangeCount_0-1" : 26267,
"FsPReadTime_TimeRangeCount_1-3" : 455,
"FsPReadTime_TimeRangeCount_3-10" : 8366,
"FsPReadTime_TimeRangeCount_10-30" : 661,
"FsPReadTime_TimeRangeCount_30-100" : 705,
"FsPReadTime_TimeRangeCount_100-300" : 15,
"FsPReadTime_TimeRangeCount_600000-inf" : 21791593,
{noformat}
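For illustration, the millisecond-granularity sampling discussed above can be sketched roughly as follows. This is a minimal sketch and not the code in the attached patch: the class name {{PreadLatencySketch}}, its methods, and the half-open bucket bounds are all hypothetical, chosen only to resemble the {{TimeRangeCount}} buckets in the output above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical sketch of millisecond latency sampling; not the HBASE-15160 patch. */
final class PreadLatencySketch {
    // Assumed upper bounds (ms) of the range buckets; the last bucket is open-ended.
    private static final long[] BOUNDS_MS = {1, 3, 10, 30, 100, 300};

    private final AtomicLong[] counts = new AtomicLong[BOUNDS_MS.length + 1];
    private final AtomicLong numOps = new AtomicLong();
    private final AtomicLong maxMs = new AtomicLong();

    PreadLatencySketch() {
        for (int i = 0; i < counts.length; i++) {
            counts[i] = new AtomicLong();
        }
    }

    /** Returns the index of the bucket [lower, upper) that contains latencyMs. */
    static int bucketFor(long latencyMs) {
        for (int i = 0; i < BOUNDS_MS.length; i++) {
            if (latencyMs < BOUNDS_MS[i]) {
                return i;
            }
        }
        return BOUNDS_MS.length; // open-ended last bucket
    }

    /** Records one latency sample, in milliseconds. */
    void update(long latencyMs) {
        numOps.incrementAndGet();
        maxMs.accumulateAndGet(latencyMs, Math::max);
        counts[bucketFor(latencyMs)].incrementAndGet();
    }

    /** Runs op and records its wall-clock duration using System.currentTimeMillis. */
    <T> T timed(Supplier<T> op) {
        long start = System.currentTimeMillis();
        try {
            return op.get();
        } finally {
            // currentTimeMillis is not monotonic, so clamp a negative delta to zero.
            update(Math.max(0L, System.currentTimeMillis() - start));
        }
    }

    long getNumOps() { return numOps.get(); }
    long getMaxMs() { return maxMs.get(); }
    long getCount(int bucket) { return counts[bucket].get(); }
}
```

A caller would wrap the read, e.g. {{sketch.timed(() -> doPread())}}, where {{doPread}} stands in for whatever performs the positional read. The trade-off is that sub-millisecond reads are recorded as 0, which is consistent with the {{min}} and {{median}} values above.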
> Put back HFile's HDFS op latency sampling code and add metrics for monitoring
> -----------------------------------------------------------------------------
>
> Key: HBASE-15160
> URL: https://issues.apache.org/jira/browse/HBASE-15160
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0, 1.1.2
> Reporter: Yu Li
> Assignee: Yu Li
> Priority: Critical
> Attachments: HBASE-15160.patch, HBASE-15160_v2.patch,
> HBASE-15160_v3.patch, hbase-15160_v4.patch, hbase-15160_v5.patch,
> hbase-15160_v6.patch, hbase-15160_v7.patch
>
>
> In HBASE-11586 all HDFS op latency sampling code, including fsReadLatency,
> fsPreadLatency and fsWriteLatency, was removed. There was some
> discussion about putting it back in a new JIRA, but that never happened.
> In our experience, these metrics are useful for judging whether an issue
> lies in HDFS when a slow request occurs, so we propose to put them back in
> this JIRA, and to add the metrics for monitoring as well.