[jira] [Commented] (HDFS-16949) Update ReadTransferRate to ReadLatencyPerGB for effective percentile metrics

ASF GitHub Bot (Jira) Tue, 21 Mar 2023 10:10:39 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703302#comment-17703302
 ]


ASF GitHub Bot commented on HDFS-16949:
---------------------------------------

rdingankar commented on code in PR #5495:
URL: https://github.com/apache/hadoop/pull/5495#discussion_r1143742742


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/lib/MutableInverseQuantiles.java:
##########
@@ -0,0 +1,102 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.metrics2.lib;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.metrics2.MetricsInfo;
+import org.apache.hadoop.metrics2.util.Quantile;
+import org.apache.hadoop.metrics2.util.SampleQuantiles;
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.ScheduledFuture;
+import java.util.concurrent.TimeUnit;
+import static org.apache.hadoop.metrics2.lib.Interns.info;
+
+/**
+ * Watches a stream of long values, maintaining online estimates of specific
+ * quantiles with provably low error bounds. Inverse quantiles are meant for
+ * highly accurate low-percentile (e.g. 1st, 5th) latency metrics.
+ * InverseQuantiles are used for metrics where higher the value better it is.
+ * ( eg: data transfer rate ).
+ * The 1st percentile here corresponds to the 99th inverse percentile metric,
+ * 5th percentile to 95th and so on.
+ */
[email protected]
[email protected]
+public class MutableInverseQuantiles extends MutableQuantiles{
+
+  @VisibleForTesting
+  public static final Quantile[] INVERSE_QUANTILES = { new Quantile(0.50, 0.050),

Review Comment:
   Simply reversing the list traversal order does not work well for inverse 
quantiles, as seen in PR #5486.
   It does not work because, as an optimization, [not all values are 
stored](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/util/SampleQuantiles.java#L176)
 in memory.
   For quantiles, more values are stored at the higher percentiles (99, 95, 
90, ...) to meet a smaller allowed error percentage (0.1, 0.5, 1%). For lower 
percentiles the allowed error increases (+-10% error for the 1st percentile), 
giving us less accurate values.
   
   
   If we stored all the quantiles (p99, p95, p90, p75, p50, p25, p10, p5, p1) 
with allowed error percentages of (0.1, 0.5, 1, 2.5, 5, 2.5, 1, 0.5, 0.1), it 
would defeat the purpose of the space optimization, since we would end up 
storing all the values anyway.
   
   Thus, limiting normal quantiles to p99, p95, p90, p75, p50 (with the 
existing allowed error % of .1, .5, 1, 2.5, 5) 
   and inverse quantiles to just p1, p5, p10, p25, p50 (with allowed error % of 
.1, .5, 1, 2.5, 5) gives us the optimization we need as well as more accurate 
results at the percentiles we care about in each case.





> Update ReadTransferRate to ReadLatencyPerGB for effective percentile metrics
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-16949
>                 URL: https://issues.apache.org/jira/browse/HDFS-16949
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Ravindra Dingankar
>            Assignee: Ravindra Dingankar
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.3.0, 3.4.0
>
>
> HDFS-16917 added ReadTransferRate quantiles to calculate the rate at which 
> data is read per unit of time.
> With percentiles the values are sorted in ascending order, and hence for the 
> transfer rate p90 gives us the value below which 90 percent of rates fall 
> (worse), and p99 gives us the value below which 99 percent of rates fall 
> (worse).
> Note that value(p90) < value(p99), thus p99 is a better transfer rate than 
> p90.
> However, as the percentile increases the value should become worse in order 
> for us to know how good our system is.
> Hence, instead of calculating the data read transfer rate, we should 
> calculate its inverse: the time taken for a GB of data to be read 
> (seconds / GB).
> After this, the p90 value will be the value below which 90 percent of the 
> time-taken samples fall, and similarly for p99 and others.
> Also value(p90) < value(p99), and here value(p99) is a worse value (taking 
> more time per byte) than value(p90).
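
The conversion described in the quoted issue can be sketched as follows. This is 
a minimal illustration and not the code from the PR: the class and method names 
are hypothetical, and treating one GB as 1024^3 bytes is an assumption the issue 
text does not state.

```java
/** Illustrative only: hypothetical helper, not taken from the Hadoop code base. */
final class ReadLatencySketch {

  // Assumption: 1 GB is taken to be 1024^3 bytes.
  static final long BYTES_PER_GB = 1024L * 1024L * 1024L;

  /** Seconds needed to read one GB at the rate observed for this read. */
  static double secondsPerGb(long bytesRead, long durationNanos) {
    if (bytesRead <= 0) {
      throw new IllegalArgumentException("bytesRead must be positive");
    }
    double seconds = durationNanos / 1e9;
    return seconds * ((double) BYTES_PER_GB / bytesRead);
  }

  public static void main(String[] args) {
    // A faster read (higher transfer rate) yields a lower seconds-per-GB value,
    // so higher percentiles of this metric correspond to worse reads.
    double fast = secondsPerGb(BYTES_PER_GB, 1_000_000_000L);
    double slow = secondsPerGb(BYTES_PER_GB / 2, 1_000_000_000L);
    System.out.println("fast read: " + fast + " s/GB");
    System.out.println("slow read: " + slow + " s/GB");
  }
}
```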



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
