[jira] [Commented] (HDFS-16949) Update ReadTransferRate to ReadLatencyPerGB for effective percentile metrics

ASF GitHub Bot (Jira) Wed, 29 Mar 2023 13:33:08 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706593#comment-17706593
 ]


ASF GitHub Bot commented on HDFS-16949:
---------------------------------------

goiri commented on code in PR #5495:
URL: https://github.com/apache/hadoop/pull/5495#discussion_r1152443921


##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/util/TestSampleQuantiles.java:
##########
@@ -92,27 +93,68 @@ public void testClear() throws IOException {
   public void testQuantileError() throws IOException {
     final int count = 100000;
     Random r = new Random(0xDEADDEAD);

Review Comment:
   Where are we using this random?



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/util/TestSampleQuantiles.java:
##########
@@ -92,27 +93,68 @@ public void testClear() throws IOException {
   public void testQuantileError() throws IOException {
     final int count = 100000;
     Random r = new Random(0xDEADDEAD);
-    Long[] values = new Long[count];
+    int[] values = new int[count];
     for (int i = 0; i < count; i++) {
-      values[i] = (long) (i + 1);
+      values[i] = i + 1;
     }
+
     // Do 10 shuffle/insert/check cycles
     for (int i = 0; i < 10; i++) {
-      System.out.println("Starting run " + i);
+
+      // Shuffle  
       Collections.shuffle(Arrays.asList(values), r);
       estimator.clear();
+
+      // Insert
       for (int j = 0; j < count; j++) {
         estimator.insert(values[j]);
       }
       Map<Quantile, Long> snapshot;
       snapshot = estimator.snapshot();
+
+      // Check
       for (Quantile q : quantiles) {
         long actual = (long) (q.quantile * count);
         long error = (long) (q.error * count);
         long estimate = snapshot.get(q);
-        System.out
-            .println(String.format("Expected %d with error %d, estimated %d",
-                actual, error, estimate));
+        assertThat(estimate <= actual + error).isTrue();
+        assertThat(estimate >= actual - error).isTrue();
+      }
+    }
+  }
+
+  /**
+   * Correctness test that checks that absolute error of the estimate for inverse quantiles
+   * is within specified error bounds for some randomly permuted streams of items.
+   */
+  @Test
+  public void testInverseQuantiles() throws IOException {
+    SampleQuantiles inverseQuantilesEstimator = new SampleQuantiles(MutableInverseQuantiles.INVERSE_QUANTILES);
+    final int count = 100000;
+    Random r = new Random(0xDEADDEAD);
+    int[] values = new int[count];
+    for (int i = 0; i < count; i++) {
+      values[i] = i + 1;
+    }
+
+    // Do 10 shuffle/insert/check cycles
+    for (int i = 0; i < 10; i++) {

Review Comment:
   Make 10 a named constant, NUM_REPEATS or something like that, to show what it is.
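
A minimal sketch of what that suggestion could look like, outside the real test class. `NUM_REPEATS` is the name proposed in the comment; the class name `RepeatCyclesSketch` and the helper `runCycles` are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RepeatCyclesSketch {
  // Name taken from the review suggestion; replaces the bare literal 10.
  static final int NUM_REPEATS = 10;

  // Runs the loop NUM_REPEATS times and records each cycle index.
  static List<Integer> runCycles() {
    List<Integer> cycles = new ArrayList<>();
    for (int i = 0; i < NUM_REPEATS; i++) {
      // The shuffle / insert / check steps of the test would go here.
      cycles.add(i);
    }
    return cycles;
  }

  public static void main(String[] args) {
    System.out.println("cycles run: " + runCycles().size());
  }
}
```

Both test methods in the hunk use the same literal, so a single class-level constant could serve both loops and their comments.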



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/util/TestSampleQuantiles.java:
##########
@@ -92,27 +93,68 @@ public void testClear() throws IOException {
   public void testQuantileError() throws IOException {
     final int count = 100000;
     Random r = new Random(0xDEADDEAD);

Review Comment:
   OK, it is used in the shuffle.
   Single-letter variables are hard to search for; rename it to `rnd`.
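
For context, a small standalone sketch of the seeded shuffle, using the `rnd` name suggested above. The class and method names are hypothetical. It also illustrates one thing that may be worth double-checking in the patch: the hunk changes `values` from `Long[]` to `int[]`, and `Arrays.asList` applied to a primitive `int[]` produces a one-element `List<int[]>`, which leaves `Collections.shuffle` nothing to reorder.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SeededShuffleSketch {
  // Shuffles a boxed array in place through its list view, using a seeded Random.
  static Long[] shuffledBoxed(int count, long seed) {
    Long[] values = new Long[count];
    for (int i = 0; i < count; i++) {
      values[i] = (long) (i + 1);
    }
    Random rnd = new Random(seed);
    // Arrays.asList(Long[]) is a write-through view, so the array itself is reordered.
    Collections.shuffle(Arrays.asList(values), rnd);
    return values;
  }

  // Size of the list view that Arrays.asList produces for a primitive int[].
  static int listViewSizeOfPrimitive(int[] values) {
    // With int[], the whole array becomes a single list element.
    List<int[]> view = Arrays.asList(values);
    return view.size();
  }

  public static void main(String[] args) {
    Long[] a = shuffledBoxed(8, 0xDEADDEAD);
    Long[] b = shuffledBoxed(8, 0xDEADDEAD);
    System.out.println("same seed, same order: " + Arrays.equals(a, b));
    System.out.println("int[] list view size: " + listViewSizeOfPrimitive(new int[8]));
  }
}
```

The fixed seed keeps every run of the test on the same permutations, which makes a failure reproducible.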



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/util/TestSampleQuantiles.java:
##########
@@ -92,27 +93,68 @@ public void testClear() throws IOException {
   public void testQuantileError() throws IOException {
     final int count = 100000;
     Random r = new Random(0xDEADDEAD);
-    Long[] values = new Long[count];
+    int[] values = new int[count];
     for (int i = 0; i < count; i++) {
-      values[i] = (long) (i + 1);
+      values[i] = i + 1;
     }
+
     // Do 10 shuffle/insert/check cycles
     for (int i = 0; i < 10; i++) {
-      System.out.println("Starting run " + i);
+
+      // Shuffle  
       Collections.shuffle(Arrays.asList(values), r);
       estimator.clear();
+
+      // Insert
       for (int j = 0; j < count; j++) {
         estimator.insert(values[j]);
       }
       Map<Quantile, Long> snapshot;
       snapshot = estimator.snapshot();
+
+      // Check
       for (Quantile q : quantiles) {
         long actual = (long) (q.quantile * count);
         long error = (long) (q.error * count);
         long estimate = snapshot.get(q);
-        System.out
-            .println(String.format("Expected %d with error %d, estimated %d",
-                actual, error, estimate));
+        assertThat(estimate <= actual + error).isTrue();
+        assertThat(estimate >= actual - error).isTrue();
+      }
+    }
+  }
+
+  /**
+   * Correctness test that checks that absolute error of the estimate for inverse quantiles
+   * is within specified error bounds for some randomly permuted streams of items.
+   */
+  @Test
+  public void testInverseQuantiles() throws IOException {
+    SampleQuantiles inverseQuantilesEstimator = new SampleQuantiles(MutableInverseQuantiles.INVERSE_QUANTILES);
+    final int count = 100000;
+    Random r = new Random(0xDEADDEAD);
+    int[] values = new int[count];
+    for (int i = 0; i < count; i++) {
+      values[i] = i + 1;
+    }
+
+    // Do 10 shuffle/insert/check cycles
+    for (int i = 0; i < 10; i++) {
+      // Shuffle
+      Collections.shuffle(Arrays.asList(values), r);
+      inverseQuantilesEstimator.clear();
+
+      // Insert
+      for (int j = 0; j < count; j++) {

Review Comment:
   ```
   for (int value : values) {
     inverseQuantilesEstimator.insert(value);
   }
   ```



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/util/TestSampleQuantiles.java:
##########
@@ -92,27 +93,68 @@ public void testClear() throws IOException {
   public void testQuantileError() throws IOException {
     final int count = 100000;
     Random r = new Random(0xDEADDEAD);
-    Long[] values = new Long[count];
+    int[] values = new int[count];
     for (int i = 0; i < count; i++) {
-      values[i] = (long) (i + 1);
+      values[i] = i + 1;
     }
+
     // Do 10 shuffle/insert/check cycles
     for (int i = 0; i < 10; i++) {
-      System.out.println("Starting run " + i);
+
+      // Shuffle  
       Collections.shuffle(Arrays.asList(values), r);
       estimator.clear();
+
+      // Insert
       for (int j = 0; j < count; j++) {

Review Comment:
   While we are cleaning up:
   ```
   for (int value : values) {
     estimator.insert(value);
   }
   ```



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/util/TestSampleQuantiles.java:
##########
@@ -118,4 +119,40 @@ public void testQuantileError() throws IOException {
       }
     }
   }
+
+  /**
+   * Correctness test that checks that absolute error of the estimate for inverse quantiles
+   * is within specified error bounds for some randomly permuted streams of items.
+   */
+  @Test
+  public void testInverseQuantiles() throws IOException {
+    SampleQuantiles inverseQuantilesEstimator = new SampleQuantiles(MutableInverseQuantiles.INVERSE_QUANTILES);
+    final int count = 100000;
+    Random r = new Random(0xDEADDEAD);

Review Comment:
   Rename it to `rnd`; a single-letter name is hard to search for.





> Update ReadTransferRate to ReadLatencyPerGB for effective percentile metrics
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-16949
>                 URL: https://issues.apache.org/jira/browse/HDFS-16949
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Ravindra Dingankar
>            Assignee: Ravindra Dingankar
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.3.0, 3.4.0
>
>
> HDFS-16917 added ReadTransferRate quantiles to measure the rate at which data 
> is read per unit of time.
> Percentiles sort the values in ascending order, so for the transfer rate, p90 
> is the value below which 90 percent of the rates fall (those rates are worse), 
> and p99 is the value below which 99 percent of the rates fall.
> Note that value(p90) < value(p99), so p99 is a better transfer rate than p90.
> However, to judge how good the system is, the value should get worse as the 
> percentile increases.
> Hence, instead of calculating the data read transfer rate, we should calculate 
> its inverse: the time taken to read a GB of data (seconds / GB).
> With this change, 90 percent of all values take less time than value(p90), 
> and similarly for p99 and the others.
> Also, value(p90) < value(p99), and here value(p99) is the worse value (more 
> time per byte read) compared to value(p90).
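
The argument in the description can be illustrated with a small standalone sketch. Everything in it is an assumption made for illustration: the sample rates are invented, the nearest-rank percentile is a simple stand-in rather than the streaming estimator used by `SampleQuantiles`, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class LatencyPerGbSketch {
  // Nearest-rank percentile over an ascending-sorted copy of the samples.
  static double percentile(double[] samples, double p) {
    double[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  // Converts transfer rates (GB per second) to latency per GB (seconds per GB).
  static double[] toLatencyPerGb(double[] ratesGbPerSec) {
    double[] out = new double[ratesGbPerSec.length];
    for (int i = 0; i < out.length; i++) {
      out[i] = 1.0 / ratesGbPerSec[i];
    }
    return out;
  }

  public static void main(String[] args) {
    // Invented sample rates in GB/s, for illustration only.
    double[] rates = {0.125, 0.25, 0.5, 0.5, 1.0, 1.0, 2.0, 2.0, 4.0, 8.0};
    double[] latencies = toLatencyPerGb(rates);

    // For rates, a higher percentile is a faster (better) read.
    System.out.println("rate p90 = " + percentile(rates, 90)
        + ", p99 = " + percentile(rates, 99) + " GB/s");
    // For latency per GB, a higher percentile is a slower (worse) read.
    System.out.println("latency p90 = " + percentile(latencies, 90)
        + ", p99 = " + percentile(latencies, 99) + " s/GB");
  }
}
```

With the inverse metric, the highest percentile corresponds to the slowest read in the sample, which is the usual reading of a tail-latency percentile.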



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

