[ https://issues.apache.org/jira/browse/HDFS-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702455#comment-17702455 ]

ASF GitHub Bot commented on HDFS-16949:
---------------------------------------

rdingankar opened a new pull request, #5495:
URL: https://github.com/apache/hadoop/pull/5495

   …ic value is better
   
   
   ### Description of PR
   Currently, quantiles are used for latencies, where a lower numeric value is 
better.
   
   Hence p90 gives us a value val(p90) such that 90% of our sample set has a 
better (lower) value than val(p90).
   
   However, for metrics such as transfer rates (e.g. 
[HDFS-16917](https://issues.apache.org/jira/browse/HDFS-16917)), a higher numeric 
value is better, so the current quantiles don't work for them.
   
   For these metrics, in order for p90 to give a value val(p90) where 90% of the 
sample set is better (higher) than val(p90), we need to invert the selection by 
choosing the value at the (100 - 90)th position instead of the usual 90th 
position.
   
   Note: The same error guarantees hold for these percentiles, since our quantile 
implementation follows the [Cormode, Korn, Muthukrishnan, and Srivastava 
algorithm](http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf).
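   The inverted selection described above can be sketched as follows. This is a minimal, exact (non-streaming) illustration using the nearest-rank method, not the CKMS sketch that Hadoop's quantile implementation uses; all names and the sample values are hypothetical.

```python
import math

def percentile(sorted_samples, pct):
    """Nearest-rank percentile: the value at the pct-th position of the
    sorted samples. Suits lower-is-better metrics such as latency."""
    n = len(sorted_samples)
    rank = max(1, math.ceil(pct * n / 100))
    return sorted_samples[rank - 1]

def inverse_percentile(sorted_samples, pct):
    """For higher-is-better metrics (e.g. transfer rates): pick the value
    at the (100 - pct)-th position, so that pct percent of the samples
    are better (higher) than the returned value."""
    return percentile(sorted_samples, 100 - pct)

rates = sorted([5, 12, 18, 25, 33, 41, 57, 66, 78, 95])  # e.g. MB/s
print(percentile(rates, 90))          # 78: 90% of samples are at or below it
print(inverse_percentile(rates, 90))  # 5: 90% of samples are higher than it
```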
   
   ### How was this patch tested?
   Results from the unit test `testInverseQuantiles()`:
   ```
   Starting run 0
   Expected 50000 with error 5000, estimated 50502
   Expected 75000 with error 2500, estimated 75136
   Expected 90000 with error 1000, estimated 90052
   Expected 95000 with error 500, estimated 95029
   Expected 99000 with error 100, estimated 98993
   Starting run 1
   Expected 50000 with error 5000, estimated 50796
   Expected 75000 with error 2500, estimated 75150
   Expected 90000 with error 1000, estimated 89978
   Expected 95000 with error 500, estimated 94993
   Expected 99000 with error 100, estimated 99000
   Starting run 2
   Expected 50000 with error 5000, estimated 50743
   Expected 75000 with error 2500, estimated 75299
   Expected 90000 with error 1000, estimated 90112
   Expected 95000 with error 500, estimated 95044
   Expected 99000 with error 100, estimated 99002
   Starting run 3
   Expected 50000 with error 5000, estimated 51313
   Expected 75000 with error 2500, estimated 75495
   Expected 90000 with error 1000, estimated 90003
   Expected 95000 with error 500, estimated 95048
   Expected 99000 with error 100, estimated 99002
   Starting run 4
   Expected 50000 with error 5000, estimated 50258
   Expected 75000 with error 2500, estimated 75208
   Expected 90000 with error 1000, estimated 90091
   Expected 95000 with error 500, estimated 94990
   Expected 99000 with error 100, estimated 99005
   Starting run 5
   Expected 50000 with error 5000, estimated 50718
   Expected 75000 with error 2500, estimated 75308
   Expected 90000 with error 1000, estimated 90028
   Expected 95000 with error 500, estimated 95020
   Expected 99000 with error 100, estimated 99007
   Starting run 6
   Expected 50000 with error 5000, estimated 50203
   Expected 75000 with error 2500, estimated 75368
   Expected 90000 with error 1000, estimated 90047
   Expected 95000 with error 500, estimated 95024
   Expected 99000 with error 100, estimated 99006
   Starting run 7
   Expected 50000 with error 5000, estimated 50423
   Expected 75000 with error 2500, estimated 75196
   Expected 90000 with error 1000, estimated 90064
   Expected 95000 with error 500, estimated 95005
   Expected 99000 with error 100, estimated 99010
   Starting run 8
   Expected 50000 with error 5000, estimated 50380
   Expected 75000 with error 2500, estimated 75311
   Expected 90000 with error 1000, estimated 90028
   Expected 95000 with error 500, estimated 95002
   Expected 99000 with error 100, estimated 98999
   Starting run 9
   Expected 50000 with error 5000, estimated 50548
   Expected 75000 with error 2500, estimated 75366
   Expected 90000 with error 1000, estimated 90079
   Expected 95000 with error 500, estimated 95010
   Expected 99000 with error 100, estimated 98994
   ```
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Update ReadTransferRate to ReadLatencyPerGB for effective percentile metrics
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-16949
>                 URL: https://issues.apache.org/jira/browse/HDFS-16949
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Ravindra Dingankar
>            Assignee: Ravindra Dingankar
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.3.0, 3.4.0
>
>
> HDFS-16917 added ReadTransferRate quantiles to calculate the rate at which data 
> is read per unit of time.
> With percentiles the values are sorted in ascending order, hence for the 
> transfer rate p90 gives us the value where 90 percent of the rates are lower 
> (worse), and p99 gives us the value where 99 percent are lower (worse).
> Note that value(p90) < value(p99), thus p99 is a better transfer rate than 
> p90.
> However, as the percentile increases the value should become worse in order to 
> tell us how good our system is.
> Hence instead of calculating the data read transfer rate, we should calculate 
> its inverse: the time taken to read a GB of data (seconds / GB).
> After this, the p90 value will give us the point where for 90 percent of the 
> total values the time taken is less than value(p90), and similarly for p99 and 
> others.
> Also value(p90) < value(p99), and here p99 is a worse value (taking more time 
> per GB) than p90.
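The seconds-per-GB inversion described in the issue can be sketched as follows. This is a minimal illustration; the helper name and the GB unit handling are assumptions for the example, not Hadoop code.

```python
GB = 1024 ** 3  # bytes per gibibyte

def read_latency_per_gb(bytes_read: int, elapsed_seconds: float) -> float:
    """Invert a transfer rate into seconds per GB, so that a lower value
    is better and ordinary ascending percentiles rank it correctly."""
    return elapsed_seconds * GB / bytes_read

# A fast read, 4 GB in 2 s, yields a low (good) value; a slow read,
# 1 GB in 3 s, yields a high (bad) value, so p99 >= p90 as desired.
print(read_latency_per_gb(4 * GB, 2.0))  # 0.5 s/GB
print(read_latency_per_gb(1 * GB, 3.0))  # 3.0 s/GB
```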



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
