[ 
https://issues.apache.org/jira/browse/ARROW-11727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibo Cai updated ARROW-11727:
-----------------------------
    Description: 
The Flight benchmark uses a boost accumulator to estimate latency quantiles 
(0.5, 0.95, 0.99). Internally, boost uses the P-Square algorithm [1]. P-Square 
estimates tail quantiles such as 0.99 very poorly, which is where TDigest shines.

Test results show that the real 0.99 latency is much lower than what the 
current code reports. We should switch to TDigest.
 - run flight-benchmark with default parameters
 - calculate 0.99 quantile of latencies
 - compare exact value (store all data points), value from tdigest, and value 
from boost
 - test 5 rounds
{noformat}
Exact Tdigest Boost-P2
86    93      2130
175   235     1526
151   165     1926
147   153     302
251   313     561
{noformat}
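
To put numbers on the gap, here is a minimal sketch that computes the relative error of each estimate against the exact value. The five rounds are copied from the table above; the names Round, kRounds and RelativeError are made up for illustration and are not part of the benchmark code.

```cpp
#include <array>
#include <cmath>

// One benchmark round: 0.99 latency quantile from each method.
// Values are copied from the table above.
struct Round {
  double exact;
  double tdigest;
  double boost_p2;
};

constexpr std::array<Round, 5> kRounds = {{
    {86.0, 93.0, 2130.0},
    {175.0, 235.0, 1526.0},
    {151.0, 165.0, 1926.0},
    {147.0, 153.0, 302.0},
    {251.0, 313.0, 561.0},
}};

// Relative error of an estimate against the exact value (0.1 means 10% off).
double RelativeError(double estimate, double exact) {
  return std::abs(estimate - exact) / exact;
}
```

Calling RelativeError(r.tdigest, r.exact) and RelativeError(r.boost_p2, r.exact) for each r in kRounds gives a per-round comparison of the two estimators.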

TDigest gives more accurate values for all quantiles. For the 0.5 quantile, both 
TDigest and Boost give very accurate results. For the 0.95 quantile, TDigest 
gives an almost exact value, while Boost deviates slightly.
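
For reference, the "exact" baseline (store all data points, then pick the quantile) could be computed along the following lines. This is only a sketch under assumptions: ExactQuantile is a hypothetical helper, and it uses the nearest-rank definition of a quantile, which may differ from what the actual test used.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Exact q-quantile of the stored samples, nearest-rank definition:
// the smallest sample such that at least q * n samples are <= it.
// The vector is taken by value because std::nth_element reorders it.
double ExactQuantile(std::vector<double> samples, double q) {
  if (samples.empty() || q <= 0.0 || q > 1.0) {
    throw std::invalid_argument("need at least one sample and 0 < q <= 1");
  }
  const std::size_t n = samples.size();
  // 1-based nearest rank, converted to a 0-based index below.
  const auto rank =
      static_cast<std::size_t>(std::ceil(q * static_cast<double>(n)));
  const std::size_t index = rank == 0 ? 0 : rank - 1;
  // Partial sort: only the element at `index` is guaranteed to be in place.
  std::nth_element(samples.begin(), samples.begin() + index, samples.end());
  return samples[index];
}
```

The benchmark would append every measured latency to a vector and call ExactQuantile(latencies, 0.99) at the end. This costs O(n) memory, which is why a streaming estimator such as TDigest or P-Square is used in the first place.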

[1] [https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf]

  was:
In Flight benchmark, boost accumulator is used to estimate latency quantiles 
(0.5, 0.95, 0.99). Internally, boost adopts P-Square algorithm [1]. P-Square is 
very bad at estimating skewed quantiles like 0.99, where TDigest shines.

Test result shows 0.99 latency is much better than what current code tells us. 
We should switch to TDigest.

- run flight-benchmark with default parameters
- calculate 0.99 quantile of latencies
- compare exact value (store all data points), value from tdigest, and value 
from boost
- test 5 rounds
{noformat}
Exact Tdigest Boost-P2
86    93      2130
175   235     1526
151   165     1926
147   153     302
251   313     561
{noformat}

[1] https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf


> [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark
> -----------------------------------------------------------------------
>
>                 Key: ARROW-11727
>                 URL: https://issues.apache.org/jira/browse/ARROW-11727
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: FlightRPC
>            Reporter: Yibo Cai
>            Assignee: Yibo Cai
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
