[
https://issues.apache.org/jira/browse/ARROW-11727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Li updated ARROW-11727:
-----------------------------
Component/s: C++
> [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark
> -----------------------------------------------------------------------
>
> Key: ARROW-11727
> URL: https://issues.apache.org/jira/browse/ARROW-11727
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, FlightRPC
> Reporter: Yibo Cai
> Assignee: Yibo Cai
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> In the Flight benchmark, a Boost accumulator is used to estimate latency
> quantiles (0.5, 0.95, 0.99). Internally, Boost uses the P-Square algorithm [1].
> P-Square is very inaccurate for skewed quantiles like 0.99, where TDigest shines.
> Test results show that the actual 0.99 latency is much lower than what the
> current code reports. We should switch to TDigest.
> - run flight-benchmark with default parameters
> - calculate 0.99 quantile of latencies
> - compare exact value (store all data points), value from tdigest, and value
> from boost
> - test 5 rounds
> {noformat}
> Exact   TDigest   Boost-P2
>    86        93       2130
>   175       235       1526
>   151       165       1926
>   147       153        302
>   251       313        561
> {noformat}
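To put the table in perspective, the relative error of each estimator against the exact value can be worked out per round. The following is only an illustrative sketch: the names `Round`, `kRounds`, `RelativeError`, `WorstTDigestError`, and `BestBoostError` are hypothetical and are not part of the benchmark code; the numbers are copied from the table.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// One benchmark round: the 0.99 latency quantile reported by each method.
struct Round {
  double exact;
  double tdigest;
  double boost_p2;
};

// Values copied from the table in the issue description.
constexpr std::array<Round, 5> kRounds = {{
    {86, 93, 2130},
    {175, 235, 1526},
    {151, 165, 1926},
    {147, 153, 302},
    {251, 313, 561},
}};

// Relative error of an estimate against the exact quantile.
inline double RelativeError(double estimate, double exact) {
  assert(exact > 0.0);
  return std::fabs(estimate - exact) / exact;
}

// Largest relative error of TDigest over all rounds.
inline double WorstTDigestError() {
  double worst = 0.0;
  for (const Round& r : kRounds) {
    worst = std::max(worst, RelativeError(r.tdigest, r.exact));
  }
  return worst;
}

// Smallest relative error of Boost P-Square over all rounds.
inline double BestBoostError() {
  double best = RelativeError(kRounds[0].boost_p2, kRounds[0].exact);
  for (const Round& r : kRounds) {
    best = std::min(best, RelativeError(r.boost_p2, r.exact));
  }
  return best;
}
```

With these numbers, the TDigest error stays between roughly 4% and 34%, while the Boost P-Square error is above 100% in every round, so even the best Boost round is worse than the worst TDigest round.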
> TDigest gives more accurate values for all quantiles. For the 0.5 quantile, both
> TDigest and Boost give very accurate results. For the 0.95 quantile, TDigest
> gives an almost exact value, while Boost deviates slightly.
> [1] [https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf]
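The "exact" baseline in the comparison requires storing all data points. A minimal sketch of how such a value could be computed is shown below. It assumes the nearest-rank definition of a quantile, which the issue does not specify, and the function name `ExactQuantile` is hypothetical rather than taken from the benchmark code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact q-quantile by nearest rank (an assumed definition): the sample at
// 1-based rank ceil(q * n) in sorted order. The vector is taken by value
// because std::nth_element reorders its input.
inline double ExactQuantile(std::vector<double> samples, double q) {
  assert(!samples.empty());
  assert(q > 0.0 && q <= 1.0);
  const std::size_t n = samples.size();
  std::size_t rank =
      static_cast<std::size_t>(std::ceil(q * static_cast<double>(n)));
  // Guard against floating-point rounding pushing the rank out of range.
  rank = std::max<std::size_t>(1, std::min(rank, n));
  // Partial sort: places the rank-th smallest element at index rank - 1.
  std::nth_element(samples.begin(), samples.begin() + (rank - 1),
                   samples.end());
  return samples[rank - 1];
}
```

This needs O(n) memory and a pass over all samples per query, which is why a streaming estimator such as TDigest or P-Square is used in the benchmark itself; the exact value only serves as the reference for the comparison.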
--
This message was sent by Atlassian Jira
(v8.3.4#803005)