cyb70289 commented on pull request #9435: URL: https://github.com/apache/arrow/pull/9435#issuecomment-783839807
> Also, it seems there's a performance problem here where TDigest is much slower than Quantile:

Thanks for the careful review. As you found, the bottleneck of TDigest is sorting. I once talked with the author about the possibility of improving TDigest performance by removing sorting: https://github.com/tdunning/t-digest/issues/153. There are some ideas there to mitigate the penalty of sorting. I also plan to evaluate a faster sorting algorithm, such as radix sort for floating-point numbers (folly has such an implementation).

The `Quantile` kernel is expensive when calculating quantiles (it calls `std::nth_element`), so `centils()` performs the worst, simply because it calculates more quantiles. The `TDigest` kernel is trivial when calculating quantiles, so its performance is almost constant regardless of the number of quantiles to calculate. But TDigest is expensive when adding data points (`TDigestMerger::Add`): when the buffer is full, it sorts the input buffer (calls `std::sort`), which is much slower than `std::nth_element`.

I think `Quantile` and `TDigest` serve different purposes. You would use `TDigest` when you have a large volume of input (or streaming data) and copying all of it would lead to a huge memory footprint. Another benefit of `TDigest` is that chunks can be processed in parallel and then merged into one t-digest, though the current aggregate kernel framework doesn't support it.
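The cost asymmetry described above can be sketched in a small C++ toy. This is not the Arrow implementation: `QuantileBySelection` and `BufferedSketch` are hypothetical names, the nearest-rank index math is a simplifying assumption, and the buffered class keeps all raw points where a real t-digest compresses them into centroids. The point it illustrates: selection costs O(n) *per quantile*, while the buffered-add pattern pays an O(b log b) `std::sort` on every full buffer but answers any number of quantiles cheaply afterwards.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of the Quantile kernel's cost profile: each quantile is an
// O(n) selection via std::nth_element, so k quantiles cost O(k * n).
// Nearest-rank index math is an assumption for illustration only.
double QuantileBySelection(std::vector<double> data, double q) {
  const std::size_t idx = static_cast<std::size_t>(q * (data.size() - 1));
  std::nth_element(data.begin(), data.begin() + idx, data.end());
  return data[idx];
}

// Sketch of the buffered-add pattern described for TDigestMerger::Add:
// appends are cheap until the buffer fills, then a full std::sort runs.
// This toy retains every point; a real t-digest compresses to centroids.
class BufferedSketch {
 public:
  explicit BufferedSketch(std::size_t buffer_size)
      : buffer_size_(buffer_size) {}

  void Add(double v) {
    buffer_.push_back(v);
    if (buffer_.size() >= buffer_size_) Flush();
  }

  void Flush() {
    // The expensive step: an O(b log b) sort on every full buffer,
    // versus the single O(n) selection used above.
    std::sort(buffer_.begin(), buffer_.end());
    const std::size_t old_size = sorted_.size();
    sorted_.insert(sorted_.end(), buffer_.begin(), buffer_.end());
    std::inplace_merge(sorted_.begin(), sorted_.begin() + old_size,
                       sorted_.end());
    buffer_.clear();
  }

  // After merging, any quantile is a constant-time lookup -- this is why
  // query cost is nearly flat in the number of quantiles requested.
  double Quantile(double q) const {
    return sorted_[static_cast<std::size_t>(q * (sorted_.size() - 1))];
  }

 private:
  std::size_t buffer_size_;
  std::vector<double> buffer_;   // unsorted pending points
  std::vector<double> sorted_;   // merged, sorted points
};
```

So the two kernels trade where they spend time: `Quantile` pays at query time, the buffered sketch pays at ingest time, which matches the benchmark behavior described above.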