pitrou commented on a change in pull request #9435: URL: https://github.com/apache/arrow/pull/9435#discussion_r572042030
##########
File path: cpp/src/arrow/compute/api_aggregate.h
##########
@@ -105,12 +105,15 @@ struct ARROW_EXPORT VarianceOptions : public FunctionOptions {
 /// By default, returns the median value.
 struct ARROW_EXPORT QuantileOptions : public FunctionOptions {
   /// Interpolation method to use when quantile lies between two data points
+  /// TDIGEST is useful to approximate quantiles from large volume inputs.
+  /// It has constant memory footprint, but lower accuracy.
   enum Interpolation {
     LINEAR = 0,
     LOWER,
     HIGHER,
     NEAREST,
     MIDPOINT,
+    TDIGEST,

Review comment:
I'm curious whether this is best exposed as an interpolation kind for the "quantile" function, or a separate function altogether. Are there precedents in other libraries or database engines?

It seems R uses a separate function: https://www.rdocumentation.org/packages/tdigest/versions/0.3.0/topics/tdigest

cc @nealrichardson @michalursa for opinion.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
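To make the two options raised in the review comment concrete, here is a rough, standalone sketch of both API shapes. Every name in it (`QuantileOptionsA`, `QuantileA`, `TDigestOptionsB`, `TDigestB`, `compression`) is hypothetical and is not Arrow's actual API. The "approximate" path is only a stand-in that picks the nearest-rank element of a sorted copy; it is not a t-digest, and both helpers assume non-empty input.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for an approximate estimator such as t-digest (NOT a real
// t-digest): returns the nearest-rank element of a sorted copy.
// Assumes `values` is non-empty and 0 <= q <= 1.
double ApproxQuantileStandIn(std::vector<double> values, double q) {
  std::sort(values.begin(), values.end());
  const auto idx =
      static_cast<std::size_t>(std::llround(q * (values.size() - 1)));
  return values[idx];
}

// Exact quantile with linear interpolation between neighbouring points.
// Assumes `values` is non-empty and 0 <= q <= 1.
double ExactLinearQuantile(std::vector<double> values, double q) {
  std::sort(values.begin(), values.end());
  const double pos = q * (values.size() - 1);
  const auto lo = static_cast<std::size_t>(std::floor(pos));
  const auto hi = static_cast<std::size_t>(std::ceil(pos));
  return values[lo] + (pos - lo) * (values[hi] - values[lo]);
}

// Shape A (hypothetical): the approximate algorithm is one more enum value
// on the existing options struct, next to the interpolation kinds.
struct QuantileOptionsA {
  enum Method { LINEAR = 0, TDIGEST };
  double q = 0.5;
  Method method = LINEAR;
};

double QuantileA(const std::vector<double>& values,
                 const QuantileOptionsA& opts) {
  return opts.method == QuantileOptionsA::TDIGEST
             ? ApproxQuantileStandIn(values, opts.q)
             : ExactLinearQuantile(values, opts.q);
}

// Shape B (hypothetical): a separate function with its own options, which
// leaves room for algorithm-specific knobs that make no sense for the
// exact interpolation kinds.
struct TDigestOptionsB {
  double q = 0.5;
  int compression = 100;  // hypothetical tuning knob, unused by the stand-in
};

double TDigestB(const std::vector<double>& values,
                const TDigestOptionsB& opts) {
  return ApproxQuantileStandIn(values, opts.q);
}
```

In shape A, callers switch algorithms by changing one field, but the enum then mixes interpolation rules with an estimation algorithm. In shape B the approximate path has its own entry point and options, which is closer to the R precedent linked in the comment.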