nealrichardson commented on a change in pull request #9435:
URL: https://github.com/apache/arrow/pull/9435#discussion_r572322195
##########
File path: cpp/src/arrow/compute/api_aggregate.h
##########
@@ -105,12 +105,15 @@ struct ARROW_EXPORT VarianceOptions : public FunctionOptions {
/// By default, returns the median value.
struct ARROW_EXPORT QuantileOptions : public FunctionOptions {
/// Interpolation method to use when quantile lies between two data points
+ /// TDIGEST is useful to approximate quantiles from large volume inputs.
+ /// It has constant memory footprint, but lower accuracy.
enum Interpolation {
LINEAR = 0,
LOWER,
HIGHER,
NEAREST,
MIDPOINT,
+ TDIGEST,
Review comment:
FWIW that tdigest R function is not part of base R, it is a contributed
package.
R's quantile function also supports multiple methods, 9 in fact, via the
`type` parameter:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
None of these are approximate in the same way as this, but it's a further
argument that it could be a function parameter rather than a separate function.
TBH I don't know how much it matters; as a compute API consumer I can make
either work. It's marginally easier if they're separate functions rather than
having to mess with `FunctionOptions` to select them, or maybe there are
downsides to that way?
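
To make the trade-off concrete, here is a rough sketch of the two call shapes from a consumer's point of view. Everything in it is a hypothetical stand-in, not the actual Arrow declarations: the trimmed-down `QuantileOptions`, the free functions `Quantile` and `QuantileLower`, and the helper `CallSitesAgree` are all invented for illustration. It also leaves out TDIGEST entirely, since only the call sites matter here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified options struct (NOT the Arrow declaration).
struct QuantileOptions {
  enum Interpolation { LINEAR = 0, LOWER, HIGHER, NEAREST, MIDPOINT };
  double q = 0.5;  // by default, the median
  Interpolation interpolation = LINEAR;
};

// Shape A: one function; the method is selected through the options struct.
double Quantile(std::vector<double> values, const QuantileOptions& options) {
  assert(!values.empty());
  std::sort(values.begin(), values.end());
  // Fractional position of the quantile within the sorted data.
  const double pos = options.q * static_cast<double>(values.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
  const std::size_t hi = static_cast<std::size_t>(std::ceil(pos));
  const double frac = pos - static_cast<double>(lo);
  switch (options.interpolation) {
    case QuantileOptions::LOWER:
      return values[lo];
    case QuantileOptions::HIGHER:
      return values[hi];
    case QuantileOptions::NEAREST:
      // Ties go to the higher point; an arbitrary choice for this sketch.
      return frac < 0.5 ? values[lo] : values[hi];
    case QuantileOptions::MIDPOINT:
      return (values[lo] + values[hi]) / 2.0;
    case QuantileOptions::LINEAR:
      break;
  }
  return values[lo] + frac * (values[hi] - values[lo]);
}

// Shape B: a separate function per method; no options struct at the call site.
double QuantileLower(const std::vector<double>& values, double q) {
  QuantileOptions options;
  options.q = q;
  options.interpolation = QuantileOptions::LOWER;
  return Quantile(values, options);
}

// The same request written both ways.
bool CallSitesAgree(const std::vector<double>& values) {
  QuantileOptions options;  // Shape A: build and fill in the options first
  options.q = 0.5;
  options.interpolation = QuantileOptions::LOWER;
  const double a = Quantile(values, options);
  const double b = QuantileLower(values, 0.5);  // Shape B: one call
  return a == b;
}
```

Shape B is shorter at the call site, but every new method means another exported function, whereas Shape A only grows the enum. That is about the extent of the difference I can see from the consumer side.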