pitrou commented on a change in pull request #9435:
URL: https://github.com/apache/arrow/pull/9435#discussion_r572042030



##########
File path: cpp/src/arrow/compute/api_aggregate.h
##########
@@ -105,12 +105,15 @@ struct ARROW_EXPORT VarianceOptions : public FunctionOptions {
 /// By default, returns the median value.
 struct ARROW_EXPORT QuantileOptions : public FunctionOptions {
   /// Interpolation method to use when quantile lies between two data points
+  /// TDIGEST is useful to approximate quantiles from large volume inputs.
+  /// It has a constant memory footprint, but lower accuracy.
   enum Interpolation {
     LINEAR = 0,
     LOWER,
     HIGHER,
     NEAREST,
     MIDPOINT,
+    TDIGEST,

Review comment:
   I'm curious whether this is best exposed as an interpolation kind for the "quantile" function, or a separate function altogether. Are there precedents in other libraries or database engines?
   
   It seems R uses a separate function:
   https://www.rdocumentation.org/packages/tdigest/versions/0.3.0/topics/tdigest
   
   cc @nealrichardson @michalursa  for opinion.
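For context on the question above, here is a minimal sketch of what the five existing interpolation kinds do. Each one sorts the full input, computes the fractional rank `q * (n - 1)`, and then picks or blends the two exact data points on either side of it. This requires memory proportional to the input. A t-digest works differently: it summarizes the data into a bounded set of centroids and returns an approximation, so it is an estimation algorithm rather than a rule for choosing between two neighbours. `ExactQuantile` and this `Interpolation` enum are hypothetical illustrations, not Arrow's kernel code. The tie-breaking rule used for NEAREST is a simplification made for this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical enum mirroring the five exact interpolation kinds in the diff.
// TDIGEST is deliberately left out: it approximates quantiles from a compact
// summary instead of combining two neighbouring data points.
enum class Interpolation { LINEAR, LOWER, HIGHER, NEAREST, MIDPOINT };

// Exact quantile of `values` at q in [0, 1]. Takes the input by value and
// sorts it, so memory grows linearly with the input size.
double ExactQuantile(std::vector<double> values, double q,
                     Interpolation interpolation) {
  if (values.empty() || q < 0.0 || q > 1.0) {
    throw std::invalid_argument("need non-empty input and q in [0, 1]");
  }
  std::sort(values.begin(), values.end());

  // Fractional rank of q within the sorted data, and its two neighbours.
  const double rank = q * static_cast<double>(values.size() - 1);
  const std::size_t lower_index = static_cast<std::size_t>(std::floor(rank));
  const std::size_t higher_index = static_cast<std::size_t>(std::ceil(rank));
  const double fraction = rank - static_cast<double>(lower_index);
  const double lower = values[lower_index];
  const double higher = values[higher_index];

  switch (interpolation) {
    case Interpolation::LINEAR:
      return lower + fraction * (higher - lower);
    case Interpolation::LOWER:
      return lower;
    case Interpolation::HIGHER:
      return higher;
    case Interpolation::NEAREST:
      // Simplification for this sketch: exact ties (fraction == 0.5) round up.
      return fraction < 0.5 ? lower : higher;
    case Interpolation::MIDPOINT:
      return (lower + higher) / 2.0;
  }
  return lower;  // Unreachable for valid enum values.
}
```

For example, with `{1, 2, 3, 4}` and `q = 0.5`, the fractional rank is 1.5. LINEAR and MIDPOINT both return 2.5, LOWER returns 2, and HIGHER returns 3. A TDIGEST option would not fit this pattern, because it never materializes the sorted input.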




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org