himanshug opened a new issue #8071: add aggregators for computing mean/average
URL: https://github.com/apache/incubator-druid/issues/8071
 
 
   ### Motivation
   
   I have a use case that queries the mean value of certain columns, and I
   need an easy and efficient way to do so.
   
   ### Proposed changes
   
   We would introduce a `DoubleMeanAggregatorFactory` implementation and
   related classes, e.g. `DoubleMeanAggregator`. It would use the following
   well-known incremental algorithm.
   
   ```
   // maintain following variables
   long count;
   double mean;
   
   // update with a value v
   count++;
   mean = mean + (v - mean)/count;
   
   // merging
   count = count1 + count2;
   mean = (mean1*count1 + mean2*count2)/count;
   ```
   
   Consequently, a new aggregator type called `doubleMean` would be made
   available.
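
The update and merge rules above can be sketched in Java as follows. This is a minimal illustration, not the proposed `DoubleMeanAggregator` itself: the class name `RunningMean` and its methods are hypothetical, and the guard against merging two empty states is an added assumption that the pseudocode does not spell out.

```java
// Hypothetical helper for illustration only; not Druid's actual aggregator classes.
final class RunningMean {
    private long count;   // number of values folded in so far
    private double mean;  // mean of those values; stays 0.0 while count == 0

    // Update with a value v: mean = mean + (v - mean) / count.
    void add(double v) {
        count++;
        mean += (v - mean) / count;
    }

    // Merge another partial result into this one, weighting each mean by its count.
    void merge(RunningMean other) {
        long total = count + other.count;
        if (total == 0) {
            return; // assumption: merging two empty states is a no-op, avoiding 0/0
        }
        mean = (mean * count + other.mean * other.count) / total;
        count = total;
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }
}
```

Only a `long` and a `double` need to be carried per partial result, which is what makes merging results from different segments cheap.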
   
   ### Rationale
   Compared to the alternatives, the proposed implementation is the most
   straightforward and lowest-overhead way to get the mean of a column.
   
   Alternative #1:
   Use the `doubleSum` and `doubleCount` aggregators, then use an
   `arithmetic` or `expression` post aggregator to do the division that
   computes the mean. This becomes tedious for the system generating the
   Druid query, and mean is a common enough aggregation that it should be
   available out of the box.
   
   Alternative #2:
   Add a `MeanPostAggregator` that extracts the mean from the
   `VarianceAggregatorCollector` maintained by `VarianceAggregator`, or add
   an option to `VarianceAggregatorFactory` to output the mean in the
   `finalizeComputation(obj)` method. However, this would unnecessarily
   maintain the variables needed for variance.
   
   ### Operational impact
   None
   
   ### Test plan (optional)
   Changes proposed here would be easily unit testable.
   
   ### Future work (optional)
   
   Maybe a `floatMean` aggregator, if some use case is very concerned about
   saving a few bytes at query time. However, I don't think mean is a stat
   that should be indexed and stored in segments, so I don't consider
   `floatMean` important.
   
