himanshug commented on issue #8071: add aggregators for computing mean/average
URL: 
https://github.com/apache/incubator-druid/issues/8071#issuecomment-514679521
 
 
   I ran the following benchmark on an idle `t3.medium` EC2 instance.
   
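   ```
   import java.util.concurrent.TimeUnit;
   import org.openjdk.jmh.annotations.Benchmark;
   import org.openjdk.jmh.annotations.BenchmarkMode;
   import org.openjdk.jmh.annotations.Fork;
   import org.openjdk.jmh.annotations.Measurement;
   import org.openjdk.jmh.annotations.Mode;
   import org.openjdk.jmh.annotations.OutputTimeUnit;
   import org.openjdk.jmh.annotations.Param;
   import org.openjdk.jmh.annotations.Scope;
   import org.openjdk.jmh.annotations.State;
   import org.openjdk.jmh.annotations.Warmup;
   import org.openjdk.jmh.infra.Blackhole;
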
   @State(Scope.Benchmark)
   @Fork(1)
   @BenchmarkMode(Mode.AverageTime)
   @OutputTimeUnit(TimeUnit.NANOSECONDS)
   @Warmup(iterations = 5)
   @Measurement(iterations = 5)
   public class MeanAggregationBenchmark
   {
     @Param({"10", "100", "1000", "10000", "100000", "1000000"})
     private int n;
   
     @Benchmark
     public void sumcountalgo(Blackhole blackhole)
     {
       double sum = 0;
       long count = 0;
   
       for (int i = 1; i <= n; i++) {
         count++;
         sum += i;
       }
   
       double mean = sum/count;
       blackhole.consume(mean);
     }
   
     @Benchmark
     public void divbasedalgo(Blackhole blackhole)
     {
       double mean = 0;
       long count = 0;
   
       for (int i = 1; i <= n; i++) {
         count++;
         mean = mean + (i - mean)/count;
       }
   
       blackhole.consume(mean);
     }
   }
   ```
   
   That produced:
   
   ```
   Benchmark                                  (n)  Mode  Cnt        Score        Error  Units
   MeanAggregationBenchmark.divbasedalgo       10  avgt    5       30.601 ±      0.123  ns/op
   MeanAggregationBenchmark.divbasedalgo      100  avgt    5      613.996 ±      4.428  ns/op
   MeanAggregationBenchmark.divbasedalgo     1000  avgt    5     7130.145 ±     90.105  ns/op
   MeanAggregationBenchmark.divbasedalgo    10000  avgt    5    71296.475 ±    627.211  ns/op
   MeanAggregationBenchmark.divbasedalgo   100000  avgt    5   712208.125 ±   5066.485  ns/op
   MeanAggregationBenchmark.divbasedalgo  1000000  avgt    5  7117593.058 ±  78224.014  ns/op
   MeanAggregationBenchmark.sumcountalgo       10  avgt    5        8.600 ±      0.106  ns/op
   MeanAggregationBenchmark.sumcountalgo      100  avgt    5       99.377 ±      1.028  ns/op
   MeanAggregationBenchmark.sumcountalgo     1000  avgt    5     1282.228 ±     29.673  ns/op
   MeanAggregationBenchmark.sumcountalgo    10000  avgt    5    12790.421 ±    127.669  ns/op
   MeanAggregationBenchmark.sumcountalgo   100000  avgt    5   128994.367 ±   2068.910  ns/op
   MeanAggregationBenchmark.sumcountalgo  1000000  avgt    5  1281204.111 ±  12692.807  ns/op
   ```
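   
   The relative cost at each size can be read off the table above by dividing the `divbasedalgo` score by the `sumcountalgo` score. A small sketch that does this; the class name `SlowdownRatios` and its helper are made up for illustration, and the numbers are copied from the output above:
   
   ```java
   public class SlowdownRatios
   {
     // Input sizes and scores (ns/op) copied from the benchmark output above.
     static final int[] N = {10, 100, 1000, 10000, 100000, 1000000};
     static final double[] DIV = {30.601, 613.996, 7130.145, 71296.475, 712208.125, 7117593.058};
     static final double[] SUM = {8.600, 99.377, 1282.228, 12790.421, 128994.367, 1281204.111};
   
     // How many times slower the div based loop is at the given row.
     static double ratio(int idx)
     {
       return DIV[idx] / SUM[idx];
     }
   
     public static void main(String[] args)
     {
       for (int i = 0; i < N.length; i++) {
         System.out.printf("n=%d  div/sum=%.2f%n", N[i], ratio(i));
       }
     }
   }
   ```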
   
   From that it is clear that the "div based" algo is about 5.5 to 6 times slower than the "sum based" one for n >= 100 (about 3.6 times at n = 10). With the div based algo, 1 million aggregations take ~7 ms compared to ~1.3 ms for the sum based one; that might make a difference for some users, but not for most.
   I think that, for the mean aggregator introduced here, we can keep both algos in the code, with `div based` as the default and the ability to switch to `sum based` if need be.
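   
   To make the "both algos, switchable" idea concrete, here is a minimal sketch. It is not the aggregator code from this issue: the `MeanAccumulator` class, the `Algo` enum, and returning NaN for an empty input are all assumptions made for illustration only.
   
   ```java
   public class MeanAccumulator
   {
     // Hypothetical switch between the two strategies benchmarked above.
     public enum Algo { DIV_BASED, SUM_BASED }
   
     private final Algo algo;
     private long count = 0;
     private double sum = 0;   // used only by SUM_BASED
     private double mean = 0;  // used only by DIV_BASED
   
     public MeanAccumulator(Algo algo)
     {
       this.algo = algo;
     }
   
     public void add(double x)
     {
       count++;
       if (algo == Algo.SUM_BASED) {
         sum += x;                      // one add per value, divide once at the end
       } else {
         mean += (x - mean) / count;    // one divide per value, running mean
       }
     }
   
     public double get()
     {
       if (count == 0) {
         return Double.NaN;             // assumption: no values means no mean
       }
       return algo == Algo.SUM_BASED ? sum / count : mean;
     }
   
     public static void main(String[] args)
     {
       for (Algo a : Algo.values()) {
         MeanAccumulator acc = new MeanAccumulator(a);
         for (int i = 1; i <= 1000; i++) {
           acc.add(i);
         }
         System.out.println(a + " -> " + acc.get());
       }
     }
   }
   ```
   
   Picking the strategy once in the constructor keeps the per-value branch predictable; a real implementation might instead use two separate aggregator classes.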
