NicoK opened a new pull request #8887: [FLINK-12982][metrics] improve 
DescriptiveStatisticsHistogramStatistics performance
URL: https://github.com/apache/flink/pull/8887
 
 
   ## What is the purpose of the change
   
   Instead of redirecting `DescriptiveStatisticsHistogramStatistics` calls to 
`DescriptiveStatistics`, it now takes a point-in-time snapshot using an own 
`UnivariateStatistic` implementation that
   1. calculates min, max, mean, and standard deviation in one go (as opposed 
to four iterations over the values array!)
   2. caches pivots for the percentile calculation to speed up retrieval of 
multiple percentiles/quartiles
   
   As a result, this roughly increases value retrieval performance by 120% when 
accessing typical statistics in a metrics reporter, e.g. the InfluxDB reporter: 
count, min, max, mean, stddev, p50, p75, p95, p98, p99, p999.
   
   Performance results:
   1. only adding values to the histogram
   ```
       Benchmark                                       Mode  Cnt      Score     
   Error   Units
       HistogramBenchmarks.dropwizardHistogramAdd     thrpt   30   47985.359 ±  
  25.847  ops/ms
       HistogramBenchmarks.descriptiveHistogramAdd    thrpt   30   70158.792 ±  
 276.858  ops/ms
   -> this PR:
       HistogramBenchmarks.descriptiveHistogramAdd    thrpt   30   75303.040 ±  
 475.355  ops/ms
   ```
   2. after adding each value, also retrieving a common set of metrics:
   ```
       Benchmark                                       Mode  Cnt      Score     
   Error   Units
       HistogramBenchmarks.dropwizardHistogram        thrpt   30     400.274 ±  
   4.930  ops/ms
       HistogramBenchmarks.descriptiveHistogram       thrpt   30     124.533 ±  
   1.060  ops/ms
   -> this PR:
       HistogramBenchmarks.descriptiveHistogram       thrpt   30     251.895 ±  
   1.809  ops/ms
   ```
   
   Please note that this PR is based on #8884.
   
   ## Brief change log
   
   - create a custom `CommonMetricsSnapshot implements UnivariateStatistic`
   -- calculates min, max, mean, and stddev in one go
   -- caches values inside a percentile implementation for optimised retrieval 
of multiple percentiles 
   - create a point-in-time histogram snapshot during creation of 
`DescriptiveStatisticsHistogramStatistics`
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as derivatives of 
`AbstractHistogramTest`.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): **no**
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
     - The serializers: **no**
     - The runtime per-record code paths (performance sensitive): **no**
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
     - The S3 file system connector: **no**
   
   ## Documentation
   
     - Does this pull request introduce a new feature? **no**
     - If yes, how is the feature documented? **JavaDocs**
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to