nickva opened a new pull request, #4672:
URL: https://github.com/apache/couchdb/pull/4672

   Folsom histograms are a major bottleneck under high concurrency, as 
described in #4650. This was noticed during performance testing, confirmed 
using Erlang VM lock counting, then verified by creating a test release with 
histogram update logic commented out [1].
   
   CouchDB doesn't use most of the Folsom statistics and metrics; we only use 
counters, gauges and one type of sliding-window sampling histogram. Instead of 
trying to re-design and update Folsom, which is a generic stats and metrics 
library, this PR takes a simpler approach: create just the three metrics we 
need, then remove the Folsom and Bear dependencies altogether.
   
   All the metrics types we re-implement are based on two relatively new 
Erlang/OTP features: counters [2] and persistent terms [3]. Counters are 
mutable arrays of integers, which allow fast concurrent updates, and persistent 
terms allow fast, global, constant time access to Erlang terms.
   
   Gauges and counters are implemented as counter arrays with one element. 
Histograms are represented as counter arrays where each array element is a 
histogram bin. Since we're dealing with sliding time window histograms, we have 
a tuple of counter arrays, where each time instant (each second) is a counter 
array. The overall histogram object then looks something like:
   
   ```
   Histogram = {
        1          = [1, 2, ..., ?BIN_COUNT]
        2          = [1, 2, ..., ?BIN_COUNT]
        ...
        TimeWindow = [1, 2, ..., ?BIN_COUNT]
     }
   ```
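   
   As a rough illustration of that layout, here is a small Python model. It is 
only a sketch: the PR itself uses Erlang `counters` arrays stored in persistent 
terms, and the names and sizes below (`BIN_COUNT`, `TIME_WINDOW`, 
`new_counter`, `new_histogram`) are hypothetical stand-ins.

```python
# Illustrative model only: the PR uses Erlang `counters` arrays, not Python lists.
BIN_COUNT = 8      # hypothetical; stands in for ?BIN_COUNT
TIME_WINDOW = 5    # hypothetical number of one-second slots


def new_counter():
    # A counter or gauge is modeled as an array with a single element.
    return [0]


def new_histogram(time_window=TIME_WINDOW, bin_count=BIN_COUNT):
    # One fixed-size array of bins per second. The outer tuple never changes
    # shape; only the integers inside the arrays are mutated.
    return tuple([0] * bin_count for _ in range(time_window))


hist = new_histogram()
hist[2][3] += 1   # record one value in bin 3 (0-based) of time slot 2
```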
   
   To keep the structure immutable we need to set a limit on both the number of 
bins and the time window size. To limit the number of bins we need to set some 
minimum and maximum value limits. Since almost all our histograms record access 
times in milliseconds, we pick a range from 10 microseconds up to over one 
hour. Histogram bin widths increase exponentially in order to keep a 
reasonable precision across the whole range of values. This encoding is similar 
to how floating point numbers work. Additional details on how this works are 
described in the `couch_stats_histogram.erl` module.
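   
   To give a feel for the floating-point analogy, the sketch below maps a value 
in milliseconds to a bin using its binary exponent plus a few linear sub-bins 
per power of two. This is not the encoding from `couch_stats_histogram.erl`; 
the constants (`SUB_BINS`, `MIN_EXP`, `MAX_EXP`) and function names are 
assumptions chosen so the range runs from just under 10 microseconds to roughly 
70 minutes.

```python
import math

SUB_BINS = 16    # hypothetical: linear sub-bins per power of two
MIN_EXP = -6     # frexp exponent of the lowest range, [2**-7, 2**-6) ms
MAX_EXP = 22     # frexp exponent of the highest range, [2**21, 2**22) ms
BIN_COUNT = (MAX_EXP - MIN_EXP + 1) * SUB_BINS

LOWEST = 2.0 ** (MIN_EXP - 1)   # about 0.0078 ms, just under 10 microseconds
HIGHEST = 2.0 ** MAX_EXP        # about 69.9 minutes, in milliseconds


def bin_index(value_ms):
    # Values outside the supported range are clamped to the first or last bin.
    if value_ms < LOWEST:
        return 0
    if value_ms >= HIGHEST:
        return BIN_COUNT - 1
    # value_ms == mantissa * 2**exponent, with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(value_ms)
    sub = int((mantissa - 0.5) * 2 * SUB_BINS)
    return (exponent - MIN_EXP) * SUB_BINS + sub


def bin_lower_edge(index):
    # Inverse mapping: the smallest value that lands in the given bin.
    exponent, sub = divmod(index, SUB_BINS)
    return math.ldexp(0.5 + sub / (2 * SUB_BINS), exponent + MIN_EXP)
```

   With these assumed constants the bin width doubles at every power of two, so 
the relative error stays within a few percent across the whole range while the 
number of bins stays fixed.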
   
   To keep the histogram object structure immutable, the time window is used in 
a circular fashion. The time parameter to the histogram update/3 function is 
the monotonic clock time, and the histogram time index is computed as `Time rem 
TimeWindow`. So, as the monotonic time advances, the histogram time index 
loops around. This comes with a minor annoyance: the time window has to be 
larger than the reported interval so that a separate process can clean stale 
(expired) histogram entries, possibly with some extra buffer to ensure the 
interval currently being updated and the interval ready to be cleaned do not 
overlap. This periodic cleanup is performed in the couch_stats_server process.
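   
   The circular indexing and cleanup can be modeled as follows. This is a 
simplified Python sketch, not the Erlang code in this PR: the buffer size, the 
tiny bin count, and the function names (`update`, `clean_stale`, `read`) are 
assumptions made for illustration.

```python
BIN_COUNT = 4        # hypothetical, kept tiny for readability
KEEP_SECONDS = 10    # seconds of data reported to readers
BUFFER_SECONDS = 5   # hypothetical slack so the cleaner never touches live slots
TIME_WINDOW = KEEP_SECONDS + BUFFER_SECONDS


def new_histogram():
    return tuple([0] * BIN_COUNT for _ in range(TIME_WINDOW))


def update(hist, bin_idx, now_sec):
    # Counterpart of `Time rem TimeWindow`: the slot index wraps around.
    hist[now_sec % TIME_WINDOW][bin_idx] += 1


def clean_stale(hist, now_sec):
    # Zero the slots that fell out of the reported interval, so they are
    # empty again before the time index loops back onto them.
    for t in range(now_sec - TIME_WINDOW + 1, now_sec - KEEP_SECONDS + 1):
        slot = hist[t % TIME_WINDOW]
        slot[:] = [0] * BIN_COUNT


def read(hist, now_sec):
    # Sum the bins of the last KEEP_SECONDS seconds.
    total = [0] * BIN_COUNT
    for t in range(now_sec - KEEP_SECONDS + 1, now_sec + 1):
        for i, n in enumerate(hist[t % TIME_WINDOW]):
            total[i] += n
    return total


hist = new_histogram()
update(hist, 1, 100)
update(hist, 1, 100)
update(hist, 2, 105)
before = read(hist, 105)   # [0, 2, 1, 0]
clean_stale(hist, 112)     # second 100 is now stale and gets zeroed
update(hist, 3, 115)       # reuses the slot that second 100 occupied
after = read(hist, 115)    # [0, 0, 0, 1]
```

   In this model, skipping the `clean_stale` call would leave the two values 
from second 100 in the slot reused at second 115, which is why the window needs 
to be larger than the reported interval.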
   
   Besides performance, the new histograms have two other improvements over the 
Folsom ones:
   
     - They record every single value. Previous histograms did sampling and 
recorded mostly just the first 1024 values during each time instant (second).
   
     - They are mergeable. Multiple histograms can be merged with corresponding 
bins summed together. This could allow cluster wide histogram summaries or 
gathering histograms from individual processes, then combining them at the end 
in a central process.
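   
   Because every histogram shares the same fixed bin layout, merging reduces to 
element-wise addition. A minimal Python sketch of that idea, with a 
hypothetical `merge` helper and made-up bin counts:

```python
def merge(*histograms):
    # All inputs must use the same bin layout for the sums to be meaningful.
    if len({len(h) for h in histograms}) != 1:
        raise ValueError("histograms must have the same number of bins")
    return [sum(bins) for bins in zip(*histograms)]


node_a = [0, 3, 1, 0]   # made-up per-bin counts from one node
node_b = [2, 0, 4, 1]   # made-up per-bin counts from another node
cluster = merge(node_a, node_b)   # [2, 3, 5, 1]
```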
   
   Another performance improvement in this commit is eliminating the need to 
periodically flush or scrape stats in the background in both the couch_stats 
and prometheus apps. Fetching stats from persistent terms and counters takes 
less than 5 milliseconds, and a sliding time window histogram always returns 
the last 10 seconds of data no matter when the stats are queried. Stats are now 
fetched only when they are actually queried.
   
   Since the Folsom library was abstracted away behind a couch_stats API, the 
rest of the applications do not need to be updated. They still call 
couch_stats:update_histogram/2, couch_stats:increment_counter/1, etc.
   
   Previously couch_stats did not have any tests at all. Folsom and Bear had 
some tests, but I don't think we ever ran those test suites. To rectify the 
situation, this PR adds tests to cover the functionality. All the newly added 
or updated modules should have near or exactly 100% test coverage.
   
   [1] https://github.com/apache/couchdb/issues/4650#issue-1764685693
   [2] https://www.erlang.org/doc/man/counters.html
   [3] https://www.erlang.org/doc/man/persistent_term.html
   
   

