nickva opened a new pull request, #4672: URL: https://github.com/apache/couchdb/pull/4672
Folsom histograms are a major bottleneck under high concurrency, as described in #4650. This was noticed during performance testing, confirmed using Erlang VM lock counting, then verified by creating a test release with histogram update logic commented out [1].

CouchDB doesn't use most of the Folsom statistics and metrics; we only use counters, gauges and one type of sliding window, sampling histogram. Instead of trying to re-design and update Folsom, which is a generic stats and metrics library, take a simpler approach and create just the three metrics we need, and then remove Folsom and Bear dependencies altogether.

All the metrics types we re-implement are based on two relatively new Erlang/OTP features: counters [2] and persistent terms [3]. Counters are mutable arrays of integers, which allow fast concurrent updates, and persistent terms allow fast, global, constant time access to Erlang terms.

Gauges and counters are implemented as counter arrays with one element. Histograms are represented as counter arrays where each array element is a histogram bin. Since we're dealing with sliding time window histograms, we have a tuple of counter arrays, where each time instant (each second) is a counter array. The overall histogram object then looks something like:

```
Histogram = {
     1          = [1, 2, ..., ?BIN_COUNT]
     2          = [1, 2, ..., ?BIN_COUNT]
     ...
     TimeWindow = [1, 2, ..., ?BIN_COUNT]
  }
```

To keep the structure immutable we need to set a limit on both the number of bins and the time window size. To limit the number of bins we need to set some minimum and maximum value limits. Since almost all our histograms record access times in milliseconds, we pick a range from 10 microseconds up to over one hour. Histogram bin widths are increasing exponentially in order to keep a reasonable precision across the whole range of values. This encoding is similar to how floating point numbers work.
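The layout described above can be sketched roughly as follows. This is a minimal illustration and not the code from this pull request: the module name `hist_sketch`, the values chosen for `?BIN_COUNT`, `?TIME_WINDOW` and `?MIN_VAL`, and the plain "each bin is twice as wide as the previous one" scheme are all assumptions made for the example. Only the `counters` and `persistent_term` calls are standard Erlang/OTP APIs.

```erlang
-module(hist_sketch).
-export([new/1, update/3, read/2, bin_index/1]).

%% Hypothetical sizes, chosen for illustration only.
-define(BIN_COUNT, 32).
-define(TIME_WINDOW, 20).
%% Assumed lower limit: 10 microseconds expressed in milliseconds.
-define(MIN_VAL, 0.01).

%% Build a tuple with one counter array per second and publish it as a
%% persistent term, so any process can look it up in constant time.
new(Name) ->
    Arrays = [
        counters:new(?BIN_COUNT, [write_concurrency])
     || _ <- lists:seq(1, ?TIME_WINDOW)
    ],
    Hist = list_to_tuple(Arrays),
    persistent_term:put({?MODULE, Name}, Hist),
    Hist.

%% Record Val (milliseconds) in the counter array for slot Sec, where
%% Sec is a 1-based index in 1..?TIME_WINDOW.
update(Name, Sec, Val) ->
    Hist = persistent_term:get({?MODULE, Name}),
    counters:add(element(Sec, Hist), bin_index(Val), 1).

%% Return the bin counts of one time slot as a plain list.
read(Name, Sec) ->
    Ref = element(Sec, persistent_term:get({?MODULE, Name})),
    [counters:get(Ref, I) || I <- lists:seq(1, ?BIN_COUNT)].

%% Simplified exponential binning (an assumption, not the real encoding):
%% values at or below ?MIN_VAL go to bin 1, each following bin covers a
%% range twice as wide, and large values are clamped to the last bin.
bin_index(Val) when Val =< ?MIN_VAL ->
    1;
bin_index(Val) ->
    Ix = trunc(math:log2(Val / ?MIN_VAL)) + 1,
    min(Ix, ?BIN_COUNT).
```

For example, after `hist_sketch:new(db_reads)`, a call to `hist_sketch:update(db_reads, 1, 1.2)` increments a single bin in the first per-second array, and `hist_sketch:read(db_reads, 1)` returns the bin counts for that second. The tuple itself is never modified after creation; only the integers inside the counter arrays change.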
Additional details on how this works are described in the `couch_stats_histogram.erl` module.

To keep the histogram object structure immutable, the time window is used in a circular fashion. The time parameter to the histogram update/3 function is the monotonic clock time, and the histogram time index is computed as `Time rem TimeWindow`. So, as the monotonic time advances, the histogram time index loops around. This comes with the minor annoyance of having to allocate a larger time window to accommodate a process which cleans stale (expired) histogram entries, possibly with some extra buffers to ensure the currently updated interval and the interval ready to be cleaned do not overlap. This periodic cleanup is performed in the couch_stats_server process.

Besides performance, the new histograms have two other improvements over the Folsom ones:

- They record every single value. Previous histograms did sampling and recorded mostly just the first 1024 values during each time instant (second).
- They are mergeable. Multiple histograms can be merged with corresponding bins summed together. This could allow cluster wide histogram summaries or gathering histograms from individual processes, then combining them at the end in a central process.

Another performance improvement in this commit is eliminating the need to periodically flush or scrape stats in the background in both couch_stats and prometheus apps. Stats fetching from persistent terms and counters takes less than 5 milliseconds, and the sliding time window histogram will always return the last 10 seconds of data no matter when the stats are queried, so fetching is now done only when the stats are actually queried.

Since the Folsom library was abstracted away behind a couch_stats API, the rest of the applications do not need to be updated. They still call couch_stats:update_histogram/2, couch_stats:increment_counter/1, etc.

Previously couch_stats did not have any tests at all.
Folsom and Bear had some tests, but I don't think we ever ran those test suites. To rectify the situation, added tests to cover the functionality. All the newly added or updated modules should have near or exactly 100% test coverage.

[1] https://github.com/apache/couchdb/issues/4650#issue-1764685693
[2] https://www.erlang.org/doc/man/counters.html
[3] https://www.erlang.org/doc/man/persistent_term.html
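As a rough sketch of the circular time index, the stale slot cleanup and the bin-wise merge described in the text above: the module `hist_window_sketch` and its function names are hypothetical and are not part of couch_stats. The sketch also assumes that the index must be shifted to start at 1 (tuple and counter indexes are 1-based) and that negative monotonic times need to be normalized, since `rem` in Erlang keeps the sign of its left operand.

```erlang
-module(hist_window_sketch).
-export([time_index/2, reset/1, merge/2]).

%% Map a monotonic time in seconds to a 1-based slot in a circular
%% window of TimeWindow slots. The inner "+ TimeWindow" keeps the result
%% non-negative when TimeSec is negative.
time_index(TimeSec, TimeWindow) when is_integer(TimeSec), TimeWindow > 0 ->
    ((TimeSec rem TimeWindow) + TimeWindow) rem TimeWindow + 1.

%% Zero out every bin of a stale slot so it can be reused once the
%% window wraps around.
reset(Ref) ->
    #{size := Size} = counters:info(Ref),
    lists:foreach(fun(I) -> counters:put(Ref, I, 0) end, lists:seq(1, Size)).

%% Merge two histograms given as equal-length lists of bin counts by
%% summing the corresponding bins.
merge(BinsA, BinsB) ->
    lists:zipwith(fun(A, B) -> A + B end, BinsA, BinsB).
```

With this sketch, a time value and the same value plus `TimeWindow` map to the same slot, which is why a cleaner has to reset a slot before the writer reaches it again. Because `merge/2` only adds integers, histograms can be combined in any order.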