This seems like good improvement. I've only superficially reviewed the
new version, but have a couple of trivial and one non-trivial comments:
1. The ini file is growing. This might be a good candidate for it's
own ini file. It's somewhat orthogonal and could be nicely boxed out
and ignored by folks with no interest in such things. The downside is
the complexity of multiple files
2. Would it be helpful to be able to enable/disable stats completely.
These calculations must add some overhead.
3. The use of moving averages is great, but as you comment there be
quite a lot of variability within a given time interval. Moving
averages are generally useful only over time, for example in making
short term trading decisions a moving average can help guess the
direction of the next reversion to a mean. In this scenario I would
think peak usages would also be of value. One could maintain min/max
stats with respect to these moving averages along with a time interval
in order to identify hot spots.
I'll have a closer look and write some tests
On Jun 27, 2009, at 9:32 PM, Paul Joseph Davis (JIRA) wrote:
Fixing weirdness in couch_stats_aggregator.erl
----------------------------------------------
Key: COUCHDB-396
URL: https://issues.apache.org/jira/browse/COUCHDB-396
Project: CouchDB
Issue Type: Improvement
Components: Database Core, HTTP Interface
Affects Versions: 0.10
Environment: trunk
Reporter: Paul Joseph Davis
Assignee: Paul Joseph Davis
Fix For: 0.10
Attachments: couchdb_stats_aggregator.patch
Looking at adding unit tests to the couchdb_stats_aggregator module
the other day I realized it was doing some odd calculations. This is
a fairly non-trivial patch so I figured that I'd put in JIRA and get
feed back before applying. This patch does everything the old
version does afaict, but I'll be adding tests before I consider it
complete.
List of major changes:
* The old behavior for stats was to integrate incoming values for a
time period and then reset the values and start integrating again.
That seemed a bit odd so I rewrote things to keep the average and
standard deviation for the last N seconds with approximately 1
sample per second.
* Changed request timing calculations [note below]
* Sample periods are configurable in the .ini file. Sample periods
of 0 are a special case and integrate all values from couchdb boot up.
* Sample descriptions are in the configuration files now.
* You can request different time periods for the root stats end point.
* Added a sum to the list of statistics
* Simplified some of the external API
The biggest change is in how time for requests are calculated.
AFAICT, the old way was accumulating request timings in the stats
collector and just adding new values as clock ticks went by as
everything else does which makes sense in the case of resetting
counters every time period. In the new way I'm keeping a list of the
samples in the last time period and when I get a clock tick part of
the update is to remove the samples that have passed out of the time
period. For a variable like request_time this would lead to
unbounded storage.
The new method is calculating the average time of all requests in a
single clock tick (1s). One thing this loses is when you start
having lots of variability in a single clock tick. Ie, your average
request time is 100ms, but 10% of your requests are taking 500ms.
I've read of people doing the averaging trick but also storing
quantile information as well [1]. There are also algorithms for
doing single pass quantile estimation and the like so its possible
to do those things in O(N) time. The issue with quantiles is that
it'd start breaking the logic of how the collector and aggregators
are setup. As it is now, there's basically a one event -> one stat
constraint. For the time being I went without quartiles to minimize
the impact of the patch.
This code will also be on github [3] as I add patches.
[1] http://code.flickr.com/blog/2008/10/27/counting-timing/
[2] http://www.slamb.org/svn/repos/trunk/projects/loadtest/benchtools/stats.py
(See the QuantileEstimator class)
[3] http://github.com/davisp/couchdb/tree/stats-patch
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.