On Sun, Jun 28, 2009 at 10:28 AM, Robert Dionne<[email protected]> wrote: > This seems like good improvement. I've only superficially reviewed the new > version, but have a couple of trivial and one non-trivial comments: > > 1. The ini file is growing. This might be a good candidate for it's own ini > file. It's somewhat orthogonal and could be nicely boxed out and ignored by > folks with no interest in such things. The downside is the complexity of > multiple files >
Definitely agree. Another thing I noticed is that the startup message with debug logging is also a bit longish. Noah's already got it setup to allow for multiple files using the default.d/ directories, but I wasn't quite certain on how to setup the build system > 2. Would it be helpful to be able to enable/disable stats completely. These > calculations must add some overhead. > That definitely seems reasonable though I'm not entirely certain how best to implement this. > 3. The use of moving averages is great, but as you comment there be quite a > lot of variability within a given time interval. Moving averages are > generally useful only over time, for example in making short term trading > decisions a moving average can help guess the direction of the next > reversion to a mean. In this scenario I would think peak usages would also > be of value. One could maintain min/max stats with respect to these moving > averages along with a time interval in order to identify hot spots. > Sounds reasonable. I'm not sure if min/max is more or less proper than quartiles. Or maybe just different? My stats-fu is less than stellar. > I'll have a closer look and write some tests > > > > > On Jun 27, 2009, at 9:32 PM, Paul Joseph Davis (JIRA) wrote: > >> Fixing weirdness in couch_stats_aggregator.erl >> ---------------------------------------------- >> >> Key: COUCHDB-396 >> URL: https://issues.apache.org/jira/browse/COUCHDB-396 >> Project: CouchDB >> Issue Type: Improvement >> Components: Database Core, HTTP Interface >> Affects Versions: 0.10 >> Environment: trunk >> Reporter: Paul Joseph Davis >> Assignee: Paul Joseph Davis >> Fix For: 0.10 >> Attachments: couchdb_stats_aggregator.patch >> >> Looking at adding unit tests to the couchdb_stats_aggregator module the >> other day I realized it was doing some odd calculations. This is a fairly >> non-trivial patch so I figured that I'd put in JIRA and get feed back before >> applying. This patch does everything the old version does afaict, but I'll >> be adding tests before I consider it complete. >> >> List of major changes: >> >> * The old behavior for stats was to integrate incoming values for a time >> period and then reset the values and start integrating again. That seemed a >> bit odd so I rewrote things to keep the average and standard deviation for >> the last N seconds with approximately 1 sample per second. >> * Changed request timing calculations [note below] >> * Sample periods are configurable in the .ini file. Sample periods of 0 >> are a special case and integrate all values from couchdb boot up. >> * Sample descriptions are in the configuration files now. >> * You can request different time periods for the root stats end point. >> * Added a sum to the list of statistics >> * Simplified some of the external API >> >> The biggest change is in how time for requests are calculated. AFAICT, the >> old way was accumulating request timings in the stats collector and just >> adding new values as clock ticks went by as everything else does which makes >> sense in the case of resetting counters every time period. In the new way >> I'm keeping a list of the samples in the last time period and when I get a >> clock tick part of the update is to remove the samples that have passed out >> of the time period. For a variable like request_time this would lead to >> unbounded storage. >> >> The new method is calculating the average time of all requests in a single >> clock tick (1s). One thing this loses is when you start having lots of >> variability in a single clock tick. Ie, your average request time is 100ms, >> but 10% of your requests are taking 500ms. I've read of people doing the >> averaging trick but also storing quantile information as well [1]. There are >> also algorithms for doing single pass quantile estimation and the like so >> its possible to do those things in O(N) time. The issue with quantiles is >> that it'd start breaking the logic of how the collector and aggregators are >> setup. As it is now, there's basically a one event -> one stat constraint. >> For the time being I went without quartiles to minimize the impact of the >> patch. >> >> This code will also be on github [3] as I add patches. >> >> >> [1] http://code.flickr.com/blog/2008/10/27/counting-timing/ >> [2] >> http://www.slamb.org/svn/repos/trunk/projects/loadtest/benchtools/stats.py (See >> the QuantileEstimator class) >> [3] http://github.com/davisp/couchdb/tree/stats-patch >> >> >> >> -- >> This message is automatically generated by JIRA. >> - >> You can reply to this email to add a comment to the issue online. >> > >
