On Sun, Jun 28, 2009 at 12:50 PM, Paul Davis <[email protected]> wrote:
> On Sun, Jun 28, 2009 at 10:28 AM, Robert Dionne <[email protected]> wrote:
>> This seems like a good improvement. I've only superficially reviewed the
>> new version, but have a couple of trivial comments and one non-trivial
>> one:
>>
>> 1. The ini file is growing. This might be a good candidate for its own
>> ini file. It's somewhat orthogonal and could be nicely boxed out and
>> ignored by folks with no interest in such things. The downside is the
>> complexity of multiple files.
>>
>
> Definitely agree. Another thing I noticed is that the startup message
> with debug logging is also a bit long. Noah already has things set up
> to allow for multiple files using the default.d/ directories, but I
> wasn't quite certain how to set up the build system.
>
>> 2. Would it be helpful to be able to enable/disable stats completely?
>> These calculations must add some overhead.
>>
>
> That definitely seems reasonable, though I'm not entirely certain how
> best to implement it.

Before I forget, I did have the idea of making the resolution configurable
so that stats updates happen at some multiple of a second. The issue with
that, though, is that I can't figure out how to make request timings work
with ets. I don't see a way in the ets docs to make an update atomic when
the update requires knowledge of the current value.
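
To make the ets problem concrete, here's roughly the race I mean (a
minimal sketch against a hypothetical `stats` table, not code from the
patch). Pure counters are fine, but anything that needs the current
value takes two calls:

    %% A plain increment is a single atomic call:
    bump(Key) ->
        ets:update_counter(stats, Key, 1).

    %% A read-modify-write is not. A concurrent writer can slip in
    %% between the lookup and the insert, and its update is lost:
    record_time(Key, Time) ->
        [{Key, {Count, Sum}}] = ets:lookup(stats, Key),
        ets:insert(stats, {Key, {Count + 1, Sum + Time}}).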
>> 3. The use of moving averages is great, but as you comment, there can
>> be quite a lot of variability within a given time interval. Moving
>> averages are generally useful only over time; for example, in making
>> short-term trading decisions a moving average can help guess the
>> direction of the next reversion to a mean. In this scenario I would
>> think peak usages would also be of value. One could maintain min/max
>> stats with respect to these moving averages, along with a time
>> interval, in order to identify hot spots.
>>
>
> Sounds reasonable. I'm not sure if min/max is more or less proper than
> quartiles. Or maybe just different? My stats-fu is less than stellar.
>
>> I'll have a closer look and write some tests.
>>
>> On Jun 27, 2009, at 9:32 PM, Paul Joseph Davis (JIRA) wrote:
>>
>>> Fixing weirdness in couch_stats_aggregator.erl
>>> ----------------------------------------------
>>>
>>>                  Key: COUCHDB-396
>>>                  URL: https://issues.apache.org/jira/browse/COUCHDB-396
>>>              Project: CouchDB
>>>           Issue Type: Improvement
>>>           Components: Database Core, HTTP Interface
>>>     Affects Versions: 0.10
>>>          Environment: trunk
>>>             Reporter: Paul Joseph Davis
>>>             Assignee: Paul Joseph Davis
>>>              Fix For: 0.10
>>>          Attachments: couchdb_stats_aggregator.patch
>>>
>>> Looking at adding unit tests to the couch_stats_aggregator module the
>>> other day, I realized it was doing some odd calculations. This is a
>>> fairly non-trivial patch, so I figured I'd put it in JIRA and get
>>> feedback before applying. This patch does everything the old version
>>> does AFAICT, but I'll be adding tests before I consider it complete.
>>>
>>> List of major changes:
>>>
>>> * The old behavior for stats was to integrate incoming values for a
>>>   time period and then reset the values and start integrating again.
>>>   That seemed a bit odd, so I rewrote things to keep the average and
>>>   standard deviation for the last N seconds with approximately 1
>>>   sample per second.
>>> * Changed request timing calculations [note below].
>>> * Sample periods are configurable in the .ini file. Sample periods of
>>>   0 are a special case and integrate all values from CouchDB boot.
>>> * Sample descriptions are in the configuration files now.
>>> * You can request different time periods for the root stats endpoint.
>>> * Added a sum to the list of statistics.
>>> * Simplified some of the external API.
>>>
>>> The biggest change is in how request times are calculated. AFAICT, the
>>> old way accumulated request timings in the stats collector and just
>>> added new values as clock ticks went by, like everything else does,
>>> which makes sense when counters are reset every time period. In the
>>> new way I keep a list of the samples from the last time period, and
>>> when I get a clock tick part of the update is to remove the samples
>>> that have aged out of the time period. For a variable like
>>> request_time, keeping every individual sample would lead to unbounded
>>> storage.
>>>
>>> So the new method calculates the average time of all requests in a
>>> single clock tick (1s). One thing this loses is when you start having
>>> lots of variability in a single clock tick. I.e., your average request
>>> time is 100ms, but 10% of your requests are taking 500ms.
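
To make the per-tick averaging concrete, the idea is roughly the
following (my own sketch of the approach, not the code from the patch):

    %% Collapse one clock tick's worth of request times into a single
    %% {Count, Mean, StdDev} sample before it enters the moving window,
    %% so storage stays bounded no matter how many requests arrive.
    tick_sample([]) ->
        {0, 0.0, 0.0};
    tick_sample(Times) ->
        N = length(Times),
        Mean = lists:sum(Times) / N,
        Var = lists:sum([(T - Mean) * (T - Mean) || T <- Times]) / N,
        {N, Mean, math:sqrt(Var)}.

And this is exactly where the 100ms vs. 500ms information gets lost:
within a tick everything is folded into one mean.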
>>> I've read of people doing the averaging trick but also storing
>>> quantile information as well [1]. There are also algorithms for doing
>>> single-pass quantile estimation and the like [2], so it's possible to
>>> do those things in O(N) time. The issue with quantiles is that it'd
>>> start breaking the logic of how the collector and aggregators are set
>>> up. As it is now, there's basically a one event -> one stat
>>> constraint. For the time being I went without quantiles to minimize
>>> the impact of the patch.
>>>
>>> This code will also be on github [3] as I add patches.
>>>
>>> [1] http://code.flickr.com/blog/2008/10/27/counting-timing/
>>> [2] http://www.slamb.org/svn/repos/trunk/projects/loadtest/benchtools/stats.py
>>>     (see the QuantileEstimator class)
>>> [3] http://github.com/davisp/couchdb/tree/stats-patch
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>
>>
>
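
On the quantile point, the single-pass estimators really are compact. A
rough stochastic-approximation sketch from memory (not necessarily what
the QuantileEstimator class in [2] implements):

    %% Single-pass quantile estimate in constant memory: nudge the
    %% estimate Est toward the target quantile Q (e.g. 0.9) on every
    %% sample. At equilibrium the estimate sits where a fraction Q of
    %% the samples fall below it.
    update(Est, Sample, Q, Step) when Sample > Est -> Est + Step * Q;
    update(Est, Sample, Q, Step) when Sample < Est -> Est - Step * (1 - Q);
    update(Est, _Sample, _Q, _Step) -> Est.

    %% e.g. the 90th percentile of a list of request times:
    %% lists:foldl(fun(T, E) -> update(E, T, 0.9, 1.0) end, 0.0, Times).

It still wouldn't fit the one event -> one stat constraint any better,
but the O(1)-per-sample part is cheap.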
