On Fri, Jan 30, 2009 at 2:10 AM, Antony Blakey <[email protected]> wrote:
>
> On 30/01/2009, at 5:32 PM, Paul Davis wrote:
>
>> On Fri, Jan 30, 2009 at 1:58 AM, Antony Blakey <[email protected]>
>> wrote:
>>>
>>> On 30/01/2009, at 4:27 PM, Paul Davis wrote:
>>>
>>>> On Fri, Jan 30, 2009 at 12:32 AM, Antony Blakey
>>>> <[email protected]>
>>>> wrote:
>>>>>
>>>>> On 30/01/2009, at 9:56 AM, Paul Davis wrote:
>>>>>
>>>>>> The way that stats are currently calculated, with time as the
>>>>>> dependent variable, could cause some issues in implementing more
>>>>>> statistics. With my extremely limited knowledge of stats I think
>>>>>> making them dependent on the number of requests might be better.
>>>>>> This is something that hopefully someone out there knows more about.
>>>>>> (This is in terms of "avg for last 5 minutes" vs "avg for last 100
>>>>>> requests", the latter of the two making stddev-type stats
>>>>>> calculatable on the fly in constant memory.)
>>>>>
>>>>> The problem with using # of requests is that depending on your data,
>>>>> each request may take a long time. I have this problem at the moment:
>>>>> 1008 documents in a 3.5G media database. During a compact, the status
>>>>> in _active_tasks updates every 1000 documents, so you can imagine how
>>>>> useful that is :/ I thought it had hung (and neither the beam.smp CPU
>>>>> time nor the IO requests were a good indicator). I spent some time
>>>>> chasing this down as a bug before realising the problem was in the
>>>>> status granularity!
>>>>>
>>>>
>>>> Actually I don't think that affects my question at all. It may change
>>>> how we report things though. As in, it may be important to be able to
>>>> report things that are not single increment/decrement conditions but
>>>> instead allow for adding arbitrary floating point numbers to the
>>>> number of recorded data points.
>>>
>>> I think I have the wrong end of the stick here - my problem was with the
>>> granularity of updates, not with the basis of calculation.
>>>
>>
>> Heh. Well, we can only measure what we know. And in the interest of
>> simplicity I think the granularity is gonna have to stick to pretty
>> much per request. Also, you're flying with 300 MiB docs? Perhaps it's
>> time to chop them up or store them over FTP?
>
> No, lots of attachments per doc. I need them to replicate. 3.5G / 1000 docs
> = roughly 3.5 MB attachments per doc. Not unreasonable.
>
What an appropriate thread to have made a math error. Also, yes, not
at all unreasonable.

> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> Plurality is not to be assumed without necessity
>  -- William of Ockham (ca. 1285-1349)
>
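
For what it's worth, a minimal sketch of the "last 100 requests" idea from
the quoted discussion above - in Python rather than CouchDB's Erlang, with
hypothetical names, purely to show why a per-request window makes stddev
cheap: keep a fixed-size buffer of per-request values plus running sums,
and each new request updates mean and stddev in constant time, with memory
bounded by the window size rather than by the request rate.

    # Illustrative sketch only (not CouchDB code): stats over the last N
    # requests instead of the last T seconds. A fixed-size buffer plus
    # running sum / sum-of-squares gives mean and stddev in O(1) per
    # update, with memory bounded by the window size.

    from collections import deque
    import math


    class RequestWindowStats:
        def __init__(self, window_size=100):
            self.window = deque()          # last N observed values
            self.window_size = window_size
            self.total = 0.0               # running sum of window values
            self.total_sq = 0.0            # running sum of squared values

        def record(self, value):
            """Record one per-request measurement (e.g. latency in ms)."""
            self.window.append(value)
            self.total += value
            self.total_sq += value * value
            if len(self.window) > self.window_size:
                old = self.window.popleft()
                self.total -= old
                self.total_sq -= old * old

        def mean(self):
            return self.total / len(self.window) if self.window else 0.0

        def stddev(self):
            n = len(self.window)
            if n < 2:
                return 0.0
            m = self.mean()
            # Population variance from the running sums; clamp tiny
            # negative values caused by floating-point cancellation.
            var = max(self.total_sq / n - m * m, 0.0)
            return math.sqrt(var)


    if __name__ == "__main__":
        stats = RequestWindowStats(window_size=100)
        for latency_ms in (12.0, 15.5, 9.8, 40.2, 11.1):
            stats.record(latency_ms)
        print(stats.mean(), stats.stddev())

A time-based window ("last 5 minutes") doesn't have this property: the
number of samples it holds depends on request volume, so memory isn't
bounded without extra bucketing or decay tricks.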
