On Fri, Jan 30, 2009 at 2:10 AM, Antony Blakey <[email protected]> wrote:
>
> On 30/01/2009, at 5:32 PM, Paul Davis wrote:
>
>> On Fri, Jan 30, 2009 at 1:58 AM, Antony Blakey <[email protected]>
>> wrote:
>>>
>>> On 30/01/2009, at 4:27 PM, Paul Davis wrote:
>>>
>>>> On Fri, Jan 30, 2009 at 12:32 AM, Antony Blakey
>>>> <[email protected]>
>>>> wrote:
>>>>>
>>>>> On 30/01/2009, at 9:56 AM, Paul Davis wrote:
>>>>>
>>>>>> The way that stats are currently calculated, with time as the
>>>>>> dependent variable, could cause some issues in implementing more
>>>>>> statistics. With my extremely limited knowledge of stats I think
>>>>>> making them dependent on the number of requests might be better.
>>>>>> This is something that hopefully someone out there knows more about.
>>>>>> (This is in terms of "avg for last 5 minutes" vs "avg for last 100
>>>>>> requests", the latter of the two making stddev-type stats
>>>>>> calculatable on the fly in constant memory.)
>>>>>
>>>>> The problem with using # of requests is that depending on your data,
>>>>> each request may take a long time. I have this problem at the moment:
>>>>> 1008 documents in a 3.5G media database. During a compact, the status
>>>>> in _active_tasks updates every 1000 documents, so you can imagine how
>>>>> useful that is :/ I thought it had hung (and neither the beam.smp CPU
>>>>> time nor the IO requests were a good indicator). I spent some time
>>>>> chasing this down as a bug before realising the problem was in the
>>>>> status granularity!
>>>>>
>>>>
>>>> Actually I don't think that affects my question at all. It may change
>>>> how we report things though. As in, it may be important to be able to
>>>> report things that are not single increment/decrement conditions but
>>>> instead allow for adding arbitrary floating point numbers to the
>>>> number of recorded data points.
>>>
>>> I think I have the wrong end of the stick here - my problem was with the
>>> granularity of updates, not with the basis of calculation.
>>>
>>
>> Heh. Well, we can only measure what we know. And in the interest of
>> simplicity I think the granularity is gonna have to stick to pretty
>> much per request. Also, you're flying with 300 MiB docs? Perhaps it's
>> time to chop them up or store them over FTP?
>
> No, lots of attachments per doc. I need them to replicate. 3.5G / 1000 docs
> = roughly 3.5 MB attachments per doc. Not unreasonable.
>
What an appropriate thread to have made a math error. Also, yes, not
at all unreasonable.

> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> Plurality is not to be assumed without necessity
>  -- William of Ockham (ca. 1285-1349)
>
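
For what it's worth, a minimal sketch of the "last 100 requests" idea from
the quoted discussion above - in Python rather than CouchDB's Erlang, with
hypothetical names, purely to show why a per-request window makes stddev
cheap: keep a fixed-size buffer of per-request values plus running sums,
and each new request updates mean and stddev in constant time, with memory
bounded by the window size rather than by the request rate.

    # Illustrative sketch only (not CouchDB code): stats over the last N
    # requests instead of the last T seconds. A fixed-size buffer plus
    # running sum / sum-of-squares gives mean and stddev in O(1) per
    # update, with memory bounded by the window size.

    from collections import deque
    import math


    class RequestWindowStats:
        def __init__(self, window_size=100):
            self.window = deque()          # last N observed values
            self.window_size = window_size
            self.total = 0.0               # running sum of window values
            self.total_sq = 0.0            # running sum of squared values

        def record(self, value):
            """Record one per-request measurement (e.g. latency in ms)."""
            self.window.append(value)
            self.total += value
            self.total_sq += value * value
            if len(self.window) > self.window_size:
                old = self.window.popleft()
                self.total -= old
                self.total_sq -= old * old

        def mean(self):
            return self.total / len(self.window) if self.window else 0.0

        def stddev(self):
            n = len(self.window)
            if n < 2:
                return 0.0
            m = self.mean()
            # Population variance from the running sums; clamp tiny
            # negative values caused by floating-point cancellation.
            var = max(self.total_sq / n - m * m, 0.0)
            return math.sqrt(var)


    if __name__ == "__main__":
        stats = RequestWindowStats(window_size=100)
        for latency_ms in (12.0, 15.5, 9.8, 40.2, 11.1):
            stats.record(latency_ms)
        print(stats.mean(), stats.stddev())

A time-based window ("last 5 minutes") doesn't have this property: the
number of samples it holds depends on request volume, so memory isn't
bounded without extra bucketing or decay tricks.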
