Re: [PATCH] MEDIUM/RFC: Implement time-based server latency metrics

Krishna Kumar (Engineering) Tue, 31 Jan 2017 19:21:21 -0800

Hi Willy,

Thanks for your detailed mail. I will get back to you very soon on this.


Regards,
- Krishna


On Tue, Jan 31, 2017 at 12:54 AM, Willy Tarreau <[email protected]> wrote:

> Hi Krishna,
>
> back on earth ;-)
>
> On Tue, Jan 03, 2017 at 03:07:26PM +0530, Krishna Kumar (Engineering)
> wrote:
> > I explored your suggestion of "hard-coded periods", and have some
> > problems: code complexity seems to be very high at updates (as well
> > as retrievals possibly); and I may not be able to get accurate results.
> > E.g. I have data for 1, 4, 16 seconds; and at 18 seconds, a request is
> > made for retrieval of the last 16 seconds (or 1,4,16). At this time I
> have
> > values for last 18 seconds not 16 seconds. I explored using timers to
> > cascade (will not work as it may run into races with the setters, and
> > also adds too much overhead) vs doing this synchronously when the
> > event happens. Both are complicated and have the above issue of not
> > able to get accurate information depending on when the request is
> > made.
>
> The principle is quite simple but requires a small explanation of how our
> frequency counters manage to report such accurate values first.
>
> We're counting events that happen over a period, represented by a cross
> over the time line below :
>
>     X     X  X        X  X    X  X     X     X     X   X
>   |--------------------------------------------------------> t
>
> In order to be able to measure an approximately accurate frequency without
> storing too much data, we'll report an average over a period. The problem
> is, if we only report a past average, we'll get slowly changing data and
> in particular it would not be usable to detect quick changes such as a high
> rate happening over the last second. Thus we always keep two measures, the
> one of the previous period, and the one of the current period.
>
>     X     X  X        X  X    X  X     X     X     X   X
>   |-----------------------------|--------------------------> t
>       <--- previous --->        |    <--- current --->
>            count=6                        count=5
>
> And using this we implement a pseudo sliding window. With a true sliding
> window, you would count events between two points that are always exactly
> the same distance apart, which requires to memorize all of them. With
> this pseudo sliding window, instead, we make use of the fact that events
> follow an approximately homogenous distribution over each period and have
> approximately the same frequency over both periods. Thus the mechanism
> becomes more of a "fading" window in fact : we consider one part of the
> previous window that we sum with the current one. The closer the current
> date is from the end of the current window, the least we count on the
> previous window. See below, I've represented 5 different sliding windows
> with the ratio of the previous window that we consider as still valid
> marked with stars and the ratio of the old period at the end of the line :
>
>     X     X  X        X  X    X  X     X     X     X   X
>   |-----------------------------|----------------------------> t
>         |***********************------|  80%
>               |*****************------------|  60%
>                     |***********------------------|  40%
>                           |*****------------------------|  20%
>                                 |-----------------------------|  0%
>
> Thus the measure exactly is current + (1-relative_position) * previous.
> Once the period is over, the "current" value overwrites the "previous"
> one, and is cleared again, waiting for new events. This is done during
> insertion and possibly during value retrieval if it's noticed that the
> period has switched since last time.
>
>
> In fact this mechanism can be repeated over multiple periods if needed,
> the principle must always be the same :
>
>     X     X  X        X  X    X  X     X     X     X   X
>   |------------|------------|------------|------------|----------> t
>       old          N-3         N-2           N-1        current
>
>
> The "old" period is the one on which we apply the fading effect. The
> current period is the one being entirely accounted (since it's being
> filled as we're looking at it) so here you could always keep a few
> past values (rotating copies of "current") and keep the last one for
> the fade out. In the simplified case above we just don't have the
> N-1..N-M, we only have "current" and "old".
>
> Thus as you can see here, the "current" period is the only dynamic one,
> which is the exact sum of events over the last period. But now what if
> we wanted to implement this with multiple levels ? It's very easy, the
> "current" period being the sum of events, it simply is the sum of the
> lower layers when you cascade multiple counters, thus it's the lower
> layer's sum of [current] + [old*ratio] + [n-1] + [n-2] +...+ [n-m].
>
> Thus let's have an example with periods of 1s, 2s, 4s and 8s :
>
>               previous                         current
>  [8s] |----------------------------|------------------------------> t
>  [4s]                              |---------------|-------------->
>  [2s]                                              |-------|------>
>  [1s]                                                      |---|-->
>
> I guess you're seeing a pattern : each layer's "current" value is useless
> as it's derived from the lower layers values. So you don't even have
> to implement the current part (except for the smallest precision layer),
> you just need each layer's previous value. If we call "Pn" the previous
> value for layer (n)s, "Rn" the ratio applied to this due to the current
> time position within the current period of (n) seconds, "Cn" the current
> value for layer "n", then you quickly see that :
>
>
>        N - 1
>         ___
>         \
>    Cn =  > Pi + C0
>         <__
>        i = 1
>
>    Avg_rate(n) = Pn * Rn + Cn
>
>
> What this implies is that by storing only 8 counters, you can have an
> history of 1, 2, 4, 8, 16, 32 and 64s which still moves as fast as the
> quotient represented by the movements of the measurement period.
>
> And better, if you want different periods, you can apply the mechanism
> above to store more periods. For example we could imagine that you store :
>
>    60s :  old, N-3, N-2, N-1
>    15s :  old, N-2, N-1
>     5s :  old, N-4, N-3, N-2, N-1
>     1s :  old, cur
>
> So the 60s avg can be computed as [N60-3]+[N60-2]+[N60-1]+old*ratio +
> lower layers.
>
> Updates are not complicated at all, however they can propagate along a
> long chain, but normally this chain should not be too long, and the longest
> the chain, the rarest they apply (the frequency of a chain of N nodes is
> 1/2^N), so for example in the example with 64s above you would only update
> 7 levels once per minute, which is ridiculously cheap.
>
> The variable sizes above (eg: 1s, 5s, 15s, 60s) are appealing for the end
> user but more complicated to implement because you need to know the number
> of buckets at each layer, so that makes the computation functions less
> generic (or rely on more information that you have to store). Also, while
> dealing with only two buckets per level is trivial (a simple copy), for
> the other numbers, you need to implement some kind of a rotation, with the
> current pointer determined by a modulus on the current time, which also is
> less appealing. But if you go with powers of 4 like (1, 4, 16, 64), it's
> simplifying quite a bit since you only have to store 4 buckets per layer
> and the modulus is a simple binary AND on the time.
>
> > To implement your suggestion of say histograms, the retrieval code can
> > calculate the 4 values (1, 4, 16, and 64 seconds) by averaging across
> > the correct intervals. In this case, the new CLI command is not required,
> > and by default it prints all 4 values. Would this work in your opinion?
>
> Yes that's exactly the point. If you have only a few values to display,
> you can always display them.
>
> I hope I helped you see your way through the solution better and did not
> confuse you too much :-)
>
> Best regards,
> Willy
>
>

Re: [PATCH] MEDIUM/RFC: Implement time-based server latency metrics

Reply via email to