Hi Willy, Thanks for your detailed mail. I will get back to you very soon on this.
Regards, - Krishna On Tue, Jan 31, 2017 at 12:54 AM, Willy Tarreau <w...@1wt.eu> wrote: > Hi Krishna, > > back on earth ;-) > > On Tue, Jan 03, 2017 at 03:07:26PM +0530, Krishna Kumar (Engineering) > wrote: > > I explored your suggestion of "hard-coded periods", and have some > > problems: code complexity seems to be very high at updates (as well > > as retrievals possibly); and I may not be able to get accurate results. > > E.g. I have data for 1, 4, 16 seconds; and at 18 seconds, a request is > > made for retrieval of the last 16 seconds (or 1,4,16). At this time I > have > > values for last 18 seconds not 16 seconds. I explored using timers to > > cascade (will not work as it may run into races with the setters, and > > also adds too much overhead) vs doing this synchronously when the > > event happens. Both are complicated and have the above issue of not > > able to get accurate information depending on when the request is > > made. > > The principle is quite simple but requires a small explanation of how our > frequency counters manage to report such accurate values first. > > We're counting events that happen over a period, represented by a cross > over the time line below : > > X X X X X X X X X X X > |--------------------------------------------------------> t > > In order to be able to measure an approximately accurate frequency without > storing too much data, we'll report an average over a period. The problem > is, if we only report a past average, we'll get slowly changing data and > in particular it would not be usable to detect quick changes such as a high > rate happening over the last second. Thus we always keep two measures, the > one of the previous period, and the one of the current period. > > X X X X X X X X X X X > |-----------------------------|--------------------------> t > <--- previous ---> | <--- current ---> > count=6 count=5 > > And using this we implement a pseudo sliding window. With a true sliding > window, you would count events between two points that are always exactly > the same distance apart, which requires to memorize all of them. With > this pseudo sliding window, instead, we make use of the fact that events > follow an approximately homogenous distribution over each period and have > approximately the same frequency over both periods. Thus the mechanism > becomes more of a "fading" window in fact : we consider one part of the > previous window that we sum with the current one. The closer the current > date is from the end of the current window, the least we count on the > previous window. See below, I've represented 5 different sliding windows > with the ratio of the previous window that we consider as still valid > marked with stars and the ratio of the old period at the end of the line : > > X X X X X X X X X X X > |-----------------------------|----------------------------> t > |***********************------| 80% > |*****************------------| 60% > |***********------------------| 40% > |*****------------------------| 20% > |-----------------------------| 0% > > Thus the measure exactly is current + (1-relative_position) * previous. > Once the period is over, the "current" value overwrites the "previous" > one, and is cleared again, waiting for new events. This is done during > insertion and possibly during value retrieval if it's noticed that the > period has switched since last time. > > > In fact this mechanism can be repeated over multiple periods if needed, > the principle must always be the same : > > X X X X X X X X X X X > |------------|------------|------------|------------|----------> t > old N-3 N-2 N-1 current > > > The "old" period is the one on which we apply the fading effect. The > current period is the one being entirely accounted (since it's being > filled as we're looking at it) so here you could always keep a few > past values (rotating copies of "current") and keep the last one for > the fade out. In the simplified case above we just don't have the > N-1..N-M, we only have "current" and "old". > > Thus as you can see here, the "current" period is the only dynamic one, > which is the exact sum of events over the last period. But now what if > we wanted to implement this with multiple levels ? It's very easy, the > "current" period being the sum of events, it simply is the sum of the > lower layers when you cascade multiple counters, thus it's the lower > layer's sum of [current] + [old*ratio] + [n-1] + [n-2] +...+ [n-m]. > > Thus let's have an example with periods of 1s, 2s, 4s and 8s : > > previous current > [8s] |----------------------------|------------------------------> t > [4s] |---------------|--------------> > [2s] |-------|------> > [1s] |---|--> > > I guess you're seeing a pattern : each layer's "current" value is useless > as it's derived from the lower layers values. So you don't even have > to implement the current part (except for the smallest precision layer), > you just need each layer's previous value. If we call "Pn" the previous > value for layer (n)s, "Rn" the ratio applied to this due to the current > time position within the current period of (n) seconds, "Cn" the current > value for layer "n", then you quickly see that : > > > N - 1 > ___ > \ > Cn = > Pi + C0 > <__ > i = 1 > > Avg_rate(n) = Pn * Rn + Cn > > > What this implies is that by storing only 8 counters, you can have an > history of 1, 2, 4, 8, 16, 32 and 64s which still moves as fast as the > quotient represented by the movements of the measurement period. > > And better, if you want different periods, you can apply the mechanism > above to store more periods. For example we could imagine that you store : > > 60s : old, N-3, N-2, N-1 > 15s : old, N-2, N-1 > 5s : old, N-4, N-3, N-2, N-1 > 1s : old, cur > > So the 60s avg can be computed as [N60-3]+[N60-2]+[N60-1]+old*ratio + > lower layers. > > Updates are not complicated at all, however they can propagate along a > long chain, but normally this chain should not be too long, and the longest > the chain, the rarest they apply (the frequency of a chain of N nodes is > 1/2^N), so for example in the example with 64s above you would only update > 7 levels once per minute, which is ridiculously cheap. > > The variable sizes above (eg: 1s, 5s, 15s, 60s) are appealing for the end > user but more complicated to implement because you need to know the number > of buckets at each layer, so that makes the computation functions less > generic (or rely on more information that you have to store). Also, while > dealing with only two buckets per level is trivial (a simple copy), for > the other numbers, you need to implement some kind of a rotation, with the > current pointer determined by a modulus on the current time, which also is > less appealing. But if you go with powers of 4 like (1, 4, 16, 64), it's > simplifying quite a bit since you only have to store 4 buckets per layer > and the modulus is a simple binary AND on the time. > > > To implement your suggestion of say histograms, the retrieval code can > > calculate the 4 values (1, 4, 16, and 64 seconds) by averaging across > > the correct intervals. In this case, the new CLI command is not required, > > and by default it prints all 4 values. Would this work in your opinion? > > Yes that's exactly the point. If you have only a few values to display, > you can always display them. > > I hope I helped you see your way through the solution better and did not > confuse you too much :-) > > Best regards, > Willy > >