Hi Willy,

On Wed, May 13, 2020 at 11:25 AM Willy Tarreau <w...@1wt.eu> wrote:

> Hi Marcin!
>
> On Tue, May 12, 2020 at 12:17:19PM +0200, Marcin Deranek wrote:
> > Hi,
> >
> > Not a long ago while investigating some issues with one of the services I
> > stumbled across Connect Time (based on ctime metric) graphs (see
> > ctime-problem.png). It turned out that metrics were nowhere near reality
> -
> > they were trying to reach real average value, but never got there as each
> > reload reset it back to 0. Keep in mind this was happening on a
> > multi-process HAProxy setup for a service with relatively low/medium
> amount
> > of traffic. I did a bit of investigation, tested different scenarios /
> > algorithms, selected the most optimal one and deployed it on one of our
> > load balancers (see low-traffic-compare.png and
> > medium-traffic-compare.png). ctime represents current algorithm and
> ctime2
> > represents new algorithm for calculating moving averages. As you can see,
> > differences can be dramatic over long periods of time.
>
> The improvements are indeed really nice!
>

Glad to hear that!


> > Drops of ctime
> > metric represent reloads of an HAProxy instance. Spikes of ctime2 are due
> > to http-reuse option - after reloading a new instance cannot reuse
> existing
> > connections, so it has to establish new connections, so timing goes up
> > (this is expected).
> > See a proposed patch. Few comments about the patch:
> > - The patch changes behaviour of moving average metrics generation (eg.
> > ctime, ttime) in a way that metric is not generated if there is no data.
> > Currently metrics start with 0 and it's not possible to distinguish if
> > latency is 0 (or close to 0) or there is no data. You can always check
> > req_tot or lbconn (depending on mode), but that makes things much more
> > complicated thus I decided to only expose those metrics if there is data
> > (at least 1 request has been made). Gaps on low-traffic-compare.png graph
> > indicate that during that period there were no requests and thus we
> return
> > no data.
>
> In fact it's a different metric (though very useful). I've had the same
> needs recently. The current ctime reports the avg time experienced by the
> last 1024 *requests* and is documented as such, so when you want to think
> in terms of user experience it's the one to consider. For example, if you
> have a 99% reuse rate and 1% connect rate, even a DC far away at 100ms
> will only add 1ms on average to the request time, because 99% of the time,
> the connect time really is zero for the request. Ditto if you're using
> the cache. Your timer reports the average connect time per *connection*,
> and there it's much more suitable to analyse the infrastructure's impact
> on performance. Both are equally useful but do not report the same metric.
>

It was not my intention to change that. In my mind they both report the
very same thing, with one major difference: the current implementation
produces misleading results before it has processed at least
TIME_STATS_SAMPLES samples (it always assumes that at least
TIME_STATS_SAMPLES samples have been processed), whereas the new
implementation dynamically scales the value of n based on the number of
samples processed so far. In fact, for the first TIME_STATS_SAMPLES
samples it should produce an exact moving average. The new implementation
takes the reuse logic into account too, but after a reload there are no
connections to be reused, which is why latency goes up sharply (I
confirmed this by looking at the timings reported in the log entries).
ctime, ttime, etc. are also produced in TCP mode (which makes sense to
me), so the documentation might not be accurate on this point (I just
tested it).
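
To illustrate what I mean by dynamically scaling n, here is a simplified
sketch of the idea (not the exact patch code; the atomic CAS loop of
swrate_add() is omitted and "seen" is just an illustrative name for the
number of samples processed so far, e.g. req_tot):

    unsigned int n = (seen < TIME_STATS_SAMPLES) ? seen : TIME_STATS_SAMPLES;

    if (seen < TIME_STATS_SAMPLES)
        sum += v;                /* window still filling: plain sum      */
    else
        sum += v - sum / n;      /* window full: usual decay of old sum  */

    avg = n ? sum / n : 0;       /* exact average until the window fills,
                                  * moving average afterwards            */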


> I'd be in favor of creating a new one for yours. Anyway we're still missing
> a number of other ones like the average TLS handshake cost on each side,
> which should also be useful both per connection and per request. I'm saying
> this in case that helps figuring a pattern to find a name for this one :-)
>

I don't mind creating additional metrics if they make sense, of course.
One of them (which could help avoid exceptions) would be req_tot for a
server (it's already available for the backend). For dynamically scaling
n, req_tot is much more suitable than lbconn in HTTP mode: lbconn is
incremented much earlier than req_tot, which creates a time window during
which the metrics can be calculated incorrectly, leading to wrong
averages. Of course, if you still think it's a good idea to separate them
(personally I'm not convinced, unless you don't want to change the
behaviour of the existing metric, etc.), I can add new metrics, e.g.
ctime_avg. Keep in mind that I have already decided (as you suggested) to
create new functions.
What is your take on not exposing the metric if there is no data? If
there were no requests, we don't know what the connect time is. Should we
report 0 or an empty value?
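
On the stats side I have something like this in mind (a rough sketch
only; the counter and field names are illustrative, and the per-server
req_tot used here is exactly the counter that does not exist yet):

    /* Sketch: only emit the average when at least one request has been
     * processed; otherwise leave the field empty instead of printing a
     * misleading 0. <out>, <counters> and <ctime_sum> are illustrative. */
    if (counters->req_tot) {
        unsigned int n = MIN(counters->req_tot, TIME_STATS_SAMPLES);

        chunk_appendf(out, "%u", swrate_avg(counters->ctime_sum, n));
    }
    /* else: print nothing, so the field stays empty rather than showing 0 */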


> > - I haven't changed a similar swrate_add_scaled function as it's not used
> > yet and the description feels a bit misleading to me.
>
> So I'd rather have a different set of functions for this (and we'd update
> the stats code to use these). The reason is that these ones were made to
> be usable at extremely low cost in the lower layers. For example we do use
> them in the activity measurements that are at the heart of the polling loop
> and I also have a pending patch to use them in pool_free() under a thread
> lock where we definitely don't want to spend our time adding an integer
> divide when the current code only requires one addition and a right shift.
> Because typically, knowing that for these core operations that can easily
> be called billions of times a day, we'd have extra instructions only to
> improve the accuracy of the measurement during the very first millisecond
> would be a waste.
>

I see your point. I tried to optimize them as much as I could, but of
course there's a limit somewhere...

> We could call the new ones swrate_add_range() and swrate_avg_range(), or
> _partial for example, or anything more suitable you'd come up with.
>

Fair enough.

> I like your approach, because I've very recently been thinking about this
> as well but came up with a bad idea which I didn't want to implement. I
> thought that we would have the first sample fill the whole average. But
> that required to track the info that it was the first sample, which I
> thought was not easy. But actually for each of them you managed to figure
> from a companion metric if there are samples or not, and your approach
> would be more accurate than mine that gave too much importance to the first
> sample.
>
> I've seen a few things however in your patch:
>
> > -static inline unsigned int swrate_add(unsigned int *sum, unsigned int
> n, unsigned int v)
> > +static inline unsigned int swrate_add(unsigned int *sum, unsigned int
> n, unsigned int v, int all)
> >  {
> >       unsigned int new_sum, old_sum;
> >
> >       old_sum = *sum;
> >       do {
> > -             new_sum = old_sum - (old_sum + n - 1) / n + v;
> > +             new_sum = old_sum + v - (all ? old_sum / n : 0);
>
> Be careful above, you're losing precision by not adding n-1 to the value,
> you're systematically rounding it down to the nearest multiple of n while
> it was rounded up, so your older values will fade away quicker. If you want
> to maintain accuracy, you need to keep the fractional part of the
> operation.
>

Now I know why you used n - 1 in the equation! :-)) I was wondering about
that. In your case you essentially round it up, which I think is a sane
approach.


> It's even possible to further improve the accuracy by only adding n/2:
>
>                 new_sum = old_sum + v - (all ? (old_sum + n / 2) / n : 0);
>

That depends on whether we want to round to the nearest value or round up.
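A quick numeric example with, say, old_sum = 1000 and n = 512:

    (old_sum + n - 1) / n  =  1511 / 512  =  2   /* current code: rounds up      */
     old_sum / n           =  1000 / 512  =  1   /* my patch: rounds down        */
    (old_sum + n / 2) / n  =  1256 / 512  =  2   /* your variant: rounds to the
                                                    nearest value                */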


> In addition, since the sole point of "all" above is to ignore the use of
> "n" applied to the past value, it'd be better to fuse it into "n" by
> declaring that if n==0 we simply don't fade out the old value, which
> further simplifies the output code:
>
>                 new_sum = old_sum + v - (n ? (old_sum + n / 2) / n : 0);
>
> Last, in terms of readability I find it preferable to have the old sum
> computation on one side and the new sample (v) on the other side. i.e.
>
>                 new_sum = v + old_sum - (n ? (old_sum + n / 2) / n : 0);
>
> (or v at the end, I don't care, all produce the exact same code anyway).
>

Makes sense.
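
So the new helper would end up looking roughly like this (a sketch only,
with your suggestions folded in; the name is the one you proposed and is
not final):

    /* Add sample <v> to <sum> over a window of <n> entries; n == 0 means
     * the window is still filling, so the old sum is not faded out.
     * Mirrors the CAS loop of the existing swrate_add(). */
    static inline unsigned int swrate_add_range(unsigned int *sum,
                                                unsigned int n, unsigned int v)
    {
        unsigned int new_sum, old_sum;

        old_sum = *sum;
        do {
            new_sum = v + old_sum - (n ? (old_sum + n / 2) / n : 0);
        } while (!_HA_ATOMIC_CAS(sum, &old_sum, new_sum));
        return new_sum;
    }

The caller would then pass n == 0 while the window is still filling and
TIME_STATS_SAMPLES once it is full, and use the real sample count (capped
at TIME_STATS_SAMPLES) when calling swrate_avg().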


> >  static inline unsigned int swrate_avg(unsigned int sum, unsigned int n)
> >  {
> > -     return (sum + n - 1) / n;
> > +     return sum / n;
> >  }
>
> Same point for this one, values are systematically rounded down while
> they should not. That makes me realize that your variant should not be
> affected by keeping the original one here and that it means you'd just
> have one extra function to add samples over a partial range of samples,
> just like we have swrate_add_scaled() that's not used anymore.
>
> I'd also ask you to split your patch in 2 :
>
>   - one which improves the freq_ctr with the new function(s)
>   - one which adds the new metric, its update and reporting to the
>     various call places.
>
> In addition, I think there will be less changes with a separate metric.
>

Will do. Thank you for the review.
Regards,

Marcin Deranek
Senior System Administrator

Booking.com B.V.
Singel 540 Amsterdam 1017AZ Netherlands
Direct +31207243362
Making it easier for everyone to experience the world since 1996
43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
million reported listings
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)
