On Thu, Feb 13, 2020 at 10:04:48PM +0000, Carl Henrik Holth Lunde wrote:
> Hello, attached is a patch to improve startup/reload time for
> OpenShift and large configuration files in general.
> We have an issue in our OpenShift cluster with slow router
> (HAProxy) reloads when we have many routes. This patch solved
> our issue and reduces reload time from 45 seconds to 0.7
> seconds.

Nice saving, indeed!

> The cluster has 11 000 routes.

You mean, that many per frontend ? If so it's huge and probably
quite slow at runtime. It would be useful to see an example of a
part of these to understand how they could be simplified. Usually,
using maps, you can turn that into just a few rules.
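To illustrate the maps suggestion, here is a hypothetical sketch (the
file name, backend names and hostnames are made up): thousands of
per-route ACL + use_backend pairs can collapse into a single map
lookup on the Host header, with the routes kept in a flat map file
that can even be updated at runtime via the stats socket.

```
# /etc/haproxy/hosts.map (sketch) -- one route per line:
#   app1.example.com  be_app1
#   app2.example.com  be_app2

frontend fe_main
    bind :80
    # Look the Host header up in the map; fall back to be_default
    # if the hostname is not listed.
    use_backend %[req.hdr(host),lower,map(/etc/haproxy/hosts.map,be_default)]
```

This keeps the parsed configuration small regardless of the number of
routes, since the map is a hash lookup rather than a list of rules.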

> Please review the attached patch. There are some minor
> optimizations possible, but considering the resulting performance
> improvement with the generic algorithm I have not added anything
> for this initial patch.

The patch looks good at first glance, however I need to take a deeper
look because I have no idea what these unique_ids are used for :-)
Maybe we could even drop them completely!

Another point is that you should avoid placing exit() in various
init functions; this tends to report incomplete errors that users
have no idea what to do with. Maybe this will require changing the
function's prototype to return a value instead, so that the caller
can increment the error count and quit from the standard error path.
I don't know how the current code deals with duplicated IDs nor what
impact these might have.

Also I'm seeing something that can be further refined:

> +     /* Assign unique ids to the rest of the elements */
> +     for (i = unassigned_pos; i < len; i++) {
> +             do {
> +                     arr[i]->unique_id = next_unique_id++;
> +             } while (bsearch(&arr[i], arr, unassigned_pos, sizeof(struct 
> pat_ref *), cmp_pat_ref));

Here since your first part of the array is already sorted, every time you
find an ID, you know that the next match will have a strictly higher index
so you could restart from the previous match+1 and halve the bsearch cost
on average. The worst case here is when half of the IDs are assigned and
half are unassigned, which maximizes both the bsearch() cost and the number
of calls. It's a detail but since you focus on efficiency... ;-)
