On Fri, 2020-02-14 at 05:06 +0100, Willy Tarreau wrote:
> Hello,
> 
> On Thu, Feb 13, 2020 at 10:04:48PM +0000, Carl Henrik Holth Lunde
> wrote:
> > Hello, attached is a patch to improve startup/reload time for
> > OpenShift and large configuration files in general.
[...]
> The patch looks good at first glance, however I must have a deeper
> check because I have no idea what these unique_ids are used for :-)
> Maybe we could even completely drop them!
> 
I do not know whether they have any real-world use, but the intended use
case is to change ACLs after the configuration is loaded. I assume `-u`
is supported to allow an administrator to manually number an ACL (e.g.
-u 100), and then later change that ACL. OpenShift does not do this.

The patch was written under the assumption that you wanted exactly the
same behavior. Since you say we can maybe drop them entirely, I have
created a new patch which should work fine but no longer guarantees that
automatic ids form a gapless sequence from 0. This simplifies the
algorithm significantly. The downside is that an overflow can happen if
a very high -u number is used in the config file, so I added a check for
that. The maximum is still very high.


> I don't know how the current code deals with duplicated IDs nor what
> impact these might have.
> 

Duplicate IDs are detected when the configuration file is read. That
code is not efficient either, but I did not look at improving it because
I do not know whether anyone uses this feature *and* uses very large
configuration files. For small hand-written files the code is fine.

> Thanks,
> Willy

Thanks for the review, and also thanks to Andrew McDermott from Red Hat
for feedback and testing.
From aa47080bb0dfa70f483063105ba19bdbaa4a289c Mon Sep 17 00:00:00 2001
From: Carl Henrik Lunde <[email protected]>
Date: Wed, 5 Feb 2020 21:28:26 +0100
Subject: [PATCH] OPTIM: startup: fast unique_id allocation for acl.

pattern_finalize_config() uses an inefficient algorithm which is a
problem with very large configuration files. This affects startup, and
therefore reload time. When haproxy is deployed as a router in a
Kubernetes cluster the generated configuration file may be large and
reloads occur frequently, which makes this a significant issue.

The old algorithm is O(n^2):
* allocate missing uids - O(n^2)
* sort linked list - O(n^2)

The new algorithm does not care about making sure all IDs are in
sequence without gaps and is O(n):
* find the highest user allocated uid - O(n)
* allocate automatic uids - O(n)

To prevent overflow, manually assigned UIDs with -u are limited to
INT_MAX>>1, typically 1073741823.

Performance examples, startup time in seconds:

    pat_refs old     new
    1000      0.01  <0.01
    10000     2.9    0.03
    20000    12.1    0.06
    30000    27.3    0.09
    40000    44.5    0.12
    50000    78.6    0.16

Please backport to 1.8 and 2.1.
---
 src/acl.c     |  5 +++++
 src/pattern.c | 46 ++++++++++------------------------------------
 2 files changed, 15 insertions(+), 36 deletions(-)

diff --git a/src/acl.c b/src/acl.c
index 1e32271be..047715fdf 100644
--- a/src/acl.c
+++ b/src/acl.c
@@ -431,6 +431,11 @@ struct acl_expr *parse_acl_expr(const char **args, char **err, struct arg_list *
 				goto out_free_expr;
 			}
 
+			if (unique_id < 0 || unique_id >= (INT_MAX >> 1)) {
+				memprintf(err, "the argument of -u must be a positive integer less than %d", INT_MAX>>1);
+				goto out_free_expr;
+			}
+
 			/* Check if this id is really unique. */
 			if (pat_ref_lookupid(unique_id)) {
 				memprintf(err, "the id is already used");
diff --git a/src/pattern.c b/src/pattern.c
index 90067cd23..4dfc0637a 100644
--- a/src/pattern.c
+++ b/src/pattern.c
@@ -2643,53 +2643,27 @@ int pattern_delete(struct pattern_expr *expr, struct pat_ref_elt *ref)
 	return 1;
 }
 
-/* This function finalize the configuration parsing. Its set all the
+/* This function finalizes the configuration parsing. It sets all the
  * automatic ids
  */
 void pattern_finalize_config(void)
 {
-	int i = 0;
-	struct pat_ref *ref, *ref2, *ref3;
-	struct list pr = LIST_HEAD_INIT(pr);
+	int next_unique_id = 0;
+	struct pat_ref *ref;
 
 	pat_lru_seed = random();
 
+	/* Find next unique_id (custom ids may be set with -u) */
 	list_for_each_entry(ref, &pattern_reference, list) {
-		if (ref->unique_id == -1) {
-			/* Look for the first free id. */
-			while (1) {
-				list_for_each_entry(ref2, &pattern_reference, list) {
-					if (ref2->unique_id == i) {
-						i++;
-						break;
-					}
-				}
-				if (&ref2->list == &pattern_reference)
-					break;
-			}
-
-			/* Uses the unique id and increment it for the next entry. */
-			ref->unique_id = i;
-			i++;
-		}
+		if (ref->unique_id >= next_unique_id)
+			next_unique_id = ref->unique_id + 1;
 	}
 
-	/* This sort the reference list by id. */
-	list_for_each_entry_safe(ref, ref2, &pattern_reference, list) {
-		LIST_DEL(&ref->list);
-		list_for_each_entry(ref3, &pr, list) {
-			if (ref->unique_id < ref3->unique_id) {
-				LIST_ADDQ(&ref3->list, &ref->list);
-				break;
-			}
-		}
-		if (&ref3->list == &pr)
-			LIST_ADDQ(&pr, &ref->list);
+	/* Set automatic ids when required */
+	list_for_each_entry(ref, &pattern_reference, list) {
+		if (ref->unique_id == -1)
+			ref->unique_id = next_unique_id++;
 	}
-
-	/* swap root */
-	LIST_ADD(&pr, &pattern_reference);
-	LIST_DEL(&pr);
 }
 
 static int pattern_per_thread_lru_alloc()
-- 
2.18.1
