Hi again Michal,

So in the end I had to revert your latest patch; I should have been
more careful before merging it.

> We need some CI (even if they will only build haproxy) and IMHO people with
> @haproxy.com mails should test their code before posting and merging :(

Thus please let me reuse your own words above:

  You need some CI, and IMHO people with @allegrogroup.com mails should
  test their code before posting.

but I won't generalize, and as I previously said, it's development, so
we're allowed to break things a bit for the sake of trying to make
something better. So case closed.

The problem is immediately visible in configs like this one:

  global
        stats timeout 1h
        stats socket /var/run/haproxy.sock level admin mode 600

  defaults
        maxconn 100
        log 127.0.0.1 local0
        option log-health-checks
        mode http
        timeout client 3s
        timeout server 1s
        timeout connect 10s
        default-server maxconn 50 weight 100

  listen main
        bind :8000
        stats uri /stat
        use_backend customer1 if { req.hdr(host) -i customer1 }
        use_backend customer2 if { req.hdr(host) -i customer2 }
        use_backend customer3 if { req.hdr(host) -i customer3 }

  backend servers
        server s1 127.0.0.1:8001 check inter 1s

  backend customer1
        balance uri
        hash-type consistent
        server s1 127.0.0.1:8001 track servers/s1

  backend customer2
        balance uri
        server s1 127.0.0.1:8001 track servers/s1

  backend customer3
        balance uri
        hash-type consistent
        server s1 127.0.0.1:8001 track servers/s1

There are 3 backends, one per customer. Two customers use a consistent
hash on the URI while the other one uses a standard (map-based) hash.
But all of them point to the same server; it's just a matter of choice.
In fact an even more common setup with reverse caches looks like this,
when users want to use one farm under moderate load and another one
under high load:

   backend leastconn
       balance leastconn
       server s1 127.0.0.1:8001 track cache/s1
       server s2 127.0.0.2:8001 track cache/s2
       server s3 127.0.0.3:8001 track cache/s3

   backend cache
       balance uri
       server s1 127.0.0.1:8001 check
       server s2 127.0.0.2:8001 check
       server s3 127.0.0.3:8001 check

The problem faced here is when a non-integral weight change happens, such
as "50%". Since such a change can only be applied to backends using a
dynamic LB algo, it will be rejected on the others. In the first example
above, issuing it will cause server "s1" in farms "servers" and
"customer3" to switch to 50%, farm "customer2" to generate an error that
aborts the processing, and farm "customer1" to remain unchanged despite
being dynamic and thus compatible.

At the very least it's very important that the change remains consistent
across multiple similar farms. Having customer1 and customer3, which are
identical and both track "servers", end up in different states after a
change is not acceptable.

Now what's the best approach here? I don't really know. We could refuse
to apply the change at all and roll everything back, but that means we
would have to save the respective servers' weights in order to restore
them. It also means that one new farm tracking a central one while using
an incompatible balance algorithm could prevent all the other ones from
adjusting their weights. Or we can consider that, since a change applied
to a map-based farm normally only triggers an error and is ignored, we
could simply report the first error, continue, then report the number of
errors and the success rate at the end of the operation.
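
To make that last option concrete, a session could then look like this
(the output below is entirely made up, it only illustrates the principle
of reporting the first error plus a summary):

  $ echo "set weight servers/s1 50%" | socat stdio /var/run/haproxy.sock
  Backend 'customer2' is using a static LB algorithm and only accepts
  weights '0%' and '100%'.
  3 farms changed, 1 error.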

Still another problem will remain:

  defaults
        balance uri

  backend servers
        server s1 127.0.0.1:8001 check inter 1s

  backend customer1
        hash-type consistent
        server s1 127.0.0.1:8001 track servers/s1

  backend customer2
        hash-type consistent
        server s1 127.0.0.1:8001 track servers/s1

  backend customer3
        hash-type consistent
        server s1 127.0.0.1:8001 track servers/s1
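
Here, trying the change fails right away (assuming socat again; the
error wording below is from memory and may differ slightly):

  $ echo "set weight servers/s1 50%" | socat stdio /var/run/haproxy.sock
  Backend is using a static LB algorithm and only accepts weights
  '0%' and '100%'.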

This one will always fail to apply non-integral weights because the farm
receiving the order doesn't itself use a dynamic algorithm, so the change
is rejected and never propagated. We could decide to ignore the problem,
but then it would render this one inconsistent:

  defaults
        balance uri

  backend servers
        server s1 127.0.0.1:8001 check inter 1s

  backend customer1
        server s1 127.0.0.1:8001 track servers/s1

  backend customer2
        hash-type consistent
        server s1 127.0.0.1:8001 track servers/s1

where one could expect that a dynamic backend tracking a static one
would always stay in sync with it, but that would not be true anymore.

So for now I think this requires more thinking, probably a specific
server option like "track-weight" or something like this to let the user
decide whether or not the weights should be inherited from tracked
servers. This would limit the risk of very bad surprises.
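
For illustration, that could look like this (purely hypothetical syntax,
nothing of this exists today), the idea being that only servers asking
for it would inherit weight changes from the server they track:

  backend customer1
        hash-type consistent
        server s1 127.0.0.1:8001 track servers/s1 track-weight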

Regards,
Willy
