Hi Luke!

On Wed, Jul 08, 2020 at 11:57:15AM +0200, Luke Seelenbinder wrote:
> I've been following along the torturous road, and I'm happy to see all the
> issues resolved and the excellent results.

You can imagine how pleased I am as well :-)

> Personally, I'm excited about the
> performance gains. I'll deploy this soon on our network.

> To dig up an old discussion--I took a look at better support for SRV records
> (using the priority field as backup/non-backup, etc.) a few weeks ago, but
> determined it didn't make sense in our use case. The issue is 0 weighted
> servers are considerably less useful to us since they aren't ever used, even
> in the condition where every other server is down.

I seem to remember a discussion about making this configurable, but I
can't find any commit matching anything like that, so maybe the
discussion ended with "change the behavior again, the previous one was
wrong". I don't remember well.

> That raises the next question: is the idea of server groups (with the ability
> for a request to try group 1, then group 2, etc. on retries) in the
> development plans at some point? Would that be something I could tinker as a
> longer term project?

That could indeed be an interesting approach because we already almost do
that between active and backup servers, except that there is always one
single group at a time. In fact there are 4 possible states for the
server list:

  - populated only with all active servers which are UP or unchecked,
    provided that there is at least one such server ;

  - populated only with all backup servers which are UP or unchecked,
    provided there is at least one such server, that no active server
    exists in UP or unchecked state, and that option allbackups is set ;

  - populated with the first UP or unchecked backup server, provided that
    there is at least one such server, that no active server exists in UP
    or unchecked state, and that option allbackups is not set ;

  - no server: all are down ;
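For illustration, the active/backup behavior above can be expressed in a
plain configuration; the backend name, server names and addresses below
are made up:

```
backend app
    balance roundrobin
    option allbackups              # load-balance across all UP backups
    server s1 192.0.2.10:80 check
    server s2 192.0.2.11:80 check
    server b1 192.0.2.20:80 check backup
    server b2 192.0.2.21:80 check backup
```

Without "option allbackups", only b1 would receive traffic once s1 and
s2 are down; with it, b1 and b2 share the load.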

With your approach it would be almost identical except that we would
always have two load-balancing groups, a primary one and a secondary
one, the first one made only of the active servers and the second one
made only of the backup servers. We would then pick from the first
group and, if it's empty, from the next one.
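The "pick from the first group, fall back to the second" logic could be
sketched like this; the structures and function names here are made up
for illustration, not HAProxy internals, and the first-UP scan stands in
for whatever LB algorithm the group uses:

```c
#include <assert.h>
#include <stddef.h>

struct srv   { const char *name; int up; };
struct group { struct srv *srv; size_t count; };

/* stand-in for the real LB algorithm: first UP server wins */
static struct srv *pick_from(struct group *g)
{
    for (size_t i = 0; i < g->count; i++)
        if (g->srv[i].up)
            return &g->srv[i];
    return NULL;
}

/* try the primary (active) group first, and only fall back to the
 * secondary (backup) group when the primary has no usable server */
static struct srv *pick_server(struct group *primary, struct group *secondary)
{
    struct srv *s = pick_from(primary);
    return s ? s : pick_from(secondary);
}
```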

It shouldn't even consume too much memory since the structures used to
attach the servers to the group are carried by the servers themselves.
Only static hash-based algorithms would cause a memory increase on the
backend, but they're rarely used with many servers due to the high risk
of rebalancing, so I guess that could be a pretty reasonable change.

We'd just document that the keyword "backup" means "server of the
secondary group", and probably figure new actions or decisions to
force to use one group over the other one.

Please note that I'd rather avoid adding too many groups into a farm
because we don't want to start to scan many of them. If keeping 2 as
we have today is already sufficient for your use case, I'd rather
stick to this.

We still need to put a bit more thoughts on this because I vaguely
remember an old discussion where someone wanted to use a different
LB algorithm for the backup servers. Here in terms of implementation
it would not be a big deal, we could have one LB algo per group. But
in terms of configuration (for the user) and configuration storage
(in the code), it would be a real pain. But it might still be worth
the price if it eventually allows assembling a backend by "merging"
several groups (that's a crazy old idea that has been floating around
for 10+ years and which could make sense in the future to address
certain use cases).

If you're interested in going on these ideas, please, oh please, never
forget about the queues (those that are used when you set a maxconn
parameter), because their behavior is tightly coupled with the LB
algorithms, and the difficulty is to make sure a server which frees
a connection slot can immediately pick the oldest pending request
either in its own queue (server already assigned) or the backend's
(don't care about what server handles the request). This may become
more difficult when dealing with several groups, hence possibly queues.
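The dequeue decision itself can be sketched as follows; the structure and
function names are hypothetical, not HAProxy's, and only illustrate the
fairness constraint (oldest pending request first, whichever queue it sits
in):

```c
#include <assert.h>
#include <stddef.h>

/* a pending request, ordered by a monotonic arrival tick */
struct pending { unsigned long long arrival; };

/* When a server frees a connection slot, it must serve the oldest
 * pending request, whether it waits in the server's own queue (request
 * already assigned to that server) or in the backend's shared queue.
 * Return the queue head to dequeue from, or NULL if both are empty. */
static const struct pending *
next_to_serve(const struct pending *srv_head, const struct pending *be_head)
{
    if (!srv_head)
        return be_head;
    if (!be_head)
        return srv_head;
    /* oldest arrival first, so fairness is preserved across queues */
    return srv_head->arrival <= be_head->arrival ? srv_head : be_head;
}
```

With several groups, each possibly carrying its own queue, this pairwise
comparison would have to be extended to scan every non-empty queue, which
is exactly where the coupling mentioned above starts to bite.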

My secret agenda would ideally be to one day support shared server groups
with their own queues between multiple backends so that we don't even
need to divide the servers' maxconn anymore. But it's still lacking some
pieces.

I'm dumping all that in case it can help you get a better idea of the
various mid-term possibilities and what the steps could be (and also what
not to do if we don't want to shoot ourselves in the foot).

