On Wed, Jul 08, 2020 at 07:53:51PM +0200, Luke Seelenbinder wrote:
> > We would then pick from the first
> > list and if it's empty, then the next one.
> 
> This slightly concerns me. Hopefully I'm just not quite understanding the 
> behavior.
> 
> Would that imply request A would pick from the primary server group for all
> backend requests (including retries) unless the primary is 100% down / empty?
> An ideal path for us (as odd as it may sound), is to allow the ability for
> request A to go to the primary group first, then optionally redispatch to
> secondary group. This isn't currently possible, and is the source of most of
> our remaining 5xx errors.

Ah, I didn't catch this requirement. Well, I guess that would only be a
"configuration detail" :-)  After all we already know whether a request is
the first attempt or a retry (the retry counter is decremented for each
attempt, so we can have that info). We could then configure somewhere
that the first group only takes initial attempts (implying that retries
only go to the second one). Or maybe we could be even smarter and
say that the first group *prefers* initial requests. I.e. a retry ought
to be processed by the second group if available, falling back to the
primary one otherwise.

> > We'd just document that the keyword "backup" means "server of the
> > secondary group", and probably figure new actions or decisions to
> > force to use one group over the other one.
> 
> I think if these actions are capable of changing the group picked by retries,
> that addresses my concerns.

Not necessarily, but the above would, yes ;-)

> > I'm dumping all that in case it can help you get a better idea of the
> > various mid-term possibilities and what the steps could be (and also what
> > not to do if we don't want to shoot ourselves in the foot).
> 
> That helps my understanding quite a bit, too!
> 
> Regarding queues, LB algorithms, and such, this is of lesser concern for us.
> We want to reasonably fairly pick backends, but beyond that, we don't much
> care (perhaps therein lies the rub). I was a bit surprised to read that
> requests are queued for particular servers vs for a particular group at the
> moment,

Be careful, this only applies if the LB algorithm *mandates* a certain
server. That's why there's a distinction between deterministic and non-
deterministic algorithms. The deterministic ones will not put their requests
into the backend's queue because you certainly don't want the wrong server
to pick them. Think of a hash on the source address, for example. It's the same
with cookie-based persistence. This is the case where the server's queue is
used. But non-deterministic algorithms put their requests into the backend's
queue, which is shared between all servers so that the first available one
will pick the requests. This is what guarantees the shortest (and fairest) queue
time among all requests, even if some servers are faster than others or if
some requests stay for too long.
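To make that concrete, here's a rough config sketch (addresses and maxconn
values are made up). With leastconn, excess requests wait in the backend's
shared queue; with a source hash, they wait in the designated server's own
queue:

    # non-deterministic: any free server may pick a queued request
    backend tiles_shared_queue
        balance leastconn
        timeout queue 5s
        server s1 192.0.2.1:80 check maxconn 50
        server s2 192.0.2.2:80 check maxconn 50

    # deterministic: the source hash designates one server, so requests
    # queue on that server only
    backend tiles_per_server_queue
        balance source
        hash-type consistent
        server s1 192.0.2.1:80 check maxconn 50
        server s2 192.0.2.2:80 check maxconn 50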

> which has some interesting implications for L7 retries based on 5xx
> errors which in turn result in the server being marked down. It could explain
> why we're seeing occasional edge cases of errors that don't make complete
> sense. (Request D comes in, is scheduled for a server, the server goes down
> along with the rest of the group due to Requests A, B, and C failing, Request
> D then fails by default, since the group is empty.)

If a request is in a server's queue, it will be handled because it means
we have nowhere else to send it. If, however, it's in the backend's queue,
it means any server can pick it. However, it is totally possible that the
time needed to switch the last server from up to down and refill the farm
with the backup servers leaves enough time for another thread to see an
empty farm and return a 503! Having two server groups could possibly make
this much smoother since we wouldn't have to reconstruct a new group when
switching, as there would be no switch at all.

> A first step towards this would be to allow requests to be redispatched to
> the backup group.

This is always the case when option redispatch is set. "redispatch" in haproxy's
terminology means "break persistence and enforce load balancing again". But by
default it applies as a last resort, so it's done on the last retry. You can
change this (I don't remember if it's the redispatch or retries option which
takes an optional number).
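If I recall correctly it's the redispatch option that takes the optional
number (worth double-checking in the doc); with that assumption, something
like this would re-run load balancing on every retry instead of only the
last one:

    backend tiles
        retries 3
        # assuming "option redispatch <interval>" is the variant taking the
        # number; 1 means redispatch on every retry attempt
        option redispatch 1
        server s1 192.0.2.1:80 check
        server s2 192.0.2.2:80 check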

Usually when you have a shopping cart you want to go back to the same server
you were connected to, and work around rare network glitches by supporting a
few retries. However, if the server is really dead, you prefer to be forwarded
to another server which will say something polite to you. That's the idea. But
in your case, if nobody sticks to a given server, you shouldn't care at all
about that and should prefer to pick any server for any retry.
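For the shopping-cart case, a sketch of what that usually looks like (cookie
name and addresses are just examples): persistence via a cookie, a few retries
against the same server, and redispatch only as a last resort:

    backend shop
        balance roundrobin
        cookie SRV insert indirect nocache
        retries 3
        option redispatch        # default: break persistence on the last retry only
        server s1 192.0.2.1:80 cookie s1 check
        server s2 192.0.2.2:80 cookie s2 check

In your stateless case you'd simply drop the cookie lines and let every retry
pick whatever server the LB algorithm designates, as sketched above.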

> That would eliminate many of our issues. We're fine with a
> few slower requests if we know they'll likely succeed the second time around
> (because the slow region is not handling both). It'd likely help our 99p and
> 999p times a good bit.

Sure!

> I was hoping 0 weighted servers would allow for this, but I was mistaken,
> since 0 weighted servers are even less used than backup servers. :-)

In fact it depends. For example you can use them with cookies and with
the "use-server" directive. But weight 0 was implemented to allow soft-
stopping a server without kicking off its users: it's no longer selected by
any load balancing algorithm, but the server is up and will still match
persistent requests.
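As a quick illustration (the x-debug-server header and addresses are made up),
a weight 0 server still receives traffic that explicitly designates it:

    backend app
        balance roundrobin
        cookie SRV insert indirect nocache
        # requests carrying this made-up header are forced onto s1
        use-server s1 if { req.hdr(x-debug-server) -m str s1 }
        # weight 0: s1 gets no new load-balanced traffic, but it stays up and
        # still serves requests matched by its cookie or by use-server
        server s1 192.0.2.1:80 cookie s1 weight 0 check
        server s2 192.0.2.2:80 cookie s2 check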

> I hope this helps clarify our needs.

Yes, it sounds clearer :-)

Willy
