Hi Lukas,

On Wed, Dec 26, 2018 at 11:07:27PM +0100, Lukas Tribus wrote:
> > redispatch never worked for hash based alghoritms, as the code
> > (BE_LB_LKUP_CHTREE -> chash_get_next_server()) would only have been
> > called for BE_LB_KIND_RR, which doesn't make sense. Fix this by also
> > going down this code path when the BE_LB_KIND is BE_LB_KIND_HI.
> >
> > Reported by Oskar Stenman on discourse:
> > https://discourse.haproxy.org/t/balance-uri-consistent-hashing-redispatch-3-not-redispatching/3344
> >
> > Can be backported to 1.6.
> 
> Actually my intention was to set this as RFC, with the following
> comments, but git send-mail was faster:
> 
> I'm not 100% sure whether "option redispatch" was only intended to
> break cookie persistence, but not other lb algorithms. Docs are
> ambiguous about this.

I think you meant hashing instead of cookie persistence. But indeed,
hashing is one way to enforce strict affinity, and as long as a server
is seen up, a request targetting this server must never ever be sent
to another one. It could cause quite some trouble in a number of setups
(some users don't even implement health checks to guarantee stickiness).
You can think for example about a bunch of database servers where objects
names are hashed to decide what server to visit, it could be dramatic if
they were mixed.

> However, since option redispatch is configurable per backend, it
> doesn't make a lot of sense to exclude hash based algorithms from
> redispatch functionality.

Yes it does because there is not much way to safely fall back to a
working alternative. If you have hashing, you want all requests for
the same element to go to the same place. There is hardly a safe way
to fall back to a given server, and doing this when the server is
still seen as up could have much more damaging effects than reporting
a temporary failure.

However in the thread above, I'm seeing that the repoerter is using
consistent hashing. Consistent hashing is different, as we expect it
to be much less deterministic. Thus we could imagine implementing the
redispatch there avoiding picking the same server. In this case I
think we could slightly modify your patch to permit this case to work.

By the way, looking deeper, I think your patch totally breaks hashing
by replacing it with round robin, as I don't see how we can reach the
hashing code anymore, though I might have overlooked something.

I haven't looked as deeply as you did, but I think that there might
be something wrong with the way the consistent hashing works when trying
to avoid a server. Indeed, we enter the switch/case here :

   638                  case BE_LB_LKUP_CHTREE:
   639                  case BE_LB_LKUP_MAP:
   640                          if ((s->be->lbprm.algo & BE_LB_KIND) == 
BE_LB_KIND_RR) {

the algo is not round-robin so we continue :

   648                          }
   649                          else if ((s->be->lbprm.algo & BE_LB_KIND) != 
BE_LB_KIND_HI) {

it's indeed a hash so we skip this block and end up in the place where
we're hashing the values :

   655                          switch (s->be->lbprm.algo & BE_LB_PARM) {
   656                          case BE_LB_HASH_SRC:
(...)
   697                          case BE_LB_HASH_HDR:
   698                                  /* Header Parameter hashing */
   699                                  if (!s->txn || s->txn->req.msg_state < 
HTTP_MSG_BODY)
   700                                          break;
   701                                  srv = get_server_hh(s);
   702                                  break;

So here for example, in order to hash a header we call the get_server_hh()
function passing it the stream and never passing the previous server. It
is this function (and its cousins for other params) which finally calls
either chash_get_server_hash() or map_get_server_hash(). And there we
don't have any information to decide what server to avoid.

But looking at chash_get_server_hash(), it would be trivial to pass the
pointer to the server to avoid :

   318  struct server *chash_get_server_hash(struct proxy *p, unsigned int hash)
   319  {
(...)
   370          while (p->lbprm.chash.balance_factor && 
!chash_server_is_eligible(nsrv)) {
   371                  next = eb32_next(next);
   372                  if (!next) {
   373                          next = eb32_first(root);
   374                          if (++loop > 1) // protection against 
accidental loop
   375                                  break;
   376                  }
   377                  nsrv = eb32_entry(next, struct tree_occ, node)->server;
   378          }
   379  

So in short we could pass "avoid" as an argument and change the loop
condition like this :

   370          while (p->lbprm.chash.balance_factor && nsrv != avoid && 
!chash_server_is_eligible(nsrv)) {

I think the patch will not be as small as expected however it should be
quite trivial because it would only be an API change consisting in adding
an extra argument to all these intermediary functions, thus there would be
almost no risk of breakage and we could probably backport it to 1.9 and
1.8.

> Also, the fact the code is already there, it's just never called
> (s->be->lbprm.algo is never BOTH BE_LB_KIND_RR and BE_LB_LKUP_CHTREE)
> makes me think that this was just an oversight.

Apparently we do have this combination for the "random" algorithm.

> Backporting this could change behavior as redispatch suddenly begins
> to work in such a situation; however this is expected and a correct
> configuration fixes it immediately.

I don't see it as expected most of the time due to option redispatch
often being set in a defaults section, and hasing being used to enforce
strict affinity. But doing it with consistent hashing sounds fine
because when using consistent hashing you don't seek strict affinity
but rather the "best server for the job at this instant", and in this
case redispatching to the next neighbour would definitely make sense.

Regards,
Willy

Reply via email to