Hi Willy,
On Thu, 27 Dec 2018 at 15:54, Willy Tarreau <w...@1wt.eu> wrote: > > I'm not 100% sure whether "option redispatch" was only intended to > > break cookie persistence, but not other lb algorithms. Docs are > > ambiguous about this. > > I think you meant hashing instead of cookie persistence. But indeed, > hashing is one way to enforce strict affinity, and as long as a server > is seen up, a request targetting this server must never ever be sent > to another one. It could cause quite some trouble in a number of setups > (some users don't even implement health checks to guarantee stickiness). > You can think for example about a bunch of database servers where objects > names are hashed to decide what server to visit, it could be dramatic if > they were mixed. > > > However, since option redispatch is configurable per backend, it > > doesn't make a lot of sense to exclude hash based algorithms from > > redispatch functionality. > > Yes it does because there is not much way to safely fall back to a > working alternative. If you have hashing, you want all requests for > the same element to go to the same place. There is hardly a safe way > to fall back to a given server, and doing this when the server is > still seen as up could have much more damaging effects than reporting > a temporary failure. > > However in the thread above, I'm seeing that the repoerter is using > consistent hashing. Consistent hashing is different, as we expect it > to be much less deterministic. Thus we could imagine implementing the > redispatch there avoiding picking the same server. In this case I > think we could slightly modify your patch to permit this case to work. > > By the way, looking deeper, I think your patch totally breaks hashing > by replacing it with round robin, as I don't see how we can reach the > hashing code anymore, though I might have overlooked something. > > I haven't looked as deeply as you did, but I think that there might > be something wrong with the way the consistent hashing works when trying > to avoid a server. Indeed, we enter the switch/case here : > > 638 case BE_LB_LKUP_CHTREE: > 639 case BE_LB_LKUP_MAP: > 640 if ((s->be->lbprm.algo & BE_LB_KIND) == > BE_LB_KIND_RR) { > > the algo is not round-robin so we continue : > > 648 } > 649 else if ((s->be->lbprm.algo & BE_LB_KIND) != > BE_LB_KIND_HI) { > > it's indeed a hash so we skip this block and end up in the place where > we're hashing the values : > > 655 switch (s->be->lbprm.algo & BE_LB_PARM) { > 656 case BE_LB_HASH_SRC: > (...) > 697 case BE_LB_HASH_HDR: > 698 /* Header Parameter hashing */ > 699 if (!s->txn || s->txn->req.msg_state > < HTTP_MSG_BODY) > 700 break; > 701 srv = get_server_hh(s); > 702 break; > > So here for example, in order to hash a header we call the get_server_hh() > function passing it the stream and never passing the previous server. It > is this function (and its cousins for other params) which finally calls > either chash_get_server_hash() or map_get_server_hash(). And there we > don't have any information to decide what server to avoid. > > But looking at chash_get_server_hash(), it would be trivial to pass the > pointer to the server to avoid : > > 318 struct server *chash_get_server_hash(struct proxy *p, unsigned int > hash) > 319 { > (...) > 370 while (p->lbprm.chash.balance_factor && > !chash_server_is_eligible(nsrv)) { > 371 next = eb32_next(next); > 372 if (!next) { > 373 next = eb32_first(root); > 374 if (++loop > 1) // protection against > accidental loop > 375 break; > 376 } > 377 nsrv = eb32_entry(next, struct tree_occ, > node)->server; > 378 } > 379 > > So in short we could pass "avoid" as an argument and change the loop > condition like this : > > 370 while (p->lbprm.chash.balance_factor && nsrv != avoid && > !chash_server_is_eligible(nsrv)) { > > I think the patch will not be as small as expected however it should be > quite trivial because it would only be an API change consisting in adding > an extra argument to all these intermediary functions, thus there would be > almost no risk of breakage and we could probably backport it to 1.9 and > 1.8. > > > Also, the fact the code is already there, it's just never called > > (s->be->lbprm.algo is never BOTH BE_LB_KIND_RR and BE_LB_LKUP_CHTREE) > > makes me think that this was just an oversight. > > Apparently we do have this combination for the "random" algorithm. > > > Backporting this could change behavior as redispatch suddenly begins > > to work in such a situation; however this is expected and a correct > > configuration fixes it immediately. > > I don't see it as expected most of the time due to option redispatch > often being set in a defaults section, and hasing being used to enforce > strict affinity. But doing it with consistent hashing sounds fine > because when using consistent hashing you don't seek strict affinity > but rather the "best server for the job at this instant", and in this > case redispatching to the next neighbour would definitely make sense. In my tests your patch does the job and redispatch does work for consistent source or uri hashing, but only if hash-balance-factor is set. If I comment hash-balance-factor out, the redispatch does not occur: #balance uri balance source hash-type consistent hash-balance-factor 150 server primary-fail 10.0.0.199:80 maxconn 200 weight 256 # unreachable server backup-ok 10.0.0.254:80 maxconn 200 weight 1 Not sure why this happens. Regards, Lukas