Hi Lukas, On Wed, Dec 26, 2018 at 11:07:27PM +0100, Lukas Tribus wrote: > > redispatch never worked for hash based alghoritms, as the code > > (BE_LB_LKUP_CHTREE -> chash_get_next_server()) would only have been > > called for BE_LB_KIND_RR, which doesn't make sense. Fix this by also > > going down this code path when the BE_LB_KIND is BE_LB_KIND_HI. > > > > Reported by Oskar Stenman on discourse: > > https://discourse.haproxy.org/t/balance-uri-consistent-hashing-redispatch-3-not-redispatching/3344 > > > > Can be backported to 1.6. > > Actually my intention was to set this as RFC, with the following > comments, but git send-mail was faster: > > I'm not 100% sure whether "option redispatch" was only intended to > break cookie persistence, but not other lb algorithms. Docs are > ambiguous about this.
I think you meant hashing instead of cookie persistence. But indeed, hashing is one way to enforce strict affinity, and as long as a server is seen up, a request targetting this server must never ever be sent to another one. It could cause quite some trouble in a number of setups (some users don't even implement health checks to guarantee stickiness). You can think for example about a bunch of database servers where objects names are hashed to decide what server to visit, it could be dramatic if they were mixed. > However, since option redispatch is configurable per backend, it > doesn't make a lot of sense to exclude hash based algorithms from > redispatch functionality. Yes it does because there is not much way to safely fall back to a working alternative. If you have hashing, you want all requests for the same element to go to the same place. There is hardly a safe way to fall back to a given server, and doing this when the server is still seen as up could have much more damaging effects than reporting a temporary failure. However in the thread above, I'm seeing that the repoerter is using consistent hashing. Consistent hashing is different, as we expect it to be much less deterministic. Thus we could imagine implementing the redispatch there avoiding picking the same server. In this case I think we could slightly modify your patch to permit this case to work. By the way, looking deeper, I think your patch totally breaks hashing by replacing it with round robin, as I don't see how we can reach the hashing code anymore, though I might have overlooked something. I haven't looked as deeply as you did, but I think that there might be something wrong with the way the consistent hashing works when trying to avoid a server. Indeed, we enter the switch/case here : 638 case BE_LB_LKUP_CHTREE: 639 case BE_LB_LKUP_MAP: 640 if ((s->be->lbprm.algo & BE_LB_KIND) == BE_LB_KIND_RR) { the algo is not round-robin so we continue : 648 } 649 else if ((s->be->lbprm.algo & BE_LB_KIND) != BE_LB_KIND_HI) { it's indeed a hash so we skip this block and end up in the place where we're hashing the values : 655 switch (s->be->lbprm.algo & BE_LB_PARM) { 656 case BE_LB_HASH_SRC: (...) 697 case BE_LB_HASH_HDR: 698 /* Header Parameter hashing */ 699 if (!s->txn || s->txn->req.msg_state < HTTP_MSG_BODY) 700 break; 701 srv = get_server_hh(s); 702 break; So here for example, in order to hash a header we call the get_server_hh() function passing it the stream and never passing the previous server. It is this function (and its cousins for other params) which finally calls either chash_get_server_hash() or map_get_server_hash(). And there we don't have any information to decide what server to avoid. But looking at chash_get_server_hash(), it would be trivial to pass the pointer to the server to avoid : 318 struct server *chash_get_server_hash(struct proxy *p, unsigned int hash) 319 { (...) 370 while (p->lbprm.chash.balance_factor && !chash_server_is_eligible(nsrv)) { 371 next = eb32_next(next); 372 if (!next) { 373 next = eb32_first(root); 374 if (++loop > 1) // protection against accidental loop 375 break; 376 } 377 nsrv = eb32_entry(next, struct tree_occ, node)->server; 378 } 379 So in short we could pass "avoid" as an argument and change the loop condition like this : 370 while (p->lbprm.chash.balance_factor && nsrv != avoid && !chash_server_is_eligible(nsrv)) { I think the patch will not be as small as expected however it should be quite trivial because it would only be an API change consisting in adding an extra argument to all these intermediary functions, thus there would be almost no risk of breakage and we could probably backport it to 1.9 and 1.8. > Also, the fact the code is already there, it's just never called > (s->be->lbprm.algo is never BOTH BE_LB_KIND_RR and BE_LB_LKUP_CHTREE) > makes me think that this was just an oversight. Apparently we do have this combination for the "random" algorithm. > Backporting this could change behavior as redispatch suddenly begins > to work in such a situation; however this is expected and a correct > configuration fixes it immediately. I don't see it as expected most of the time due to option redispatch often being set in a defaults section, and hasing being used to enforce strict affinity. But doing it with consistent hashing sounds fine because when using consistent hashing you don't seek strict affinity but rather the "best server for the job at this instant", and in this case redispatching to the next neighbour would definitely make sense. Regards, Willy