On Thursday, September 15, 2016 4:06:15 AM EDT Willy Tarreau wrote:
> Hi Andrew,
>
> On Wed, Sep 14, 2016 at 02:44:26PM -0400, Andrew Rodland wrote:
> > On Sunday, September 11, 2016 7:57:41 PM EDT Willy Tarreau wrote:
> > > > > Also I've been thinking about this issue of the infinite loop
> > > > > that you solved already. As long as c > 1 I don't think it can
> > > > > happen at all, because for any server having a load strictly
> > > > > greater than the average load, it means there exists at least
> > > > > one server with a load smaller than or equal to the average.
> > > > > Otherwise it means there's no more server in the ring because
> > > > > all servers are down, and then the initial lookup will simply
> > > > > return NULL. Maybe there's an issue with the current lookup
> > > > > method, we'll have to study this.
> > > >
> > > > Agreed again, it should be impossible as long as c > 1, but I ran
> > > > into it. I assumed it was some problem or misunderstanding in my
> > > > code.
> > >
> > > Don't worry I trust you, I was trying to figure what exact case could
> > > cause this and couldn't find a single possible case :-/
> >
> > I've encountered this again in my re-written branch. I think it has to do
> > with the case where all servers are draining for shutdown. What I see is
> > that whenever I do a restart (haproxy -sf oldpid) under load, the new
> > process starts up, but the old process never exits, and perf shows it
> > using 100% CPU in chash_server_is_eligible, so it's got to be looping and
> > deciding nothing is eligible. Can you think of anything special that
> > needs to be done to handle graceful shutdown?
>
> No, that's very strange. We may have a bug somewhere else which never
> stroke till now. When you talk about a shutdown, you in fact mean the
> shutdown of the haproxy process being replaced by another one, that's
> right ? If so, health checks are disabled during that period so servers
> should not be added to nor removed from the ring.
>
> However if for any reason there's a graceful shutdown on the servers,
> their weight can be set to zero while they're still active. In this
> case they don't appear in the tree and that may be where the issue
> starts. It would be nice to get a 100% reproducible case to try to
> debug it and dump all weights and capacities, I think it would help.
>
> Willy

I haven't found the cause of this, or been able to pin it down much further
than that it happens fairly reliably when doing a "haproxy -sf" restart
under load.

Other than that, I think I have things working properly and would
appreciate a bit of review. My changes are on the "bounded-chash" branch of
github.com/arodland/haproxy. Or would you prefer a patch series sent to the
list?

Thanks,

Andrew
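
To make the termination argument quoted above concrete, here is a minimal sketch in Python. It is not HAProxy's C code: the names `is_eligible` and `pick_server`, the capacity formula, and the rule that a zero-weight server is never eligible are all assumptions made for illustration. It does not identify the cause of the hang. It only shows that with c > 1 a least-loaded server always passes the check, and that a walk limited to one lap of the ring returns nothing instead of spinning when no server is eligible.

```python
import math


def is_eligible(load, total_load, n_servers, c, weight=1):
    """Hypothetical eligibility rule for bounded-load consistent hashing.

    A server may take one more request if that keeps it at or below
    c times the average load (rounded up). Treating a zero-weight
    server as never eligible is an assumption, not HAProxy's rule.
    """
    if weight <= 0 or n_servers <= 0:
        return False
    cap = math.ceil(c * (total_load + 1) / n_servers)
    return load + 1 <= cap


def pick_server(ring, loads, weights, start, c):
    """Walk the ring from `start` and return the first eligible server.

    The walk is limited to one full lap, so it returns None rather
    than looping forever when nothing is eligible.
    """
    n = len(ring)
    if n == 0:
        return None
    total = sum(loads[s] for s in ring)
    for step in range(n):
        server = ring[(start + step) % n]
        if is_eligible(loads[server], total, n, c, weights[server]):
            return server
    return None


# Illustrative data only: three servers, one of them overloaded.
ring = ["a", "b", "c"]
loads = {"a": 5, "b": 1, "c": 3}
weights = {"a": 1, "b": 1, "c": 1}

# "a" is over its cap, so the walk moves on to "b".
chosen = pick_server(ring, loads, weights, start=0, c=1.25)

# Every server at weight zero: nothing is eligible, the walk stops after one lap.
drained = pick_server(ring, loads, {s: 0 for s in ring}, start=0, c=1.25)
```

In this model the only way to find no eligible server on a non-empty ring is for something outside the load comparison (here, the weight check) to reject every server, which is why bounding the walk to one lap matters as a safety net.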

