On Sunday, September 11, 2016 7:57:41 PM EDT Willy Tarreau wrote:
> > > Also I've been thinking about this issue of the infinite loop that you
> > > solved already. As long as c > 1 I don't think it can happen at all,
> > > because for any server having a load strictly greater than the average
> > > load, it means there exists at least one server with a load smaller than
> > > or equal to the average. Otherwise it means there's no more server in
> > > the ring because all servers are down, and then the initial lookup will
> > > simply return NULL. Maybe there's an issue with the current lookup
> > > method, we'll have to study this.
> >
> > Agreed again, it should be impossible as long as c > 1, but I ran into it.
> > I assumed it was some problem or misunderstanding in my code.
>
> Don't worry I trust you, I was trying to figure what exact case could
> cause this and couldn't find a single possible case :-/
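
The counting argument quoted above can be checked with a small model. This is only a sketch, not HAProxy code: the function name, the integer per-server loads, and the eligibility rule (a server may take one more request if its new load stays at or below ceil(c * (total + 1) / n)) are all assumptions made for illustration.

```python
import math


def has_eligible_server(loads, c):
    """Return True if at least one server can accept one more request.

    Assumed rule (hypothetical, not taken from HAProxy): a server is
    eligible when load + 1 <= ceil(c * (total + 1) / n).
    """
    if not loads:
        # Empty ring: in the quoted discussion the initial lookup
        # would simply return NULL here.
        return False
    total = sum(loads)
    cap = math.ceil(c * (total + 1) / len(loads))
    # The least-loaded server is at or below the average, so with
    # c > 1 it always fits under the cap.
    return any(load + 1 <= cap for load in loads)
```

Under this rule, a non-empty ring with c > 1 always has an eligible server, which matches the reasoning that the loop should not be able to spin. That suggests the hang comes from a state the model does not cover.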
I've encountered this again in my rewritten branch. I think it has to do
with the case where all servers are draining for shutdown.

What I see is that whenever I do a restart (haproxy -sf oldpid) under load,
the new process starts up, but the old process never exits. perf shows the
old process using 100% CPU in chash_server_is_eligible, so it must be
looping and deciding that nothing is eligible.

Can you think of anything special that needs to be done to handle graceful
shutdown?

Thanks,
Andrew
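
Whatever makes every server look ineligible during shutdown, one defensive option is to bound the ring walk to a single lap so the lookup fails instead of spinning. The sketch below is a hypothetical model, not the HAProxy implementation: pick_server, the list-based ring, and the is_eligible callback (standing in for chash_server_is_eligible) are assumptions.

```python
def pick_server(ring, start, is_eligible):
    """Walk the ring clockwise from index `start`, at most one full lap.

    `ring` is a list of server identifiers and `is_eligible` is a
    caller-supplied predicate; both are illustrative stand-ins.
    Returns the first eligible server, or None if the lap finds none.
    """
    n = len(ring)
    for step in range(n):
        node = ring[(start + step) % n]
        if is_eligible(node):
            return node
    # Every node was visited once and rejected: give up rather than loop.
    return None
```

With a guard like this, a caller that gets None could fall back to the plain hash lookup or reject the request. Which of those is right for the draining case is an open question.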

