On Thu, Feb 12, 2015 at 12:23:37PM -0800, [email protected] wrote:
> Management is saying we're going to drop haproxy for nginx-plus because of
> this problem.... so last chance if anyone has any ideas on this.

Well, they can disable the health checks for free and get a similar
result. But killing the messenger will not get them a better service.

What we've seen in the past were some Java-based web services randomly
hanging for several seconds (up to tens of seconds) just because the
garbage collector was running. In that case, only someone skilled in
JVM tuning can help.

If you can't share your capture (and I respect that, having worked in
environments where it was not accepted), at least check whether the
server is sending a TCP RST just after the response. That would tell
the local TCP stack to destroy the pending contents of its buffers, so
haproxy would only randomly get them. It can happen when developers
erroneously disable lingering on the check socket because they're
scared of the TIME_WAITs.

Hoping this helps, and good luck with your management. Other companies
are hiring, you know :-)

Willy
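
For readers unfamiliar with the lingering pitfall mentioned above, here is a
minimal sketch of what "disabling lingering" usually looks like on the server
side. It assumes the application sets SO_LINGER with a zero timeout before
closing, which is only one plausible cause and not something confirmed for
this particular server. The helper names disable_lingering and
linger_settings are made up for illustration, and the two-int layout of
struct linger is assumed (as on Linux).

```python
import socket
import struct


def disable_lingering(sock: socket.socket) -> None:
    """Enable SO_LINGER with a zero timeout (l_onoff=1, l_linger=0).

    With this setting, close() aborts the connection with an RST instead
    of the normal FIN handshake. That avoids TIME_WAIT, but data not yet
    delivered to the peer may be discarded.
    """
    # struct linger is assumed to be two C ints: l_onoff, l_linger.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))


def linger_settings(sock: socket.socket) -> tuple:
    """Return (l_onoff, l_linger) as currently set on the socket."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize("ii")
    )
    return struct.unpack("ii", raw)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("default:", linger_settings(s))
    disable_lingering(s)
    print("after disable_lingering:", linger_settings(s))
    s.close()
```

If the server code contains something equivalent to disable_lingering() on the
socket that answers the health check, removing it and letting close() perform
a normal shutdown is the first thing to try.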

