On Wed, Jul 17, 2013 at 08:16:18PM +0200, Lukas Tribus wrote:
> Hi Willy,
>
> > This explains why this only happens for short durations (at most the
> > duration of a client timeout).
>
> Good to hear you pinpointed this.
>
> > What is important is to know that CPU usage aside, there is no loss of
> > information nor service. Connections are correctly handled
>
> Mark early wrote that in single process mode, when HAProxy hogged the CPU,
> the service was unresponsive, which is why he enabled multi process mode.
> Do you confirm this is not the case?
From my analysis, this should not be the case. But the issue is still very
tricky to get completely right, and if in the end I found a long dependency
chain, I would not be that surprised. The traces that Mark sent me clearly
show that there is an unhandled event which prevents poll() from waiting,
hence the loop, but all parallel activity is still handled well. That does
not mean there isn't another bug in parallel.

Also, I tend to be careful with analyses made during a production incident,
because often we realize after the bug is fixed that the first analysis was
incomplete or wrong, due to the urgency of the situation or the stacking of
operations. For example, I once faced an issue where haproxy was running in
SSL mode at very high load. After killing and restarting it, it would go
100% user CPU and not serve any request for a while. I immediately suspected
a big bug preventing connections from being accepted. In fact it was simply
because we had lost all the SSL session cache, so every incoming connection
had to be renegotiated. The machine was on its knees performing RSA
computations for every incoming request (it was undersized for SSL). And
indeed, after some time, everything went smoothly again.

Best regards,
Willy
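
PS: for those who want to picture the poll() issue, here is a tiny
standalone sketch (not haproxy code, just an illustration built on a
socketpair) of how a level-triggered event that poll() keeps reporting but
that the handler never consumes turns the event loop into a busy loop: the
process spins at 100% user CPU, yet it could still service other file
descriptors in the same loop.

#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int sp[2];
	unsigned long loops = 0;
	struct pollfd pfd;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) < 0)
		return 1;

	/* make sp[0] readable and never read the data back */
	if (write(sp[1], "x", 1) != 1)
		return 1;

	pfd.fd = sp[0];
	pfd.events = POLLIN;

	for (;;) {
		/* a pending, unhandled POLLIN means this returns
		 * immediately instead of sleeping up to 1 second
		 */
		int n = poll(&pfd, 1, 1000);

		if (n > 0 && (pfd.revents & POLLIN)) {
			/* bug: the handler neither read()s the data nor
			 * disables POLLIN, so the same event is reported
			 * again on the very next iteration
			 */
		}

		if (++loops % 1000000 == 0)
			printf("%lu poll() calls without ever sleeping\n",
			       loops);
	}
}

The cure for this class of bug is always the same: either consume the event
or stop subscribing to it until it can be consumed.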

