On Thu, Sep 12, 2013 at 03:49:33PM +0200, Lukas Tribus wrote:
> Hi!
>
> > A few days ago one of our machines logged this:
> >
> > Sep 10 10:54:29 web8 kernel: haproxy: page allocation failure. order:1, mode:0x20
>
> The kernel has problems allocating memory to haproxy. Since we don't see
> the OOM killer in action, I guess your memory is heavily fragmented.

It is possible the problem is even deeper. haproxy called connect(),
which happened to enable irqs, and was suddenly interrupted by a softirq
for a pending incoming ACK packet that completed a pending incoming
connection. The TCP receive path was called from the softirq to create a
real connection from a connection request, and at this exact point a
memory allocation to receive a packet failed.

> I guess this box has a long uptime?
>
> How much free RAM does "free -m" show?

Also please check your sysctl.conf in case you changed some settings
based on advice from random sites (we often find wrong settings causing
4096 times too much RAM being allocated to the network stack). And you
should check whether you were running backups or anything else I/O
intensive at the same time, as that could fill the RAM with a lot of
cached data if the tuning is not that good.

If you did none of that and still have some free RAM, I'd suggest
reporting this to Red Hat, who may be interested in investigating the
issue as it could be very specific to their kernel.

> > Should I be worried?
>
> Memory allocation failure will lead to application failures. I would take
> this seriously.

In this case it did not hit haproxy (otherwise it would have been
killed). However one incoming connection was destroyed and we don't know
why. So yes, it could be a very serious issue.

> > An upgrade to 1.4.24 is planned Real Soon(TM), but I am unsure if it's
> > a known error that's fixed in a later version.
>
> Upgrading to 1.4.24 is important, there are several issues with 1.4.22.
> However, it will not fix this problem, as this is not a bug in haproxy.
> The problem mostly depends on your kernel.

Clearly.

> A quick fix is to reboot the box, which will make the problem go away
> for now.
>
> I suggest upgrading OS/kernel to a more recent version.

I believe that 2.6.32-358 is a reasonably recent one, though I may be
wrong.

Cheers,
Willy
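
For the sysctl.conf check mentioned above, here is a rough sketch and not
a definitive tool. It assumes the "4096 times too much RAM" remark refers
to net.ipv4.tcp_mem, whose three values are counted in pages, while
tcp_rmem and tcp_wmem are in bytes; the mail does not name the setting,
so treat that as a guess. The function names and the "limit larger than
physical RAM" threshold are made up for illustration.

```python
import os

TCP_MEM_PATH = "/proc/sys/net/ipv4/tcp_mem"


def parse_tcp_mem(text):
    """Parse the three page counts (low, pressure, high) from tcp_mem."""
    low, pressure, high = (int(field) for field in text.split())
    return low, pressure, high


def tcp_mem_report(tcp_mem_text, page_size, total_ram_bytes):
    """Return (high limit in bytes, that limit as a fraction of RAM)."""
    _, _, high_pages = parse_tcp_mem(tcp_mem_text)
    high_bytes = high_pages * page_size
    return high_bytes, high_bytes / total_ram_bytes


def looks_like_bytes_mistake(tcp_mem_text, page_size, total_ram_bytes):
    """Hypothetical heuristic: a TCP limit above physical RAM suggests the
    values were written as bytes although the kernel reads them as pages."""
    _, fraction = tcp_mem_report(tcp_mem_text, page_size, total_ram_bytes)
    return fraction > 1.0


# Only touch /proc when run directly on a machine that exposes the file.
if __name__ == "__main__" and os.path.exists(TCP_MEM_PATH):
    page_size = os.sysconf("SC_PAGE_SIZE")
    total_ram = page_size * os.sysconf("SC_PHYS_PAGES")
    with open(TCP_MEM_PATH) as f:
        current = f.read()
    high_bytes, fraction = tcp_mem_report(current, page_size, total_ram)
    print("tcp_mem high limit: %d MiB (%.0f%% of RAM)"
          % (high_bytes // (1024 * 1024), fraction * 100))
    if looks_like_bytes_mistake(current, page_size, total_ram):
        print("limit exceeds physical RAM, the values may be in the wrong unit")
```

Run it on the affected box and compare the result with what "free -m"
reports. A limit well below total RAM is what one would normally expect.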

