Hi, sorry for top-posting, but Outlook is notoriously bad at inline replies.
In reply to Willy Tarreau:

Our sysctl settings are in this pastebin (it's a mix of sysctl.conf settings 
and some commands called from a custom iptables script): 
http://pastebin.com/kZDP8XuM
Do you have any recommendations regarding them?


When the error occurred we had a "normal" load, but since this normal load 
involves many heavy threads (CPU-bound, plus DB I/O against a remote SQL 
server) in a largeish Java VM, I can't really tell whether that was a 
contributing factor.

Our MRTG graphs for memory, CPU and load at the time show nothing unexpected, 
so I guess I'll report it to Red Hat and see if they can give me any pointers or 
a fix.

Thanks!

Regards,
Jens Dueholm Christensen 
Survey IT

-----Original Message-----
From: Willy Tarreau [mailto:[email protected]] 
Sent: Thursday, September 12, 2013 10:21 PM
To: Lukas Tribus
Cc: Jens Dueholm Christensen; [email protected]
Subject: Re: Page allocation failure

On Thu, Sep 12, 2013 at 03:49:33PM +0200, Lukas Tribus wrote:
> Hi!
> 
> 
> > A few days ago one of our machines logged this:
> > 
> > 
> > 
> > Sep 10 10:54:29 web8 kernel: haproxy: page allocation failure. order:1, mode:0x20
> 
> The kernel has problems allocating memory to haproxy. Since we don't 
> see the OOM killer in action, I guess your memory is heavily fragmented.
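
One way to look at fragmentation on a Linux box is /proc/buddyinfo, which lists the number of free blocks per allocation order for each memory zone. The sketch below is only an illustration: the helper names are made up for this example, and it assumes the usual "Node N, zone NAME count count ..." line layout. An order:1 request needs at least one free block of order 1 or higher.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumed layout: Node 0, zone DMA 1 1 0 ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def free_blocks_at_or_above(counts, order):
    """Count free blocks that could satisfy a request of the given order."""
    return sum(counts[order:])


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_buddyinfo(f.read())
```

A zone with many order-0 blocks but very few at order 1 and above would be consistent with the fragmentation guess, although it does not prove it.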

It is possible the problem is even deeper. haproxy called connect(), which 
happened to enable irqs and was suddenly interrupted by a softirq pertaining to 
a pending incoming ACK packet completing a pending incoming connection.
The TCP receive path was called from the softirq to create a real connection 
from a connection request, and at this exact point failed a memory allocation 
to receive a packet.
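
The logged values fit that description. Here is a minimal sketch that decodes the log line, assuming 4 KiB pages and the flag values of 2.6.32-era kernels (0x10 for __GFP_WAIT, 0x20 for __GFP_HIGH); these constants are assumptions about that kernel generation and differ in newer kernels. The function name is hypothetical.

```python
import re

PAGE_SIZE = 4096   # assumption: 4 KiB pages (typical on x86)
GFP_WAIT = 0x10    # assumption: __GFP_WAIT in 2.6.32-era kernels
GFP_HIGH = 0x20    # assumption: __GFP_HIGH in 2.6.32-era kernels

_PATTERN = re.compile(
    r"page allocation failure\.\s*order:(\d+),\s*mode:(0x[0-9a-fA-F]+)"
)


def decode_failure(line, page_size=PAGE_SIZE):
    """Extract order and mode from a 'page allocation failure' log line."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    order = int(m.group(1))
    mode = int(m.group(2), 16)
    return {
        "order": order,
        "pages": 1 << order,                  # contiguous pages requested
        "bytes": (1 << order) * page_size,
        "mode": mode,
        "may_sleep": bool(mode & GFP_WAIT),   # False: caller could not wait
        "high_priority": bool(mode & GFP_HIGH),
    }
```

Under those assumptions, order:1 is a request for two physically contiguous pages (8 KiB), and mode 0x20 is a high-priority allocation that is not allowed to sleep, which is what one would expect from softirq context.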

> I guess this box has a long uptime?
> 
> How much free RAM does "free -m" show?

Also please check your sysctl.conf in case you changed some settings 
based on advice from random sites (we often find wrong settings causing 
4096 times too much RAM being allocated to the network stack).
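
A factor of 4096 is what you get when a setting expressed in pages is filled in as if it were bytes. net.ipv4.tcp_mem is documented as being in pages, so it is a likely candidate, though that reading is an assumption and not something stated above. A minimal sketch of the check, with hypothetical helper names and an assumed 4 KiB page size:

```python
def tcp_mem_in_bytes(value, page_size=4096):
    """Convert a 'low pressure high' tcp_mem string from pages to bytes."""
    low, pressure, high = (int(v) for v in value.split())
    return tuple(v * page_size for v in (low, pressure, high))


def exceeds_ram(value, total_ram_bytes, page_size=4096):
    """True if the tcp_mem upper limit is larger than physical RAM."""
    return tcp_mem_in_bytes(value, page_size)[2] > total_ram_bytes
```

For example, a tcp_mem of "4096 87380 16777216" (values that look like they were copied from a byte-based setting) would allow the TCP stack 64 GiB, so `exceeds_ram("4096 87380 16777216", 8 * 1024**3)` flags it on an 8 GiB machine.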

And you should check if you were not running some backups or anything I/O 
intensive at the same time, as it could fill the RAM with a lot of cached data 
if the tuning is not that good.

If you didn't do it and still have some RAM, I'd suggest reporting that to Red 
Hat who may be interested in investigating this issue as it could be very 
specific to their kernel.

> > Should I be worried?
> 
> Memory allocation failure will lead to application failures. I would 
> take this seriously.

In this case it did not hit haproxy (otherwise it would have been killed).
However one incoming connection was destroyed and we don't know why. So yes it 
could be a very serious issue.

> > An upgrade to 1.4.24 is planned Real Soon(TM), but I am unsure if 
> > it's a known error that's fixed in a later version.
> 
> Upgrading to 1.4.24 is important, there are several issues with 1.4.22.
> However, it will not fix this problem, as this is not a bug in haproxy.
> The problem mostly depends on your kernel.

Clearly.

> A quick fix is to reboot the box, which will make the problem go away 
> for now.
> 
> I suggest upgrading OS/kernel to a more recent version.

I believe that 2.6.32-358 is a reasonably recent one, though I may be wrong.

Cheers,
Willy

