Hi Jens,

On Mon, Oct 07, 2013 at 11:23:38AM +0000, Jens Dueholm Christensen wrote:
> A bit of followup (again sorry for topposting)..
> 
> I began looking around for the first line in the stacktrace
> __alloc_pages_nodemask+0x757/0x8d0
> and found a thread in the Linux-Kernel mailinglist:
> http://lkml.indiana.edu/hypermail/linux/kernel/1305.3/01761.html.

Good catch!

> This thread left me wondering a bit, and since I'm not a C[++]-programmer,
> I've never dealt with handling memory allocation and the intricies it
> involves, so I began wondering what and how the order was affecting memory
> allocation..

The order is the base-2 logarithm of the number of contiguous pages the
system tried to allocate at once. With 4kB pages, order 0 is 4kB, order 1
is 8kB, order 2 is 16kB, etc.
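
To make the arithmetic concrete, here is a small Python sketch. It assumes
4kB pages (the usual size on x86), and order_to_bytes is just a name made
up for illustration, not something from the kernel:

```python
PAGE_SIZE = 4096  # assumption: 4 kB pages, as on x86


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of one allocation of the given order (2**order pages)."""
    return page_size << order


for order in range(4):
    pages = 1 << order
    print(f"order {order}: {pages} page(s), {order_to_bytes(order) // 1024} kB")
```

So an order-1 request needs two physically contiguous free pages, which is
why it can fail even when plenty of single pages are free.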

> All the errors I've seen in our logs are logged as order-1 failures, and as
> far as I can understand an order-1 allocation error is not necessarily a dead
> end.

That said, order-1 failures are rare. They could mean that your memory is
highly fragmented due to other workloads on the same machine.
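
If you want to see how fragmented the memory is, /proc/buddyinfo lists the
number of free blocks per order for each zone. Below is a rough Python
sketch that parses that layout. The function name and the sample line are
made up for illustration; the sample is not data from your machine:

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected layout: "Node", "0,", "zone", "Normal", count0, count1, ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        result[(node, parts[3])] = [int(n) for n in parts[4:]]
    return result


# Hypothetical sample in the /proc/buddyinfo layout (11 orders, 0 to 10).
SAMPLE = "Node 0, zone   Normal   2048      3      0      0      0      0      0      0      0      0      0\n"

free = parse_buddyinfo(SAMPLE)
for (node, zone), counts in free.items():
    print(f"node {node} zone {zone}: "
          f"order-0 free={counts[0]}, order-1 free={counts[1]}")
```

On the real machine you would pass open("/proc/buddyinfo").read() instead
of SAMPLE. Many free order-0 blocks with almost none at order 1 and above
would point at fragmentation.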

> According to
> https://www.kernel.org/doc/gorman/html/understand/understand009.html there
> should be a fallback to a lower-order allocation when a higher-order
> allocation is requested and fails. 

That's indeed sometimes possible (I don't know the exact conditions for
this to happen). That said, the worst that can happen in your case is
that some outgoing packets are dropped and will have to be retransmitted.
If it happens once in a while it might be OK, but it's not pleasant to
have a system log full of errors. Did you try to report the issue to the
distro? It's important to report bugs, as bugs not reported do not exist.

> According to the same Linux-Kernel thread some networkdrivers are buggy, and
> since this machine contains 6 Broadcom NetXtreme II BCM5709 1000Base-T
> interfaces I am currently looking into upgrading the driver.

Are you running with jumbo frames or is your MTU set to 1500? I used to
experience allocation failures with jumbo frames in the past on the e1000
driver, so it could be a similar issue here. Maybe your driver allocates
send buffers in large chunks, I don't know.

Regards,
Willy

