Pranith,
This issue continues to happen. If you could provide instructions for
capturing the statedump, I would be happy to send you that information.
I am not sure how to get a statedump just before the crash, as the
crash is intermittent.
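From what I can find in the docs, something along these lines should
capture one, though please correct me if the procedure for the brick
processes differs:

    # trigger statedumps for all brick processes of a volume
    gluster volume statedump <volname>

    # or signal a single glusterfsd brick process directly
    kill -USR1 <brick-pid>

The dumps should land under /var/run/gluster by default. Is that the
right approach, and is there anything specific you want enabled in them?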
David
------ Original Message ------
From: "Pranith Kumar Karampuri" <[email protected]>
To: "Glomski, Patrick" <[email protected]>;
[email protected]; [email protected]
Cc: "David Robinson" <[email protected]>
Sent: 12/21/2015 11:59:33 PM
Subject: Re: [Gluster-devel] glusterfsd crash due to page allocation
failure
Hi Glomski,
This is the second time I am hearing about memory allocation
problems in 3.7.6, but this time on the brick side. Are you able to
recreate this issue? Would it be possible to get statedumps of the
brick processes just before they crash?
Pranith
On 12/22/2015 02:25 AM, Glomski, Patrick wrote:
Hello,
We've recently upgraded from gluster 3.6.6 to 3.7.6 and have started
encountering dmesg page allocation errors (stack trace is appended).
It appears that glusterfsd now sometimes fills up the cache completely
and crashes with a page allocation failure. I *believe* it mainly
happens when copying lots of new data to the system, running a 'find',
or similar. Hosts are all Scientific Linux 6.6 and these errors occur
consistently on two separate gluster pools.
Has anyone else seen this issue, and are there any known fixes for it
via sysctl kernel parameters or other means?
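If I understand the order:5 part correctly, the kernel failed to find a
contiguous 128 KiB block for an atomic allocation, so the only
workaround we have come up with so far is reserving more free memory to
ease fragmentation; I am not sure whether something like this is sane
or would just mask the problem:

    # keep a larger pool of free pages so atomic, higher-order
    # allocations are less likely to fail (the value is only a guess)
    sysctl -w vm.min_free_kbytes=262144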
Please let me know of any other diagnostic information that would
help.
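For instance, we could watch memory fragmentation on the affected
bricks while reproducing and capture the output leading up to a
failure:

    # free-block counts per allocation order; the higher-order
    # columns dropping to zero would line up with these failures
    watch -n 5 cat /proc/buddyinfo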
Thanks,
Patrick
[1458118.134697] glusterfsd: page allocation failure. order:5, mode:0x20
[1458118.134701] Pid: 6010, comm: glusterfsd Not tainted 2.6.32-573.3.1.el6.x86_64 #1
[1458118.134702] Call Trace:
[1458118.134714] [<ffffffff8113770c>] ? __alloc_pages_nodemask+0x7dc/0x950
[1458118.134728] [<ffffffffa0321800>] ? mlx4_ib_post_send+0x680/0x1f90 [mlx4_ib]
[1458118.134733] [<ffffffff81176e92>] ? kmem_getpages+0x62/0x170
[1458118.134735] [<ffffffff81177aaa>] ? fallback_alloc+0x1ba/0x270
[1458118.134736] [<ffffffff811774ff>] ? cache_grow+0x2cf/0x320
[1458118.134738] [<ffffffff81177829>] ? ____cache_alloc_node+0x99/0x160
[1458118.134743] [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
[1458118.134744] [<ffffffff81178479>] ? __kmalloc+0x199/0x230
[1458118.134746] [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
[1458118.134748] [<ffffffff8146001a>] ? __pskb_pull_tail+0x2aa/0x360
[1458118.134751] [<ffffffff8146f389>] ? harmonize_features+0x29/0x70
[1458118.134753] [<ffffffff8146f9f4>] ? dev_hard_start_xmit+0x1c4/0x490
[1458118.134758] [<ffffffff8148cf8a>] ? sch_direct_xmit+0x15a/0x1c0
[1458118.134759] [<ffffffff8146ff68>] ? dev_queue_xmit+0x228/0x320
[1458118.134762] [<ffffffff8147665d>] ? neigh_connected_output+0xbd/0x100
[1458118.134766] [<ffffffff814abc67>] ? ip_finish_output+0x287/0x360
[1458118.134767] [<ffffffff814abdf8>] ? ip_output+0xb8/0xc0
[1458118.134769] [<ffffffff814ab04f>] ? __ip_local_out+0x9f/0xb0
[1458118.134770] [<ffffffff814ab085>] ? ip_local_out+0x25/0x30
[1458118.134772] [<ffffffff814ab580>] ? ip_queue_xmit+0x190/0x420
[1458118.134773] [<ffffffff81137059>] ? __alloc_pages_nodemask+0x129/0x950
[1458118.134776] [<ffffffff814c0c54>] ? tcp_transmit_skb+0x4b4/0x8b0
[1458118.134778] [<ffffffff814c319a>] ? tcp_write_xmit+0x1da/0xa90
[1458118.134779] [<ffffffff81178cbd>] ? __kmalloc_node+0x4d/0x60
[1458118.134780] [<ffffffff814c3a80>] ? tcp_push_one+0x30/0x40
[1458118.134782] [<ffffffff814b410c>] ? tcp_sendmsg+0x9cc/0xa20
[1458118.134786] [<ffffffff8145836b>] ? sock_aio_write+0x19b/0x1c0
[1458118.134788] [<ffffffff814581d0>] ? sock_aio_write+0x0/0x1c0
[1458118.134791] [<ffffffff8119169b>] ? do_sync_readv_writev+0xfb/0x140
[1458118.134797] [<ffffffff810a14b0>] ? autoremove_wake_function+0x0/0x40
[1458118.134801] [<ffffffff8123e92f>] ? selinux_file_permission+0xbf/0x150
[1458118.134804] [<ffffffff812316d6>] ? security_file_permission+0x16/0x20
[1458118.134806] [<ffffffff81192746>] ? do_readv_writev+0xd6/0x1f0
[1458118.134807] [<ffffffff811928a6>] ? vfs_writev+0x46/0x60
[1458118.134809] [<ffffffff811929d1>] ? sys_writev+0x51/0xd0
[1458118.134812] [<ffffffff810e88ae>] ? __audit_syscall_exit+0x25e/0x290
[1458118.134816] [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
_______________________________________________
Gluster-devel mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-devel