It looks like some process hung there due to memory issues in the kernel. The 
error message from the very beginning of the dmesg output would be helpful.
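
Since the full dmesg is about 1600 lines, here is a rough way to condense it 
before posting. This is only a minimal sketch of my own, not part of any 
Gluster tooling: the function name summarize_hung_tasks and both regular 
expressions are assumptions based solely on the "blocked for more than N 
seconds" format quoted below. It groups the hung-task reports by task, in 
first-seen order, and records which syscalls each task was stuck in.

```python
import re
from collections import OrderedDict

# Matches the kernel hung-task header, e.g.
# "INFO: task glusterfs:8880 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Matches a stack frame such as " [<ffffffff81141768>] sys_munmap+0x48/0x80".
# Frames the kernel marks with "?" (unreliable) are deliberately skipped.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (?!\? )(\w+)\+0x")


def summarize_hung_tasks(dmesg_text):
    """Return {(comm, pid): {"count": n, "syscalls": [...]}} in first-seen order."""
    summary = OrderedDict()
    current = None
    for line in dmesg_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            key = (header.group(1), int(header.group(2)))
            current = summary.setdefault(key, {"count": 0, "syscalls": []})
            current["count"] += 1
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and frame.group(1).startswith("sys_"):
            name = frame.group(1)
            if name not in current["syscalls"]:
                current["syscalls"].append(name)
    return summary
```

With the dmesg output saved to a file (the name dmesg.txt is hypothetical), 
calling summarize_hung_tasks(open("dmesg.txt").read()) returns one entry per 
blocked task. The earliest messages in the raw output are still the ones 
worth posting in full.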


Sent from my iPhone

On 2012-6-9, at 8:26 AM, Ling Ho <[email protected]> wrote:

> Hi Anand,
> 
> ulimit -l running as root is 64.
> 
> 
> This dmesg output is from the second system.
> 
> I don't see anything new on the first system other than what was there when 
> the system booted.
> Do you want to see the whole dmesg output? Where should I post it? There are 
> 1600 lines. 
> 
> ...
> ling
> 
> INFO: task glusterfs:8880 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> glusterfs     D 0000000000000000     0  8880      1 0x00000080
>  ffff880614b75e48 0000000000000086 0000000000000000 ffff88010ed65d80
>  000000000000038b 000000000000038b ffff880614b75ee8 ffffffff814ef8f5
>  ffff88062bc4ba78 ffff880614b75fd8 000000000000f4e8 ffff88062bc4ba78
> Call Trace:
>  [<ffffffff814ef8f5>] ? page_fault+0x25/0x30
>  [<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff814ef1c3>] rwsem_down_write_failed+0x23/0x30
>  [<ffffffff81276eb3>] call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff814ee6c2>] ? down_write+0x32/0x40
>  [<ffffffff81141768>] sys_munmap+0x48/0x80
>  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> INFO: task glusterfs:8880 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> glusterfs     D 0000000000000000     0  8880      1 0x00000080
>  ffff880614b75e48 0000000000000086 0000000000000000 ffff88010ed65d80
>  000000000000038b 000000000000038b ffff880614b75ee8 ffffffff814ef8f5
>  ffff88062bc4ba78 ffff880614b75fd8 000000000000f4e8 ffff88062bc4ba78
> Call Trace:
>  [<ffffffff814ef8f5>] ? page_fault+0x25/0x30
>  [<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff814ef1c3>] rwsem_down_write_failed+0x23/0x30
>  [<ffffffff81276eb3>] call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff814ee6c2>] ? down_write+0x32/0x40
>  [<ffffffff81141768>] sys_munmap+0x48/0x80
>  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> INFO: task glusterfs:8880 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> glusterfs     D 0000000000000009     0  8880      1 0x00000080
>  ffff880614b75e08 0000000000000086 0000000000000000 ffff88062d638338
>  ffff880c30ef88c0 ffffffff8120d34f ffff880614b75d98 ffff88061406f740
>  ffff88062bc4ba78 ffff880614b75fd8 000000000000f4e8 ffff88062bc4ba78
> Call Trace:
>  [<ffffffff8120d34f>] ? security_inode_permission+0x1f/0x30
>  [<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff814ef1c3>] rwsem_down_write_failed+0x23/0x30
>  [<ffffffff81276eb3>] call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff814ee6c2>] ? down_write+0x32/0x40
>  [<ffffffff81131ddc>] sys_mmap_pgoff+0x5c/0x2d0
>  [<ffffffff81010469>] sys_mmap+0x29/0x30
>  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> INFO: task glusterfs:8880 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> glusterfs     D 0000000000000009     0  8880      1 0x00000080
>  ffff880614b75e08 0000000000000086 0000000000000000 ffff88062d638338
>  ffff880c30ef88c0 ffffffff8120d34f ffff880614b75d98 ffff88061406f740
>  ffff88062bc4ba78 ffff880614b75fd8 000000000000f4e8 ffff88062bc4ba78
> Call Trace:
>  [<ffffffff8120d34f>] ? security_inode_permission+0x1f/0x30
>  [<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff814ef1c3>] rwsem_down_write_failed+0x23/0x30
>  [<ffffffff81276eb3>] call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff814ee6c2>] ? down_write+0x32/0x40
>  [<ffffffff81131ddc>] sys_mmap_pgoff+0x5c/0x2d0
>  [<ffffffff81010469>] sys_mmap+0x29/0x30
>  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> INFO: task glusterfs:8880 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> glusterfs     D 0000000000000003     0  8880      1 0x00000080
>  ffff880614b75e08 0000000000000086 0000000000000000 ffff880630ab1ab8
>  ffff880c30ef88c0 ffffffff8120d34f ffff880614b75d98 ffff88062df10480
>  ffff88062bc4ba78 ffff880614b75fd8 000000000000f4e8 ffff88062bc4ba78
> Call Trace:
>  [<ffffffff8120d34f>] ? security_inode_permission+0x1f/0x30
>  [<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff814ef1c3>] rwsem_down_write_failed+0x23/0x30
>  [<ffffffff81276eb3>] call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff814ee6c2>] ? down_write+0x32/0x40
>  [<ffffffff81131ddc>] sys_mmap_pgoff+0x5c/0x2d0
>  [<ffffffff81010469>] sys_mmap+0x29/0x30
>  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> INFO: task glusterfsd:9471 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> glusterfsd    D 0000000000000004     0  9471      1 0x00000080
>  ffff8801077c3740 0000000000000082 0000000000000000 ffff8801077c36b8
>  ffffffff8127f138 0000000000000000 0000000000000000 ffff8801077c36d8
>  ffff8806146f4638 ffff8801077c3fd8 000000000000f4e8 ffff8806146f4638
> Call Trace:
>  [<ffffffff8127f138>] ? swiotlb_dma_mapping_error+0x18/0x30
>  [<ffffffff8127f138>] ? swiotlb_dma_mapping_error+0x18/0x30
>  [<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffffa019607a>] ? ixgbe_xmit_frame_ring+0x93a/0xfc0 [ixgbe]
>  [<ffffffff814ef1f6>] rwsem_down_read_failed+0x26/0x30
>  [<ffffffff81276e84>] call_rwsem_down_read_failed+0x14/0x30
>  [<ffffffff814ee6f4>] ? down_read+0x24/0x30
>  [<ffffffff81042bc7>] __do_page_fault+0x187/0x480
>  [<ffffffff81430c38>] ? dev_queue_xmit+0x178/0x6b0
>  [<ffffffff8146809c>] ? ip_finish_output+0x13c/0x310
>  [<ffffffff814f253e>] do_page_fault+0x3e/0xa0
>  [<ffffffff814ef8f5>] page_fault+0x25/0x30
>  [<ffffffff81275a6d>] ? copy_user_generic_string+0x2d/0x40
>  [<ffffffff81425655>] ? memcpy_toiovec+0x55/0x80
>  [<ffffffff81426070>] skb_copy_datagram_iovec+0x60/0x2c0
>  [<ffffffff8141ceac>] ? lock_sock_nested+0xac/0xc0
>  [<ffffffff814ef5cb>] ? _spin_unlock_bh+0x1b/0x20
>  [<ffffffff814722d5>] tcp_recvmsg+0xca5/0xe90
>  [<ffffffff814925ea>] inet_recvmsg+0x5a/0x90
>  [<ffffffff8141bff1>] sock_aio_read+0x181/0x190
>  [<ffffffff810566a3>] ? perf_event_task_sched_out+0x33/0x80
>  [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
>  [<ffffffff8141be70>] ? sock_aio_read+0x0/0x190
>  [<ffffffff8117614b>] do_sync_readv_writev+0xfb/0x140
>  [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffff8120c1e6>] ? security_file_permission+0x16/0x20
>  [<ffffffff811771df>] do_readv_writev+0xcf/0x1f0
>  [<ffffffff811b9b50>] ? sys_epoll_wait+0xa0/0x300
>  [<ffffffff814ecb0e>] ? thread_return+0x4e/0x760
>  [<ffffffff81177513>] vfs_readv+0x43/0x60
>  [<ffffffff81177641>] sys_readv+0x51/0xb0
>  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> INFO: task glusterfsd:9545 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> glusterfsd    D 0000000000000006     0  9545      1 0x00000080
>  ffff880c24a7bcf8 0000000000000082 0000000000000000 ffffffff8107c0a0
>  ffff88066a0a7580 ffff880c30460000 0000000000000000 0000000000000000
>  ffff88066a0a7b38 ffff880c24a7bfd8 000000000000f4e8 ffff88066a0a7b38
> Call Trace:
>  [<ffffffff8107c0a0>] ? process_timeout+0x0/0x10
>  [<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff8127f18c>] ? is_swiotlb_buffer+0x3c/0x50
>  [<ffffffff814ef1c3>] rwsem_down_write_failed+0x23/0x30
>  [<ffffffff81276eb3>] call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff814ee6c2>] ? down_write+0x32/0x40
>  [<ffffffffa0211b96>] ib_umem_release+0x76/0x110 [ib_core]
>  [<ffffffffa0230d52>] mlx4_ib_dereg_mr+0x32/0x50 [mlx4_ib]
>  [<ffffffffa020cd85>] ib_dereg_mr+0x35/0x50 [ib_core]
>  [<ffffffffa041bc5b>] ib_uverbs_dereg_mr+0x7b/0xf0 [ib_uverbs]
>  [<ffffffffa04194ef>] ib_uverbs_write+0xbf/0xe0 [ib_uverbs]
>  [<ffffffff8117646d>] ? rw_verify_area+0x5d/0xc0
>  [<ffffffff81176588>] vfs_write+0xb8/0x1a0
>  [<ffffffff810d4692>] ? audit_syscall_entry+0x272/0x2a0
>  [<ffffffff81176f91>] sys_write+0x51/0x90
>  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> INFO: task glusterfsd:9546 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> glusterfsd    D 0000000000000004     0  9546      1 0x00000080
>  ffff880c0634bcf0 0000000000000082 ffff880c0634bcb8 ffff880c0634bcb4
>  0000000000015f80 ffff88063fc24b00 ffff880655495f80 0000000000000400
>  ffff880c2dccc5f8 ffff880c0634bfd8 000000000000f4e8 ffff880c2dccc5f8
> Call Trace:
>  [<ffffffff810566a3>] ? perf_event_task_sched_out+0x33/0x80
>  [<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
>  [<ffffffff814ef1f6>] rwsem_down_read_failed+0x26/0x30
>  [<ffffffff814ecb0e>] ? thread_return+0x4e/0x760
>  [<ffffffff81276e84>] call_rwsem_down_read_failed+0x14/0x30
>  [<ffffffff814ee6f4>] ? down_read+0x24/0x30
>  [<ffffffff81042bc7>] __do_page_fault+0x187/0x480
>  [<ffffffffa0419e16>] ? ib_uverbs_event_read+0x1d6/0x240 [ib_uverbs]
>  [<ffffffff814f253e>] do_page_fault+0x3e/0xa0
>  [<ffffffff814ef8f5>] page_fault+0x25/0x30
> INFO: task glusterfsd:9553 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> glusterfsd    D 000000000000000e     0  9553      1 0x00000080
>  ffff8806e131dd98 0000000000000082 0000000000000000 ffff8806e131dd64
>  ffff8806e131dd48 ffffffffa026dfb6 ffff8806e131dd28 ffffffff00000000
>  ffff880c2f41c678 ffff8806e131dfd8 000000000000f4e8 ffff880c2f41c678
> Call Trace:
>  [<ffffffffa026dfb6>] ? xfs_attr_get+0xb6/0xc0 [xfs]
>  [<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff814ef1c3>] rwsem_down_write_failed+0x23/0x30
>  [<ffffffff81276eb3>] call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff814ee6c2>] ? down_write+0x32/0x40
>  [<ffffffff81136009>] sys_madvise+0x329/0x760
>  [<ffffffff81195740>] ? mntput_no_expire+0x30/0x110
>  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> INFO: task glusterfs:8880 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> glusterfs     D 0000000000000003     0  8880      1 0x00000080
>  ffff880614b75e08 0000000000000086 0000000000000000 ffff880630ab1ab8
>  ffff880c30ef88c0 ffffffff8120d34f ffff880614b75d98 ffff88062df10480
>  ffff88062bc4ba78 ffff880614b75fd8 000000000000f4e8 ffff88062bc4ba78
> Call Trace:
>  [<ffffffff8120d34f>] ? security_inode_permission+0x1f/0x30
>  [<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff814ef1c3>] rwsem_down_write_failed+0x23/0x30
>  [<ffffffff81276eb3>] call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff814ee6c2>] ? down_write+0x32/0x40
>  [<ffffffff81131ddc>] sys_mmap_pgoff+0x5c/0x2d0
>  [<ffffffff81010469>] sys_mmap+0x29/0x30
>  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> 
> 
> On 06/08/2012 05:18 PM, Anand Avati wrote:
>> 
>> Those are 4.x GB. Can you post dmesg output as well? Also, what's 'ulimit 
>> -l' on your system?
>> 
>> On Fri, Jun 8, 2012 at 4:41 PM, Ling Ho <[email protected]> wrote:
>> 
>> This is the core file from the crash just now
>> 
>> [root@psanaoss213 /]# ls -al core*
>> -rw------- 1 root root 4073594880 Jun  8 15:05 core.22682
>> 
>> From yesterday:
>> [root@psanaoss214 /]# ls -al core*
>> -rw------- 1 root root 4362727424 Jun  8 00:58 core.13483
>> -rw------- 1 root root 4624773120 Jun  8 03:21 core.8792
>> 
>> 
>> 
>> On 06/08/2012 04:34 PM, Anand Avati wrote:
>>> 
>>> Is it possible the system was running low on memory? I see you have 48GB, 
>>> but memory registration failure typically would be because the system limit 
>>> on the number of pinnable pages in RAM was hit. Can you tell us the size of 
>>> your core dump files after the crash?
>>> 
>>> Avati
>>> 
>>> On Fri, Jun 8, 2012 at 4:22 PM, Ling Ho <[email protected]> wrote:
>>> Hello,
>>> 
>>> I have a brick that crashed twice today, and a different brick that 
>>> crashed just a while ago.
>>> 
>>> This is what I see in one of the brick logs:
>>> 
>>> patchset: git://git.gluster.com/glusterfs.git
>>> patchset: git://git.gluster.com/glusterfs.git
>>> signal received: 6
>>> signal received: 6
>>> time of crash: 2012-06-08 15:05:11
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> fdatasync 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 3.2.6
>>> /lib64/libc.so.6[0x34bc032900]
>>> /lib64/libc.so.6(gsignal+0x35)[0x34bc032885]
>>> /lib64/libc.so.6(abort+0x175)[0x34bc034065]
>>> /lib64/libc.so.6[0x34bc06f977]
>>> /lib64/libc.so.6[0x34bc075296]
>>> /opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]
>>> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_destroy+0x47)[0x7f1740956967]
>>> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_unref+0x62)[0x7f1740956a32]
>>> /opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]
>>> /lib64/libpthread.so.0[0x34bc8077f1]
>>> /lib64/libc.so.6(clone+0x6d)[0x34bc0e5ccd]
>>> ---------
>>> 
>>> And somewhere before these, there is also
>>> [2012-06-08 15:05:07.512604] E [rdma.c:198:rdma_new_post] 0-rpc-transport/rdma: memory registration failed
>>> 
>>> I have 48GB of memory on the system:
>>> 
>>> # free
>>>              total       used       free     shared    buffers     cached
>>> Mem:      49416716   34496648   14920068          0      31692   28209612
>>> -/+ buffers/cache:    6255344   43161372
>>> Swap:      4194296       1740    4192556
>>> 
>>> # uname -a
>>> Linux psanaoss213 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
>>> 
>>> The server Gluster version is 3.2.6-1. I have both RDMA clients and TCP 
>>> clients over a 10 Gb/s network.
>>> 
>>> Any suggestions on what I should look for?
>>> 
>>> Is there a way to just restart the brick, and not glusterd on the server? I 
>>> have 8 bricks on the server.
>>> 
>>> Thanks,
>>> ...
>>> ling
>>> 
>>> 
>>> Here's the volume info:
>>> 
>>> # gluster volume info
>>> 
>>> Volume Name: ana12
>>> Type: Distribute
>>> Status: Started
>>> Number of Bricks: 40
>>> Transport-type: tcp,rdma
>>> Bricks:
>>> Brick1: psanaoss214:/brick1
>>> Brick2: psanaoss214:/brick2
>>> Brick3: psanaoss214:/brick3
>>> Brick4: psanaoss214:/brick4
>>> Brick5: psanaoss214:/brick5
>>> Brick6: psanaoss214:/brick6
>>> Brick7: psanaoss214:/brick7
>>> Brick8: psanaoss214:/brick8
>>> Brick9: psanaoss211:/brick1
>>> Brick10: psanaoss211:/brick2
>>> Brick11: psanaoss211:/brick3
>>> Brick12: psanaoss211:/brick4
>>> Brick13: psanaoss211:/brick5
>>> Brick14: psanaoss211:/brick6
>>> Brick15: psanaoss211:/brick7
>>> Brick16: psanaoss211:/brick8
>>> Brick17: psanaoss212:/brick1
>>> Brick18: psanaoss212:/brick2
>>> Brick19: psanaoss212:/brick3
>>> Brick20: psanaoss212:/brick4
>>> Brick21: psanaoss212:/brick5
>>> Brick22: psanaoss212:/brick6
>>> Brick23: psanaoss212:/brick7
>>> Brick24: psanaoss212:/brick8
>>> Brick25: psanaoss213:/brick1
>>> Brick26: psanaoss213:/brick2
>>> Brick27: psanaoss213:/brick3
>>> Brick28: psanaoss213:/brick4
>>> Brick29: psanaoss213:/brick5
>>> Brick30: psanaoss213:/brick6
>>> Brick31: psanaoss213:/brick7
>>> Brick32: psanaoss213:/brick8
>>> Brick33: psanaoss215:/brick1
>>> Brick34: psanaoss215:/brick2
>>> Brick35: psanaoss215:/brick4
>>> Brick36: psanaoss215:/brick5
>>> Brick37: psanaoss215:/brick7
>>> Brick38: psanaoss215:/brick8
>>> Brick39: psanaoss215:/brick3
>>> Brick40: psanaoss215:/brick6
>>> Options Reconfigured:
>>> performance.io-thread-count: 16
>>> performance.write-behind-window-size: 16MB
>>> performance.cache-size: 1GB
>>> nfs.disable: on
>>> performance.cache-refresh-timeout: 1
>>> network.ping-timeout: 42
>>> performance.cache-max-file-size: 1PB
>>> 
>>> _______________________________________________
>>> Gluster-users mailing list
>>> [email protected]
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>> 
>> 
>> 
> 
