Re: [Gluster-users] Gluster Volume hangs (version 3.2.5)

Alessio Checcucci Thu, 15 Mar 2012 23:19:12 -0700

Dear Mohit,
I have run some extensive tests that deliberately overload the filesystem 
with I/O, and I was able to reproduce the problem even without mongodb. As 
you suggested, I drilled down into the IB fabric and detected some strange 
switch behaviour. I am continuing the investigation on this side and will 
report back as soon as I have been able to shed some light on it.
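
As a host-side complement to checking the switches, the error counters of the 
local HCA port can be read from sysfs. The following is only a minimal sketch: 
the device name mlx4_0 is taken from the Gluster log, while port 1 and the 
selection of counters are assumptions, and the helper names are hypothetical.

```python
from pathlib import Path

# Counter files commonly exposed under
# /sys/class/infiniband/<device>/ports/<port>/counters/ (selection is an assumption).
ERROR_COUNTERS = (
    "symbol_error",
    "link_error_recovery",
    "link_downed",
    "port_rcv_errors",
    "port_xmit_discards",
)


def read_ib_counters(counters_dir, names=ERROR_COUNTERS):
    """Return {counter_name: value} for every counter file that can be read."""
    result = {}
    for name in names:
        path = Path(counters_dir) / name
        try:
            result[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable counter: skip it rather than fail.
            continue
    return result


def nonzero_counters(counters):
    """Keep only the counters that have recorded at least one event."""
    return {name: value for name, value in counters.items() if value != 0}


if __name__ == "__main__":
    # Assumed location: device mlx4_0 (from the log), port 1 (an assumption).
    counters_dir = "/sys/class/infiniband/mlx4_0/ports/1/counters"
    for name, value in sorted(nonzero_counters(read_ib_counters(counters_dir)).items()):
        print(f"{name}: {value}")
```

Running it on each node before and after an I/O overload test and comparing 
the values would show whether errors accumulate on the host ports as well.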


Thanks a lot
Alessio 

On 15/03/2012, at 22:54 , Mohit Anchlia wrote:

> Can you break your CSV into small chunks and try again? It appears that the 
> network is somehow getting overwhelmed. Have you checked the switches for 
> any errors?
> 
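
One way to try the chunking suggested above is sketched below. It assumes the 
CSV file has a header row that should be repeated in every chunk; the function 
name, the chunk file naming and the default chunk size are hypothetical.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_chunk=100_000):
    """Split src into numbered chunk files, repeating the header row in each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    out = None
    writer = None
    try:
        with open(src, newline="") as f:
            reader = csv.reader(f)
            header = next(reader, None)
            if header is None:
                return chunks  # empty input: nothing to split
            for i, row in enumerate(reader):
                if i % rows_per_chunk == 0:
                    # Start a new chunk file and write the header first.
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunks):05d}.csv"
                    out = open(path, "w", newline="")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    chunks.append(path)
                writer.writerow(row)
    finally:
        if out is not None:
            out.close()
    return chunks
```

Each resulting chunk could then be imported separately, which should show 
whether the hang depends on the size of a single import job.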
> On Wed, Mar 14, 2012 at 11:39 PM, Alessio Checcucci 
> <[email protected]> wrote:
> Dear Mohit,
> Thanks for your answer. The setup is pretty new: we configured it about a 
> month ago, and the iozone tests we performed never highlighted any problem.
> The servers are SGI machines based on Supermicro hardware. Each one features: 
> 2 Xeon X5650 6-core CPUs
> 96GB of RAM 
> two Intel Gigabit interfaces 
> 1 Mellanox ConnectX-2 IB HCA
> 1 LSI 1068E SATA RAID controller
> 6 Seagate ST32000644NS 2TB HDDs
> 
> The Gluster nodes work quite smoothly. They act both as bricks and as 
> clients, mounting the Gluster filesystem by means of the fuse driver. 
> Unfortunately, when we run the mongo import (from a huge CSV file), after 
> some minutes all the mounts freeze completely and the fuse error (with the 
> related timeout) that I reported in my first message is logged. 
> Looking at the volume log on the Gluster bricks, we can see the following 
> messages:
> 
> [2012-03-15 04:45:07.352455] E 
> [rdma.c:3415:rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send 
> work request on `mlx4_0' returned error wc.status = 12, wc.vendor_err = 129, 
> post->buf = 0x2b08000, wc.byte_len = 0, post->reused = 2
> [2012-03-15 04:45:07.352510] E 
> [rdma.c:3423:rdma_handle_failed_send_completion] 0-rdma: connection between 
> client and server not working. check by running 'ibv_srq_pingpong'. also make 
> sure subnet manager is running (eg: 'opensm'), or check if rdma port is valid 
> (or active) by running 'ibv_devinfo'. contact Gluster Support Team if the 
> problem persists.
> [2012-03-15 04:45:07.352535] E 
> [rdma.c:3415:rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send 
> work request on `mlx4_0' returned error wc.status = 12, wc.vendor_err = 129, 
> post->buf = 0x2b0a000, wc.byte_len = 0, post->reused = 5
> [2012-03-15 04:45:07.352545] E 
> [rdma.c:3423:rdma_handle_failed_send_completion] 0-rdma: connection between 
> client and server not working. check by running 'ibv_srq_pingpong'. also make 
> sure subnet manager is running (eg: 'opensm'), or check if rdma port is valid 
> (or active) by running 'ibv_devinfo'. contact Gluster Support Team if the 
> problem persists.
> [2012-03-15 04:45:07.352900] E [rpc-clnt.c:341:saved_frames_unwind] 
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) 
> [0x7fda3f424568] 
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
>  [0x7fda3f423cfd] 
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) 
> [0x7fda3f423c5e]))) 0-HPC_data-client-4: forced unwinding frame 
> type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-03-15 04:45:03.336837
> [2012-03-15 04:45:07.352942] E [client3_1-fops.c:2228:client3_1_lookup_cbk] 
> 0-glusterfs: remote operation failed: Transport endpoint is not connected
> [2012-03-15 04:45:07.352956] I [client.c:1883:client_rpc_notify] 
> 0-HPC_data-client-4: disconnected
> [2012-03-15 04:45:07.353301] E [rpc-clnt.c:341:saved_frames_unwind] 
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) 
> [0x7fda3f424568] 
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
>  [0x7fda3f423cfd] 
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) 
> [0x7fda3f423c5e]))) 0-HPC_data-client-5: forced unwinding frame 
> type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-03-15 04:45:03.336880
> [2012-03-15 04:45:07.353317] E [client3_1-fops.c:2228:client3_1_lookup_cbk] 
> 0-glusterfs: remote operation failed: Transport endpoint is not connected
> [2012-03-15 04:45:07.353326] I [dht-layout.c:581:dht_layout_normalize] 
> 0-HPC_data-dht: found anomalies in /. holes=1 overlaps=0
> [2012-03-15 04:45:07.353335] I [dht-selfheal.c:569:dht_selfheal_directory] 
> 0-HPC_data-dht: 2 subvolumes down -- not fixing
> [2012-03-15 04:45:07.353365] I [client.c:1883:client_rpc_notify] 
> 0-HPC_data-client-5: disconnected
> [2012-03-15 04:45:17.703676] I 
> [client-handshake.c:1090:select_server_supported_programs] 
> 0-HPC_data-client-4: Using Program GlusterFS 3.2.5, Num (1298437), Version 
> (310)
> [2012-03-15 04:45:17.703857] I [client-handshake.c:913:client_setvolume_cbk] 
> 0-HPC_data-client-4: Connected to 192.168.100.165:24009, attached to remote 
> volume '/data'.
> [2012-03-15 04:45:17.706408] I 
> [client-handshake.c:1090:select_server_supported_programs] 
> 0-HPC_data-client-5: Using Program GlusterFS 3.2.5, Num (1298437), Version 
> (310)
> [2012-03-15 04:45:17.706566] I [client-handshake.c:913:client_setvolume_cbk] 
> 0-HPC_data-client-5: Connected to 192.168.100.166:24009, attached to remote 
> volume '/data'.
> [2012-03-15 06:28:09.624927] I [dht-layout.c:581:dht_layout_normalize] 
> 0-HPC_data-dht: found anomalies in /database/mongo/hipass_fixed/journal. 
> holes=1 overlaps=0
> [2012-03-15 06:28:09.704031] I [dht-layout.c:581:dht_layout_normalize] 
> 0-HPC_data-dht: found anomalies in /database/mongo/hipass_fixed/_tmp. holes=1 
> overlaps=0
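
In the libibverbs `ibv_wc_status` enum, the value 12 reported as wc.status in 
the log above corresponds to IBV_WC_RETRY_EXC_ERR (transport retry counter 
exceeded), which usually means the remote side did not acknowledge the send in 
time. The sketch below counts such completions in a pasted log. It is only an 
illustration: the table covers part of the enum, wc.vendor_err is 
vendor-specific and left undecoded, and the function name is hypothetical.

```python
import re
from collections import Counter

# Partial table of ibv_wc_status values from libibverbs.
WC_STATUS_NAMES = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
    6: "IBV_WC_MW_BIND_ERR",
    7: "IBV_WC_BAD_RESP_ERR",
    8: "IBV_WC_LOC_ACCESS_ERR",
    9: "IBV_WC_REM_INV_REQ_ERR",
    10: "IBV_WC_REM_ACCESS_ERR",
    11: "IBV_WC_REM_OP_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

_SEND_ERR = re.compile(r"wc\.status\s*=\s*(\d+),\s*wc\.vendor_err\s*=\s*(\d+)")


def summarize_send_errors(log_text):
    """Count failed send completions by (status, status name, vendor_err)."""
    # Drop email quote markers, then re-join lines wrapped by the mail client.
    unquoted = re.sub(r"^[> ]+", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())
    counts = Counter()
    for status, vendor_err in _SEND_ERR.findall(flat):
        name = WC_STATUS_NAMES.get(int(status), "unknown")
        counts[(int(status), name, int(vendor_err))] += 1
    return counts
```

A log dominated by status 12 would be consistent with packets being lost or 
delayed in the fabric rather than with a problem local to one brick.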
> 
> We checked the InfiniBand infrastructure and it is still working, so we 
> suppose that the problem lies somewhere else.
> 
> Thanks a lot for your help,
> Alessio
> 
> 
> On 15/03/2012, at 12:32 , Mohit Anchlia wrote:
> 
>> Is this a new setup, or did it use to work before? How are the CPU, memory, 
>> etc.? Also, what do you see on the gluster nodes?
>> 
>> On Wed, Mar 14, 2012 at 7:33 PM, Alessio Checcucci 
>> <[email protected]> wrote:
>> Dear All,
>> We are facing a problem in our computer room. We have 6 servers that act 
>> as bricks for GlusterFS, configured in the following way:
>> 
>> OS: CentOS 6.2 x86_64
>> Kernel: 2.6.32-220.4.2.el6.x86_64
>> 
>> Gluster RPM packages:
>> glusterfs-core-3.2.5-2.el6.x86_64
>> glusterfs-rdma-3.2.5-2.el6.x86_64
>> glusterfs-geo-replication-3.2.5-2.el6.x86_64
>> glusterfs-fuse-3.2.5-2.el6.x86_64
>> 
>> Each one contributes an XFS filesystem to the global volume; the 
>> transport mechanism is RDMA:
>> 
>> gluster volume create HPC_data transport rdma pleiades01:/data 
>> pleiades02:/data pleiades03:/data pleiades04:/data pleiades05:/data 
>> pleiades06:/data
>> 
>> Each server mounts, using the fuse driver, the volume on a dedicated mount 
>> point according to the following fstab:
>> 
>> pleiades01:/HPC_data  /HPCdata  glusterfs  defaults,_netdev  0 0
>> 
>> We are running mongodb on top of the Gluster volume for performance testing, 
>> and the speed is definitely high. Unfortunately, when we run a large 
>> mongoimport job, shortly after it starts the GlusterFS volume hangs 
>> completely and becomes inaccessible from every node. After some time the 
>> following error is logged in /var/log/messages:
>> 
>> Mar  8 08:16:03 pleiades03 kernel: INFO: task mongod:5508 blocked for more 
>> than 120 seconds.
>> Mar  8 08:16:03 pleiades03 kernel: "echo 0 > 
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Mar  8 08:16:03 pleiades03 kernel: mongod        D 0000000000000007     0  
>> 5508      1 0x00000000
>> Mar  8 08:16:03 pleiades03 kernel: ffff881709b95de8 0000000000000086 
>> 0000000000000000 0000000000000008
>> Mar  8 08:16:03 pleiades03 kernel: ffff881709b95d68 ffffffff81090a7f 
>> ffff8816b6974cc0 0000000000000000
>> Mar  8 08:16:03 pleiades03 kernel: ffff8817fdd81af8 ffff881709b95fd8 
>> 000000000000f4e8 ffff8817fdd81af8
>> Mar  8 08:16:03 pleiades03 kernel: Call Trace:
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff81090a7f>] ? 
>> wake_up_bit+0x2f/0x40
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff81090d7e>] ? 
>> prepare_to_wait+0x4e/0x80
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffffa112c6b5>] 
>> fuse_set_nowrite+0xa5/0xe0 [fuse]
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff81090a90>] ? 
>> autoremove_wake_function+0x0/0x40
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffffa112fd48>] 
>> fuse_fsync_common+0xa8/0x180 [fuse]
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffffa112fe30>] fuse_fsync+0x10/0x20 
>> [fuse]
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff811a52d1>] 
>> vfs_fsync_range+0xa1/0xe0
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff811a537d>] vfs_fsync+0x1d/0x20
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff81144421>] sys_msync+0x151/0x1e0
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff8100b0f2>] 
>> system_call_fastpath+0x16/0x1b
>> 
>> Any attempt to access the volume from any node is fruitless until the 
>> mongodb process is killed; sessions accessing the /HPCdata path freeze 
>> on every node. 
>> In any case, a complete stop (force) and start of the volume is needed to 
>> make it operational again.
>> The situation can be reproduced at will.
>> Can anybody help us? Could we collect more information to help diagnose 
>> the problem?
>> 
>> Thanks a lot
>> Alessio 
>> 
>> 
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>> 
>> 
> 
>

