Dear Mohit,

I have performed some extensive tests, deliberately overloading the filesystem with I/O, and I was able to reproduce the problem even without using MongoDB. I drilled down into the IB fabric and, as you suggested, I detected some strange switch behaviour. I am continuing the investigation on this side and I will report back as soon as I have been able to shed some light on it.
Thanks a lot
Alessio

On 15/03/2012, at 22:54 , Mohit Anchlia wrote:

> Can you break your CSV into small chunks and try? It appears that the network
> is somehow getting overwhelmed. Have you checked the switches for any errors?
>
> On Wed, Mar 14, 2012 at 11:39 PM, Alessio Checcucci
> <[email protected]> wrote:
> Dear Mohit,
> thanks for your answer. The setup is pretty new: we configured it about a
> month ago and the iozone tests we performed never highlighted any problem.
> The servers are SGI machines based on Supermicro hardware, each one features:
> 2 Xeon X5650 6-core CPUs
> 96GB of RAM
> two Intel Gigabit interfaces
> 1 Mellanox ConnectX-2 IB HCA
> 1 LSI 1068E SATA RAID controller
> 6 Seagate ST32000644NS 2TB HDDs
>
> The Gluster nodes work quite smoothly; they act both as bricks and as
> clients, mounting the Gluster filesystem by means of the fuse driver.
> Unfortunately, when we run the mongo import (from a huge CSV file), after
> some time (minutes) all the mounts freeze completely and the fuse error
> (with the related timeout) I reported in my first message is logged.
> Looking at the volume log on the Gluster bricks we can see the following
> messages:
>
> [2012-03-15 04:45:07.352455] E
> [rdma.c:3415:rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send
> work request on `mlx4_0' returned error wc.status = 12, wc.vendor_err = 129,
> post->buf = 0x2b08000, wc.byte_len = 0, post->reused = 2
> [2012-03-15 04:45:07.352510] E
> [rdma.c:3423:rdma_handle_failed_send_completion] 0-rdma: connection between
> client and server not working. check by running 'ibv_srq_pingpong'. also make
> sure subnet manager is running (eg: 'opensm'), or check if rdma port is valid
> (or active) by running 'ibv_devinfo'. contact Gluster Support Team if the
> problem persists.
> [2012-03-15 04:45:07.352535] E
> [rdma.c:3415:rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send
> work request on `mlx4_0' returned error wc.status = 12, wc.vendor_err = 129,
> post->buf = 0x2b0a000, wc.byte_len = 0, post->reused = 5
> [2012-03-15 04:45:07.352545] E
> [rdma.c:3423:rdma_handle_failed_send_completion] 0-rdma: connection between
> client and server not working. check by running 'ibv_srq_pingpong'. also make
> sure subnet manager is running (eg: 'opensm'), or check if rdma port is valid
> (or active) by running 'ibv_devinfo'. contact Gluster Support Team if the
> problem persists.
> [2012-03-15 04:45:07.352900] E [rpc-clnt.c:341:saved_frames_unwind]
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78)
> [0x7fda3f424568]
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
> [0x7fda3f423cfd]
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)
> [0x7fda3f423c5e]))) 0-HPC_data-client-4: forced unwinding frame
> type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-03-15 04:45:03.336837
> [2012-03-15 04:45:07.352942] E [client3_1-fops.c:2228:client3_1_lookup_cbk]
> 0-glusterfs: remote operation failed: Transport endpoint is not connected
> [2012-03-15 04:45:07.352956] I [client.c:1883:client_rpc_notify]
> 0-HPC_data-client-4: disconnected
> [2012-03-15 04:45:07.353301] E [rpc-clnt.c:341:saved_frames_unwind]
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78)
> [0x7fda3f424568]
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
> [0x7fda3f423cfd]
> (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)
> [0x7fda3f423c5e]))) 0-HPC_data-client-5: forced unwinding frame
> type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-03-15 04:45:03.336880
> [2012-03-15 04:45:07.353317] E [client3_1-fops.c:2228:client3_1_lookup_cbk]
> 0-glusterfs: remote operation failed: Transport endpoint is not connected
> [2012-03-15 04:45:07.353326] I [dht-layout.c:581:dht_layout_normalize]
> 0-HPC_data-dht: found anomalies in /. holes=1 overlaps=0
> [2012-03-15 04:45:07.353335] I [dht-selfheal.c:569:dht_selfheal_directory]
> 0-HPC_data-dht: 2 subvolumes down -- not fixing
> [2012-03-15 04:45:07.353365] I [client.c:1883:client_rpc_notify]
> 0-HPC_data-client-5: disconnected
> [2012-03-15 04:45:17.703676] I
> [client-handshake.c:1090:select_server_supported_programs]
> 0-HPC_data-client-4: Using Program GlusterFS 3.2.5, Num (1298437), Version
> (310)
> [2012-03-15 04:45:17.703857] I [client-handshake.c:913:client_setvolume_cbk]
> 0-HPC_data-client-4: Connected to 192.168.100.165:24009, attached to remote
> volume '/data'.
> [2012-03-15 04:45:17.706408] I
> [client-handshake.c:1090:select_server_supported_programs]
> 0-HPC_data-client-5: Using Program GlusterFS 3.2.5, Num (1298437), Version
> (310)
> [2012-03-15 04:45:17.706566] I [client-handshake.c:913:client_setvolume_cbk]
> 0-HPC_data-client-5: Connected to 192.168.100.166:24009, attached to remote
> volume '/data'.
> [2012-03-15 06:28:09.624927] I [dht-layout.c:581:dht_layout_normalize]
> 0-HPC_data-dht: found anomalies in /database/mongo/hipass_fixed/journal.
> holes=1 overlaps=0
> [2012-03-15 06:28:09.704031] I [dht-layout.c:581:dht_layout_normalize]
> 0-HPC_data-dht: found anomalies in /database/mongo/hipass_fixed/_tmp. holes=1
> overlaps=0
>
> We checked the InfiniBand infrastructure and it is still working, hence we
> suppose that the problem lies somewhere else.
>
> Thanks a lot for your help,
> Alessio
>
>
> On 15/03/2012, at 12:32 , Mohit Anchlia wrote:
>
>> Is this a new setup, or did it use to work before? How are the CPU,
>> memory, etc.? Also, what do you see on the gluster nodes?
>>
>> On Wed, Mar 14, 2012 at 7:33 PM, Alessio Checcucci
>> <[email protected]> wrote:
>> Dear All,
>> we are facing a problem in our computer room: we have 6 servers that act
>> as bricks for GlusterFS, configured in the following way:
>>
>> OS: CentOS 6.2 x86_64
>> Kernel: 2.6.32-220.4.2.el6.x86_64
>>
>> Gluster RPM packages:
>> glusterfs-core-3.2.5-2.el6.x86_64
>> glusterfs-rdma-3.2.5-2.el6.x86_64
>> glusterfs-geo-replication-3.2.5-2.el6.x86_64
>> glusterfs-fuse-3.2.5-2.el6.x86_64
>>
>> Each one contributes an XFS filesystem to the global volume, and the
>> transport mechanism is RDMA:
>>
>> gluster volume create HPC_data transport rdma pleiades01:/data pleiades02:/data pleiades03:/data pleiades04:/data pleiades05:/data pleiades06:/data
>>
>> Each server mounts the volume with the fuse driver on a dedicated mount
>> point, according to the following fstab entry:
>>
>> pleiades01:/HPC_data /HPCdata glusterfs defaults,_netdev 0 0
>>
>> We are running MongoDB on top of the Gluster volume for performance testing
>> and the speed is definitely high. Unfortunately, when we run a large
>> mongoimport job, the GlusterFS volume hangs completely shortly after the job
>> starts and becomes inaccessible from every node. After some time the
>> following error is logged in /var/log/messages:
>>
>> Mar 8 08:16:03 pleiades03 kernel: INFO: task mongod:5508 blocked for more
>> than 120 seconds.
>> Mar 8 08:16:03 pleiades03 kernel: "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Mar 8 08:16:03 pleiades03 kernel: mongod D 0000000000000007 0
>> 5508 1 0x00000000
>> Mar 8 08:16:03 pleiades03 kernel: ffff881709b95de8 0000000000000086
>> 0000000000000000 0000000000000008
>> Mar 8 08:16:03 pleiades03 kernel: ffff881709b95d68 ffffffff81090a7f
>> ffff8816b6974cc0 0000000000000000
>> Mar 8 08:16:03 pleiades03 kernel: ffff8817fdd81af8 ffff881709b95fd8
>> 000000000000f4e8 ffff8817fdd81af8
>> Mar 8 08:16:03 pleiades03 kernel: Call Trace:
>> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff81090a7f>] ?
>> wake_up_bit+0x2f/0x40
>> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff81090d7e>] ?
>> prepare_to_wait+0x4e/0x80
>> Mar 8 08:16:03 pleiades03 kernel: [<ffffffffa112c6b5>]
>> fuse_set_nowrite+0xa5/0xe0 [fuse]
>> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff81090a90>] ?
>> autoremove_wake_function+0x0/0x40
>> Mar 8 08:16:03 pleiades03 kernel: [<ffffffffa112fd48>]
>> fuse_fsync_common+0xa8/0x180 [fuse]
>> Mar 8 08:16:03 pleiades03 kernel: [<ffffffffa112fe30>] fuse_fsync+0x10/0x20
>> [fuse]
>> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff811a52d1>]
>> vfs_fsync_range+0xa1/0xe0
>> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff811a537d>] vfs_fsync+0x1d/0x20
>> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff81144421>] sys_msync+0x151/0x1e0
>> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff8100b0f2>]
>> system_call_fastpath+0x16/0x1b
>>
>> Any attempt to access the volume from any node is fruitless until the
>> mongodb process is killed; the sessions accessing the /HPCdata path freeze
>> on every node.
>> In any case, a complete stop (force) and start of the volume is needed to
>> bring it back into operation.
>> The situation can be reproduced at will.
>> Is there anybody able to help us? Could we collect more information to
>> help diagnose the problem?
>>
>> Thanks a lot
>> Alessio
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
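Mohit's suggestion of feeding the import in smaller pieces could be tried with standard tools. Below is a minimal POSIX shell sketch that is not from the thread: the function name `split_csv`, the default of 100000 rows and the `chunk_` prefix are hypothetical, and it assumes the CSV has a single header line and that no quoted field contains an embedded newline.

```shell
# Hypothetical helper, not part of GlusterFS or MongoDB: split a large CSV
# into chunks of at most N data rows, repeating the header line in each chunk.
# Assumes one record per line (no quoted field contains a newline).
split_csv() {
  input=$1               # source CSV file
  rows=${2:-100000}      # data rows per chunk (arbitrary default)
  prefix=${3:-chunk_}    # prefix for the output files
  header=$(head -n 1 "$input")
  # Drop the header and cut the remaining rows into ${prefix}aa, ${prefix}ab, ...
  tail -n +2 "$input" | split -l "$rows" - "$prefix"
  for part in "$prefix"*; do
    case $part in
      *.csv) continue ;;         # skip finished chunks (e.g. from a previous run)
    esac
    [ -e "$part" ] || continue   # glob matched nothing: there were no data rows
    # Prepend the header so every chunk can be imported on its own.
    { printf '%s\n' "$header"; cat "$part"; } > "$part.csv"
    rm -- "$part"
  done
}
```

For example, `split_csv data.csv 500000` (where `data.csv` is a placeholder name) would produce `chunk_aa.csv`, `chunk_ab.csv`, and so on. Since each file keeps the header, it could then be loaded separately with `mongoimport --type csv --headerline --file chunk_aa.csv` plus the `--db` and `--collection` options for the target collection. Importing the chunks one at a time is one way to check whether smaller bursts of writes avoid the hang.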
