Hi Jeremy,

Did you see any out-of-memory messages in dmesg, especially ones related to the
glusterfs processes? It would be very helpful to have the logs of all client and
server processes, ideally captured at log level TRACE. Can you please send us
those logs?
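
If it saves some time, here is a rough Python sketch for picking out-of-memory
lines out of saved dmesg output. The function name find_oom_lines, the search
patterns, and the "gluster" process hint are my own assumptions, not anything
shipped with GlusterFS, and the kernel's exact wording differs between versions,
so treat a miss as "not found by this filter" rather than "no OOM happened".

```python
import re

# Phrases the Linux kernel commonly prints when the OOM killer runs.
# Assumption: wording varies by kernel version, so extend this as needed.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)


def find_oom_lines(dmesg_text, process_hint="gluster"):
    """Return (oom_lines, related_lines) from raw dmesg text.

    oom_lines     -- every line matching one of the OOM phrases
    related_lines -- the subset that also mentions process_hint
    """
    oom_lines = [line for line in dmesg_text.splitlines()
                 if OOM_PATTERN.search(line)]
    related_lines = [line for line in oom_lines
                     if process_hint.lower() in line.lower()]
    return oom_lines, related_lines


# Illustrative input only; these lines are made up, not taken from your servers.
sample = "\n".join([
    "[12345.678] glusterfsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0",
    "[12345.700] Out of memory: Kill process 4321 (java) score 900 or sacrifice child",
    "[12346.000] eth0: link up",
])

oom, related = find_oom_lines(sample)
for line in related:
    print(line)
```

To use it on a real machine, save the output of dmesg to a file on each server
and client, read the file into a string, and pass that string to
find_oom_lines.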

regards,
----- Original Message -----
> From: "Jeremy Stout" <[email protected]>
> To: [email protected]
> Sent: Saturday, January 22, 2011 7:34:22 AM
> Subject: Re: [Gluster-users] 3.1.2 feedback
> I have been testing 3.1.2 over the last few days. My overall
> impression is that it resolved several bugs from 3.1.1, but the latest
> version is still prone to crashing under moderate to heavy loads.
> 
> I was running some stress tests on a two server replicated setup today
> with ~150 clients connected with RDMA. The glusterfsd process crashed
> on one server. I waited about 30 minutes to see if the automatic
> fail-over would work, but I continued to receive "Transport: endpoint
> not connected" error messages on all the clients. I saw the following
> error messages in the server log:
> (I removed several hundred error messages from the following snippet)
> [2011-01-21 15:10:13.804308] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x66540x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.supportdir-server)
> [2011-01-21 15:10:13.804314] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x64658x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.supportdir-server)
> [2011-01-21 15:10:13.804342] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 15:10:13.804365] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 15:10:13.804636] I [server.c:428:server_rpc_notify]
> supportdir-server: disconnected connection from 192.168.50.7:1020
> [2011-01-21 15:10:13.804702] I
> [server-helpers.c:670:server_connection_destroy] supportdir-server:
> destroyed connection of
> n7-12719-2011/01/19-17:36:59:497983-supportdir-client-0
> [2011-01-21 15:10:13.805028] I [server.c:428:server_rpc_notify]
> supportdir-server: disconnected connection from 192.168.50.127:1020
> [2011-01-21 15:10:13.805071] I
> [server-helpers.c:670:server_connection_destroy] supportdir-server:
> destroyed connection of
> n127-12567-2011/01/19-17:43:17:468018-supportdir-client-0
> 
> pending frames:
> 
> patchset: v3.1.1-64-gf2a067c
> signal received: 11
> time of crash: 2011-01-21 15:10:13
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.1.2
> /lib64/libc.so.6(+0x32a60)[0x7fc2a7f64a60]
> /usr/local/glusterfs/3.1.2/lib/glusterfs/3.1.2/xlator/protocol/server.so(server_release+0x54)[0x7fc2a4f05454]
> /usr/local/glusterfs/3.1.2/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x26f)[0x7fc2a88d25ef]
> /usr/local/glusterfs/3.1.2/lib/libgfrpc.so.0(rpcsvc_notify+0x123)[0x7fc2a88d2c23]
> /usr/local/glusterfs/3.1.2/lib/libgfrpc.so.0(rpc_transport_notify+0x2d)[0x7fc2a88d6a9d]
> /usr/local/glusterfs/3.1.2/lib/glusterfs/3.1.2/rpc-transport/rdma.so(rdma_pollin_notify+0xd1)[0x7fc2a4ae68b1]
> /usr/local/glusterfs/3.1.2/lib/glusterfs/3.1.2/rpc-transport/rdma.so(rdma_process_recv+0x14b)[0x7fc2a4ae6e8b]
> /usr/local/glusterfs/3.1.2/lib/glusterfs/3.1.2/rpc-transport/rdma.so(+0xb226)[0x7fc2a4ae7226]
> /lib64/libpthread.so.0(+0x6a4f)[0x7fc2a8298a4f]
> /lib64/libc.so.6(clone+0x6d)[0x7fc2a800282d]
> 
> I think the crash is related to this bug:
> http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2197
> 
> I ran some smaller tests on a single server setup. There were ~50
> clients connected via RDMA. While the jobs were running, several of
> them crashed with "File descriptor in bad state" or "Stale File
> Descriptor" errors. Here are the error messages from the server log:
> [2011-01-21 10:15:52.442908] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x16660x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.443012] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x20251x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.442949] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x77360x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.443351] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x26495832x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 40) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.445247] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x25199x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.445291] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x60907x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.447572] I [server.c:428:server_rpc_notify]
> maindir-server: disconnected connection from 192.168.50.116:1018
> [2011-01-21 10:15:52.455116] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455227] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455325] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455436] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455896] I
> [server-helpers.c:670:server_connection_destroy] maindir-server:
> destroyed connection of
> n116-14977-2011/01/20-12:43:18:128066-maindir-client-0
> [2011-01-21 10:15:52.455610] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455659] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455564] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.458581] I [server.c:428:server_rpc_notify]
> maindir-server: disconnected connection from 192.168.50.19:1018
> [2011-01-21 10:15:52.458677] I
> [server-helpers.c:670:server_connection_destroy] maindir-server:
> destroyed connection of
> n19-15053-2011/01/20-12:38:13:243408-maindir-client-0
> (I removed dozens of similar error messages)
> 
> The glusterfsd process did not crash in that instance.
> 
> Jeremy Stout
> 
> On Fri, Jan 21, 2011 at 6:49 AM, David Lloyd
> <[email protected]> wrote:
> > Hello,
> >
> > Haven't heard much feedback about installing glusterfs 3.1.2.
> >
> > Should I infer that it's all gone extremely smoothly for
> > everyone, or
> > is everyone being as cowardly as me and waiting for others to do it
> > first?
> >
> > Cheers
> > David
> >
> > --
> > David Lloyd
> > V Consultants
> > www.v-consultants.co.uk
> > tel: +44 7983 816501
> > skype: davidlloyd1243
> >
> > _______________________________________________
> > Gluster-users mailing list
> > [email protected]
> > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> >
> >