Thanks, Corey. At the moment I only have two nodes to play with. If I am able to add a third, would you recommend using “replica 3” on the volume, or is that overkill? I understand what you mean about the quorum, but for my purposes, it would be nice to be able to say “you may not be able to write during a node failure, but you can still read.”

-- CJ

> On Apr 17, 2015, at 6:18 AM, Corey Kovacs <[email protected]> wrote:
>
> Typically you need to meet a quorum requirement to run just about any
> cluster. By definition, two nodes don't make a good cluster. A third node
> would let you start with just two, since that would allow you to meet quorum.
> Can you add a third node to at least test?
>
> Corey
>
> On Apr 16, 2015 6:52 PM, "CJ Baar" <[email protected]> wrote:
> I appreciate the info. I have tried adjusting the ping-timeout setting, and it
> seems to have no effect. The whole system hangs for 45+ seconds, which is
> about what it takes the second node to reboot, no matter what the value of
> ping-timeout is. The output of the mnt-log is below. It shows the adjusted
> value I am currently testing (30s), but the system still hangs for longer
> than that.
>
> Also, I have realized that the problem is deeper than I originally thought.
> It’s not just the mount that is hanging when a node reboots… it appears to be
> the entire system. I cannot use my SSH connection, no matter where I am in
> the system, and services such as httpd become unresponsive. I can ping the
> “surviving” system, but other than that it appears pretty unusable. This is
> a major drawback to using gluster. I can’t afford to lose two entire systems
> if one dies.
>
> [2015-04-16 22:59:21.281365] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-common-client-0: server 172.31.64.200:49152 has not responded in the last 30 seconds, disconnecting.
> [2015-04-16 22:59:21.281560] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fce96450550] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fce96225787] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fce9622589e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7fce96225951] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fce96225f1f] ))))) 0-common-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-04-16 22:58:45.830962 (xid=0x6d)
> [2015-04-16 22:59:21.281588] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-common-client-0: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
> [2015-04-16 22:59:21.281788] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fce96450550] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fce96225787] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fce9622589e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7fce96225951] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fce96225f1f] ))))) 0-common-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-04-16 22:58:51.277528 (xid=0x6e)
> [2015-04-16 22:59:21.281806] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk] 0-common-client-0: socket disconnected
> [2015-04-16 22:59:21.281816] I [client.c:2215:client_rpc_notify] 0-common-client-0: disconnected from common-client-0. Client process will keep trying to connect to glusterd until brick's port is available
> [2015-04-16 22:59:21.283637] I [socket.c:3292:socket_submit_request] 0-common-client-0: not connected (priv->connected = 0)
> [2015-04-16 22:59:21.283663] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-common-client-0: failed to submit rpc-request (XID: 0x6f Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (common-client-0)
> [2015-04-16 22:59:21.283674] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-common-client-0: remote operation failed: Transport endpoint is not connected. Path: /src (63fc077b-869d-4928-8819-a79cc5c5ffa6)
> [2015-04-16 22:59:21.284219] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-common-client-0: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
> [2015-04-16 22:59:52.322952] E [client-handshake.c:1496:client_query_portmap_cbk] 0-common-client-0: failed to get the port number for [root@cfm-c glusterfs]#
>
> -- CJ

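As a rough illustration of the quorum point raised in this thread, the arithmetic can be sketched as below. This assumes a plain strict-majority rule, which is a simplification and not necessarily how Gluster evaluates quorum internally; the function names `has_quorum` and `tolerated_failures` are hypothetical and exist only for this example.

```python
def has_quorum(total_nodes: int, nodes_up: int) -> bool:
    """Return True if a strict majority of nodes is reachable.

    Assumption: simple majority voting, used here only to illustrate
    why two nodes are awkward. Not taken from Gluster's source.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= nodes_up <= total_nodes:
        raise ValueError("nodes_up must be between 0 and total_nodes")
    # Strict majority: more than half of all nodes must be up.
    return 2 * nodes_up > total_nodes


def tolerated_failures(total_nodes: int) -> int:
    """Return how many nodes can fail while a strict majority remains."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return (total_nodes - 1) // 2


if __name__ == "__main__":
    for n in (2, 3):
        print(f"{n} nodes: can lose {tolerated_failures(n)} and keep quorum")
```

Under this rule a two-node cluster cannot lose either node and still hold a majority, while a three-node cluster can lose one, which is why a third node is suggested above.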
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
