10.1.0.100 is the IP of the replica server that is down. However, this log is from the replica server that is up. There are only 2 servers, and both are replicas for the volume. The errors show up when attempting to mount the volume from a client. It seems the server that's up is trying to contact the server that's down, and the mount fails as a result?
I also noticed the following errors repeating continuously in the glusterd log while the other node is down. Is this normal?

[2015-07-02 06:16:18.028223] W [glusterd-locks.c:653:glusterd_mgmt_v3_unlock]
  (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f1d94a9bd59]
  (--> /usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x47a)[0x7f1d8fa30efa]
  (--> /usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a2)[0x7f1d8f9abda2]
  (--> /usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f1d8f9a3700]
  (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a8)[0x7f1d9486c458] )))))
  0-management: Lock for vol test not held

On Wed, Jul 1, 2015 at 5:03 PM, Vijay Bellur <[email protected]> wrote:

> On Tuesday 30 June 2015 10:56 PM, Gabriel Kuri wrote:
>
>> I am able to reproduce a problem, which I think may be a bug, where if 1
>> of the 2 replica servers for a volume is down, clients are unable to
>> mount the volume. I notice that if the replica that is down is on the
>> same subnet as the client, the client fails to mount the volume, but if
>> the replica that is down is on a different subnet, the client fails over
>> properly and mounts the volume.
>>
>> Here are the errors from the server that is still up when the client is
>> unable to mount the volume when the replica on the same subnet as the
>> client is down. Ideas? Should I open a bug?
>>
>> [2015-07-01 05:43:08.428657] W [socket.c:923:__socket_keepalive]
>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 21, Invalid
>> argument
>> [2015-07-01 05:43:08.428710] E [socket.c:3015:socket_connect]
>> 0-management: Failed to set keep-alive: Invalid argument
>> [2015-07-01 05:43:08.429260] E [socket.c:3071:socket_connect]
>> 0-management: connection attempt on 10.1.0.100:24007
>> failed, (Connection refused)
>>
>
> This points to the client not being able to talk to glusterd on
> 10.1.0.100. Is glusterd running on this node and if yes, can port 24007 be
> reached from the client machine?
>
> Regards,
> Vijay
>
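For the port question quoted above, one way to check from the client is a plain TCP connect to each server's glusterd port. This is only a rough sketch in Python, not anything shipped with Gluster: the function name `probe` and the 3 second timeout are my own assumptions, while the address 10.1.0.100 and port 24007 come from the log. It separates "refused" (the host answered but nothing is listening, which matches the "Connection refused" in the log) from "unreachable" (timeout, no route, or filtered).

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connect and classify the result.

    Returns "open" if the connection succeeds, "refused" if the host
    actively rejected it (nothing listening on that port), and
    "unreachable" for timeouts and any other socket-level error.
    `probe` is a hypothetical helper, not part of any Gluster tooling.
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts, "no route to host", DNS failures, etc.
        return "unreachable"
```

Running `print(probe("10.1.0.100", 24007))` on the client, and then again with the address of the server that is still up, would show whether the client can reach glusterd on either node.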
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
