On Thu, 29 Jun 2017 at 22:51, Victor Nomura <[email protected]> wrote:
> Thanks for the reply. What would be the best course of action? The data
> on the volume isn't important right now, but I'm worried that when our
> setup goes to production we could hit the same situation and really
> need to recover our Gluster setup.
>
> I'm assuming that to redo it is to delete everything in the
> /var/lib/glusterd directory on each of the nodes and recreate the
> volume again. Essentially starting over. If I leave the mount points
> the same and keep the data & setup intact, will the files still be
> there and accessible after? (I don't delete the data on the bricks.)

I don't think there is anything wrong in the Gluster stack. If you
cross-check the network layer and make sure it stays up, then restarting
glusterd on all the nodes should clear the stale locks.
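Something like the following, run on each node one at a time; this is
only a minimal sketch, and I'm assuming a systemd-based install (your
log paths suggest Debian/Ubuntu, where the unit is glusterfs-server; on
RPM-based distros it is glusterd):

    systemctl restart glusterfs-server   # "systemctl restart glusterd" on RPM-based distros
    gluster peer status                  # every peer should report "Peer in Cluster (Connected)"
    gluster volume status teravolume    # should succeed once the stale lock is cleared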
>
> Regards,
>
> Victor Nomura
>
> From: Atin Mukherjee [mailto:[email protected]]
> Sent: June-27-17 12:29 AM
> To: Victor Nomura
> Cc: gluster-users
> Subject: Re: [Gluster-users] Gluster failure due to "0-management: Lock
> not released for <volumename>"
>
> I had looked at the logs shared by Victor privately and it seems that
> there is a network glitch in the cluster which is causing glusterd to
> lose its connection with the other peers. As a side effect of this, a
> lot of rpc requests are getting bailed out, resulting in glusterd
> ending up in a stale lock, and hence you see that some of the commands
> failed with "another transaction is in progress" or "locking failed."
>
> Some examples of the symptom highlighted:
>
> [2017-06-21 23:02:03.826858] E [rpc-clnt.c:200:call_bail] 0-management:
> bailing out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21
> 22:52:02.719068. timeout = 600 for 192.168.150.53:24007
> [2017-06-21 23:02:03.826888] E [rpc-clnt.c:200:call_bail] 0-management:
> bailing out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21
> 22:52:02.716782. timeout = 600 for 192.168.150.52:24007
> [2017-06-21 23:02:53.836936] E [rpc-clnt.c:200:call_bail] 0-management:
> bailing out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent =
> 2017-06-21 22:52:47.909169. timeout = 600 for 192.168.150.53:24007
> [2017-06-21 23:02:53.836991] E [MSGID: 106116]
> [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Locking
> failed on gfsnode3. Please check log file for details.
> [2017-06-21 23:02:53.837016] E [rpc-clnt.c:200:call_bail] 0-management:
> bailing out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent =
> 2017-06-21 22:52:47.909175. timeout = 600 for 192.168.150.52:24007
>
> I'd like to request you to first look at the network layer and rectify
> the problems.
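To expand on that: one quick sanity check, from every node against every
peer, is that glusterd's TCP port 24007 (the port in the bail-out
messages above) is reachable. A sketch, assuming nc is installed and the
gfsnode hostnames resolve:

    ping -c 3 gfsnode2      # basic reachability
    nc -zv gfsnode2 24007   # is the peer's glusterd management port open?
    nc -zv gfsnode3 24007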
>
> On Thu, Jun 22, 2017 at 9:30 PM, Atin Mukherjee <[email protected]>
> wrote:
>
> Could you attach glusterd.log and cmd_history.log files from all the
> nodes?
>
> On Wed, Jun 21, 2017 at 11:40 PM, Victor Nomura <[email protected]> wrote:
>
> Hi All,
>
> I'm fairly new to Gluster (3.10.3) and had it going for a couple of
> months, but suddenly, after a power failure in our building, it all
> came crashing down. No client is able to connect after powering the 3
> nodes I have set up back on.
>
> Looking at the logs, it looks like there's some sort of "lock" placed
> on the volume which prevents all the clients from connecting to the
> Gluster endpoint.
>
> I can't even run a "gluster volume status all" command if more than one
> node is powered up. I have to shut down node2-3, and then I am able to
> issue the command on node1 to see volume status. When all nodes are
> powered up and I check the peer status, it says that all peers are
> connected. Trying to connect to the Gluster volume from any client says
> the Gluster endpoint is not available and times out. There are no
> network issues: each node can ping the others, and there are no
> firewalls or any other devices between the nodes and clients.
>
> Please help if you think you know how to fix this. I have a feeling
> it's this "lock" that's not "released" due to the whole setup losing
> power all of a sudden. I've tried restarting all the nodes, restarting
> glusterfs-server, etc. I'm out of ideas.
>
> Thanks in advance!
>
> Victor
>
> Volume Name: teravolume
> Type: Distributed-Replicate
> Volume ID: 85af74d0-f1bc-4b0d-8901-4dea6e4efae5
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: gfsnode1:/media/brick1
> Brick2: gfsnode2:/media/brick1
> Brick3: gfsnode3:/media/brick1
> Brick4: gfsnode1:/media/brick2
> Brick5: gfsnode2:/media/brick2
> Brick6: gfsnode3:/media/brick2
> Options Reconfigured:
> nfs.disable: on
>
> [2017-06-21 16:02:52.376709] W [MSGID: 106118]
> [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock
> not released for teravolume
> [2017-06-21 16:03:03.429032] I [MSGID: 106163]
> [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack]
> 0-management: using the op-version 31000
> [2017-06-21 16:13:13.326478] E [rpc-clnt.c:200:call_bail] 0-management:
> bailing out frame type(Peer mgmt) op(--(2)) xid = 0x105 sent =
> 2017-06-21 16:03:03.202284. timeout = 600 for 192.168.150.52:$
> [2017-06-21 16:13:13.326519] E [rpc-clnt.c:200:call_bail] 0-management:
> bailing out frame type(Peer mgmt) op(--(2)) xid = 0x105 sent =
> 2017-06-21 16:03:03.204555. timeout = 600 for 192.168.150.53:$
> [2017-06-21 16:18:34.456522] I [MSGID: 106004]
> [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer
> <gfsnode2> (<e1e1caa5-9842-40d8-8492-a82b079879a3>), in state <Peer in
> Cluste$
> [2017-06-21 16:18:34.456619] W
> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x1f879)
> [0x7fee6bc22879] -->/usr/lib/x86_64-l$
> [2017-06-21 16:18:34.456638] W [MSGID: 106118]
> [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock
> not released for teravolume
> [2017-06-21 16:18:34.456661] I [MSGID: 106004]
> [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer
> <gfsnode3> (<59b9effa-2b88-4764-9130-4f31c14c362e>), in state <Peer in
> Cluste$
> [2017-06-21 16:18:34.456692] W
> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x1f879)
> [0x7fee6bc22879] -->/usr/lib/x86_64-l$
> [2017-06-21 16:18:43.323944] I [MSGID: 106163]
> [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack]
> 0-management: using the op-version 31000
> [2017-06-21 16:18:34.456699] W [MSGID: 106118]
> [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock
> not released for teravolume
> [2017-06-21 16:18:45.628552] I [MSGID: 106163]
> [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack]
> 0-management: using the op-version 31000
> [2017-06-21 16:23:40.607173] I [MSGID: 106499]
> [glusterd-handler.c:4363:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume teravolume
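One last note on your question about starting over while keeping the
data: the files themselves stay on the bricks, but glusterd will refuse
to reuse a brick that still carries the old volume's extended
attributes. The commonly used per-brick cleanup is the sketch below;
this is unofficial, so please try it on scratch hardware first:

    setfattr -x trusted.glusterfs.volume-id /media/brick1   # remove the old volume marker
    setfattr -x trusted.gfid /media/brick1                  # remove the brick root's gfid
    rm -rf /media/brick1/.glusterfs                         # drop old gfid links; file data stays in place

If you then recreate the volume with the bricks in the same order, the
existing files should become visible to clients again, but do verify
this on a test volume before relying on it in production.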
--
- Atin (atinm)

_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
