Specifically, I must stop the glusterfs-server service on the other nodes in order
to perform any gluster commands on any node.
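
For reference, the workaround I'm using looks roughly like this (a sketch only,
assuming systemd on Ubuntu/Debian where the unit is named glusterfs-server; the
exact service name may differ on other distributions):

# on the other two nodes (gfsnode2, gfsnode3): stop the management daemon
sudo systemctl stop glusterfs-server

# back on the remaining node, gluster commands respond again
sudo gluster volume status all
sudo gluster volume info teravolume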

 

From: Victor Nomura [mailto:vic...@mezine.com] 
Sent: July-04-17 9:41 AM
To: 'Atin Mukherjee'
Cc: 'gluster-users'
Subject: RE: [Gluster-users] Gluster failure due to "0-management: Lock not 
released for <volumename>"

 

The nodes have all been rebooted numerous times with no difference in outcome.
The nodes are all connected to the same switch, and I replaced the switch as well
to see if it made any difference.

 

There are no network connectivity issues and no firewall in place between the
nodes.

 

I can’t do a gluster volume status without it timing out the moment the other 2
nodes are connected to the switch, which is odd. With one node turned on and the
others off, I can perform some volume commands, but the moment any one of the
others is connected, a lot of commands just time out. There’s no IP address
conflict or anything of that nature either.

 

It seems nothing can resolve the locks. Is there a manual way to release them?

 

Regards,

 

Victor

 

From: Atin Mukherjee [mailto:amukh...@redhat.com] 
Sent: June-30-17 3:40 AM
To: Victor Nomura
Cc: gluster-users
Subject: Re: [Gluster-users] Gluster failure due to "0-management: Lock not 
released for <volumename>"

 

 

On Thu, 29 Jun 2017 at 22:51, Victor Nomura <vic...@mezine.com> wrote:

Thanks for the reply.  What would be the best course of action?  The data on
the volume isn’t important right now, but I’m worried that when our setup goes to
production we could end up in the same situation and really need to be able to
recover our Gluster setup.

 

I’m assuming that to redo means deleting everything in the /var/lib/glusterd
directory on each of the nodes and recreating the volume; essentially starting
over.  If I leave the mount points the same and keep the data and setup intact,
will the files still be there and accessible afterwards? (I would not delete the
data on the bricks.)
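
To spell out what I have in mind, a rough, unverified sketch (assuming systemd,
and reusing the peer names and brick paths from the original volume):

# rough sketch only -- not verified; all of /var/lib/glusterd is wiped, brick data is kept
# on every node:
sudo systemctl stop glusterfs-server
sudo rm -rf /var/lib/glusterd/*
sudo systemctl start glusterfs-server

# from one node, rebuild the pool and the volume on the same bricks:
sudo gluster peer probe gfsnode2
sudo gluster peer probe gfsnode3
sudo gluster volume create teravolume replica 2 \
  gfsnode1:/media/brick1 gfsnode2:/media/brick1 gfsnode3:/media/brick1 \
  gfsnode1:/media/brick2 gfsnode2:/media/brick2 gfsnode3:/media/brick2 force
# note: bricks that belonged to the old volume may still carry the old volume-id xattr,
# which would have to be cleared first (e.g. setfattr -x trusted.glusterfs.volume-id /media/brick1)
sudo gluster volume start teravolume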

 

I don't think there is anything wrong in the Gluster stack. If you cross-check the
network layer and make sure it's up all the time, then restarting glusterd on all
the nodes should resolve the stale locks.
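
For example, on each node (assuming systemd; the unit is typically called glusterd
on RPM-based systems and glusterfs-server on Debian/Ubuntu):

# once the network is confirmed stable, restart the management daemon on every node
sudo systemctl restart glusterd        # or: sudo systemctl restart glusterfs-server
# then verify that the stale lock is gone
sudo gluster peer status
sudo gluster volume status teravolume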

 

 

Regards,

 

Victor Nomura

 

From: Atin Mukherjee [mailto:amukh...@redhat.com] 
Sent: June-27-17 12:29 AM


To: Victor Nomura
Cc: gluster-users

Subject: Re: [Gluster-users] Gluster failure due to "0-management: Lock not 
released for <volumename>"

 

I have looked at the logs Victor shared privately, and it seems there is a network
glitch in the cluster that is causing glusterd to lose its connection with the
other peers. As a side effect, a lot of RPC requests are getting bailed out,
leaving glusterd with a stale lock, which is why you see some of the commands fail
with "another transaction is in progress" or "locking failed."

Some examples of the symptom highlighted:

[2017-06-21 23:02:03.826858] E [rpc-clnt.c:200:call_bail] 0-management: bailing 
out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21 
22:52:02.719068. timeout = 600 for 192.168.150.53:24007
[2017-06-21 23:02:03.826888] E [rpc-clnt.c:200:call_bail] 0-management: bailing 
out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21 
22:52:02.716782. timeout = 600 for 192.168.150.52:24007
[2017-06-21 23:02:53.836936] E [rpc-clnt.c:200:call_bail] 0-management: bailing 
out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent = 2017-06-21 
22:52:47.909169. timeout = 600 for 192.168.150.53:24007
[2017-06-21 23:02:53.836991] E [MSGID: 106116] 
[glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Locking failed on 
gfsnode3. Please check log file for details.
[2017-06-21 23:02:53.837016] E [rpc-clnt.c:200:call_bail] 0-management: bailing 
out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent = 2017-06-21 
22:52:47.909175. timeout = 600 for 192.168.150.52:24007

I'd like to request that you first look at the network layer and rectify the problems there.
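
For example, a quick check between the peers could be something like this
(hypothetical commands, using the peer addresses and the glusterd port 24007 seen
in the logs; nc is assumed to be installed):

# run from each node against the other two peers
ping -c 5 192.168.150.52
ping -c 5 192.168.150.53
nc -zv 192.168.150.52 24007    # glusterd management port
nc -zv 192.168.150.53 24007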

 

 

On Thu, Jun 22, 2017 at 9:30 PM, Atin Mukherjee <amukh...@redhat.com> wrote:

Could you attach glusterd.log and cmd_history.log files from all the nodes?
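
(On a default installation these are usually under /var/log/glusterfs/, e.g.:

/var/log/glusterfs/cmd_history.log
/var/log/glusterfs/glusterd.log    # or etc-glusterfs-glusterd.vol.log, depending on the version)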

 

On Wed, Jun 21, 2017 at 11:40 PM, Victor Nomura <vic...@mezine.com> wrote:

Hi All,

 

I’m fairly new to Gluster (3.10.3) and had it going for a couple of months, but
suddenly, after a power failure in our building, it all came crashing down.
No client is able to connect after powering the 3 nodes I have set up back on.

 

Looking at the logs, it appears that some sort of “lock” has been placed on the
volume, which prevents all the clients from connecting to the Gluster endpoint.

 

I can’t even do a #gluster volume status all command if more than 1 node is
powered up.  I have to shut down node2 and node3, and then I am able to issue the
command on node1 to see the volume status.  When all nodes are powered up and I
check the peer status, it says that all peers are connected.  Trying to connect to
the Gluster volume from any client says the Gluster endpoint is not available and
times out. There are no network issues, each node can ping the others, and
there are no firewalls or any other device between the nodes and clients.
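
For reference, the commands in question (annotations restate the behaviour
described above):

gluster volume status all    # times out whenever node2/node3 are also up
gluster peer status          # reports every peer as connected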

 

Please help if you think you know how to fix this.  I have a feeling it’s this
“lock” that’s not being “released” due to the whole setup losing power all of a
sudden.  I’ve tried restarting all the nodes, restarting glusterfs-server, etc.
I’m out of ideas.

 

Thanks in advance!

 

Victor

 

Volume Name: teravolume

Type: Distributed-Replicate

Volume ID: 85af74d0-f1bc-4b0d-8901-4dea6e4efae5

Status: Started

Snapshot Count: 0

Number of Bricks: 3 x 2 = 6

Transport-type: tcp

Bricks:

Brick1: gfsnode1:/media/brick1

Brick2: gfsnode2:/media/brick1

Brick3: gfsnode3:/media/brick1

Brick4: gfsnode1:/media/brick2

Brick5: gfsnode2:/media/brick2

Brick6: gfsnode3:/media/brick2

Options Reconfigured:

nfs.disable: on

 

 

[2017-06-21 16:02:52.376709] W [MSGID: 106118] 
[glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock not 
released for teravolume

[2017-06-21 16:03:03.429032] I [MSGID: 106163] 
[glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack] 0-management: 
using the op-version 31000

[2017-06-21 16:13:13.326478] E [rpc-clnt.c:200:call_bail] 0-management: bailing 
out frame type(Peer mgmt) op(--(2)) xid = 0x105 sent = 2017-06-21 
16:03:03.202284. timeout = 600 for 192.168.150.52:$

[2017-06-21 16:13:13.326519] E [rpc-clnt.c:200:call_bail] 0-management: bailing 
out frame type(Peer mgmt) op(--(2)) xid = 0x105 sent = 2017-06-21 
16:03:03.204555. timeout = 600 for 192.168.150.53:$

[2017-06-21 16:18:34.456522] I [MSGID: 106004] 
[glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer 
<gfsnode2> (<e1e1caa5-9842-40d8-8492-a82b079879a3>), in state <Peer in Cluste$

[2017-06-21 16:18:34.456619] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] 
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x1f879)
 [0x7fee6bc22879] -->/usr/lib/x86_64-l$

[2017-06-21 16:18:34.456638] W [MSGID: 106118] 
[glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock not 
released for teravolume

[2017-06-21 16:18:34.456661] I [MSGID: 106004] 
[glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer 
<gfsnode3> (<59b9effa-2b88-4764-9130-4f31c14c362e>), in state <Peer in Cluste$

[2017-06-21 16:18:34.456692] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] 
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x1f879)
 [0x7fee6bc22879] -->/usr/lib/x86_64-l$

[2017-06-21 16:18:43.323944] I [MSGID: 106163] 
[glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack] 0-management: 
using the op-version 31000

[2017-06-21 16:18:34.456699] W [MSGID: 106118] 
[glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock not 
released for teravolume

[2017-06-21 16:18:45.628552] I [MSGID: 106163] 
[glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack] 0-management: 
using the op-version 31000

[2017-06-21 16:23:40.607173] I [MSGID: 106499] 
[glusterd-handler.c:4363:__glusterd_handle_status_volume] 0-management: 
Received status volume req for volume teravolume

 


 

 

-- 

- Atin (atinm)

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
