Even more info: I only see the "Unable to get lock" messages on the same node I'm running the gluster volume command on (status, in this instance), and it always complains about itself. For example:
I run:

[root@ox60-gstore10 ~]# gluster volume status
[root@ox60-gstore10 ~]#

(It sits for a bit, then just comes back empty.) The logs on that system (ox60-gstore10) yield:

==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
[2013-06-04 12:55:13.447584] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.447637] I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
[2013-06-04 12:55:13.447868] I [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.447898] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da, lock held by: 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.447932] I [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2013-06-04 12:55:13.447945] E [glusterd-op-sm.c:4624:glusterd_op_sm] 0-glusterd: handler returned: -1
[2013-06-04 12:55:13.447971] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 725a2567-b668-4a5f-b2c9-5c7dcc90c846
[2013-06-04 12:55:13.447993] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
[2013-06-04 12:55:13.448013] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received RJT from uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.448035] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: a5de08c0-e761-45ee-a7ad-e8c556f2540b
[2013-06-04 12:55:13.448056] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 303f4cc4-c8ae-42c7-b8cd-eafee8f95122
[2013-06-04 12:55:13.448143] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: a327cd38-f98a-4554-ae62-97a21153f4d3
[2013-06-04 12:55:13.448166] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: cdba3b89-e804-4bf1-afb9-d7c231399955
[2013-06-04 12:55:13.448191] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 055a13fe-e40a-46ff-9011-6c81832e3ba1
[2013-06-04 12:55:13.448231] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: e0c267e6-3dc2-4623-89f1-4516f1285c1a
[2013-06-04 12:55:13.448257] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 6456206b-fe19-4b65-b7ab-0c9e7ce6221e
[2013-06-04 12:55:13.448282] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 439f3ffa-e468-4a8b-801e-e2f20062e6f0
[2013-06-04 12:55:13.448303] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 2225df4c-4510-457c-9958-0b6506ff25e4
[2013-06-04 12:55:13.448322] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: e503bd2e-b2b2-49d4-ae05-45090e24acca
[2013-06-04 12:55:13.448340] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 5517a055-c5f5-41b7-95d2-dedf6900be21
[2013-06-04 12:55:13.448358] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 974a503e-4f0f-44f2-81df-5383c28cdf20
[2013-06-04 12:55:13.448376] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 428e11bc-5a80-41cb-af1d-a9023e2bc11b

So it sees something is holding the lock and rejects it. If I look up that UUID:

[root@ox60-gstore10 ~]# gluster peer status |grep 0edce15e-0de2-4496-a520-58c65dbbc7da --context=3
Number of Peers: 20

Hostname: ox60-gstore10
Uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
State: Peer in Cluster (Connected)

So it seems the node itself is holding the lock. If I do this on another node in the cluster, I see the same thing (the node I'm checking the status from is holding a lock, gets rejected, and never gets any info back).
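A quick way to double-check that the UUID in those messages really is the local daemon's own identity is to compare it against the UUID glusterd stores for itself. This is only a sketch and assumes the default GlusterFS state directory /var/lib/glusterd:

# sketch, assuming the default state directory /var/lib/glusterd
grep -i uuid /var/lib/glusterd/glusterd.info
# the UUID= line here should match both the "Cluster lock held by" and the
# "Unable to get lock for uuid" values in the log above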
--
Matthew Nicholson
Research Computing Specialist
Harvard FAS Research Computing
matthew_nichol...@harvard.edu

On Tue, Jun 4, 2013 at 12:21 PM, Matthew Nicholson <matthew_nichol...@harvard.edu> wrote:

> If I strace a "gluster volume status" it hangs here:
>
> epoll_wait(3, {{EPOLLOUT, {u32=5, u64=5}}}, 257, 4294967295) = 1
> getsockopt(5, SOL_SOCKET, SO_ERROR, [150710196258209792], [4]) = 0
> getsockname(5, {sa_family=AF_INET, sin_port=htons(964), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
> futex(0x63b7a4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x63b760, 2) = 1
> futex(0x63b760, FUTEX_WAKE_PRIVATE, 1) = 1
> epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN|EPOLLPRI, {u32=5, u64=5}}) = 0
> epoll_wait(3,
>
> so it's talking to localhost on 964.
>
> All nodes do that, but with different ports.
>
> --
> Matthew Nicholson
> Research Computing Specialist
> Harvard FAS Research Computing
> matthew_nichol...@harvard.edu
>
> On Tue, Jun 4, 2013 at 12:19 PM, Matthew Nicholson <matthew_nichol...@harvard.edu> wrote:
>
>> No, no duplicate UUIDs:
>>
>> [root@ox60-gstore01 ~]# gluster peer status |grep -i uuid | uniq -c
>> 1 Uuid: 055a13fe-e40a-46ff-9011-6c81832e3ba1
>> 1 Uuid: e0c267e6-3dc2-4623-89f1-4516f1285c1a
>> 1 Uuid: e503bd2e-b2b2-49d4-ae05-45090e24acca
>> 1 Uuid: 974a503e-4f0f-44f2-81df-5383c28cdf20
>> 1 Uuid: 5517a055-c5f5-41b7-95d2-dedf6900be21
>> 1 Uuid: 13cfacc1-65a4-4151-91d5-bc7977e01654
>> 1 Uuid: a5de08c0-e761-45ee-a7ad-e8c556f2540b
>> 1 Uuid: 428e11bc-5a80-41cb-af1d-a9023e2bc11b
>> 1 Uuid: 113562a1-e521-4747-ae75-477614ea28cf
>> 1 Uuid: 04c6c37b-743d-4f87-9bdc-3dfe1b573709
>> 1 Uuid: 2225df4c-4510-457c-9958-0b6506ff25e4
>> 1 Uuid: 6456206b-fe19-4b65-b7ab-0c9e7ce6221e
>> 1 Uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
>> 1 Uuid: a327cd38-f98a-4554-ae62-97a21153f4d3
>> 1 Uuid: a7d3a064-1bb4-4da0-a680-180db8150e4c
>> 1 Uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
>> 1 Uuid: 725a2567-b668-4a5f-b2c9-5c7dcc90c846
>> 1 Uuid: 303f4cc4-c8ae-42c7-b8cd-eafee8f95122
>> 1 Uuid: 439f3ffa-e468-4a8b-801e-e2f20062e6f0
>> 1 Uuid: cdba3b89-e804-4bf1-afb9-d7c231399955
>>
>> glusterd (as well as glusterfs and the NFS server, which seemingly never dies if glusterd is shut down) have all been restarted. Actually, we just went so far as to bounce one replica and then another (reboot).
>>
>> --
>> Matthew Nicholson
>> Research Computing Specialist
>> Harvard FAS Research Computing
>> matthew_nichol...@harvard.edu
>>
>> On Tue, Jun 4, 2013 at 10:30 AM, Vijay Bellur <vbel...@redhat.com> wrote:
>>
>>> On 06/04/2013 07:57 PM, Matthew Nicholson wrote:
>>>
>>>> So, we've got a volume that is mostly functioning fine (it's up, accessible, etc.). However, volume operations fail/don't return on it.
>>>>
>>>> What I mean is:
>>>>
>>>> gluster peer status/probe/etc : works
>>>> gluster volume info : works
>>>> gluster volume status/remove-brick/etc : sit for a long time and return nothing.
>>>>
>>>> The only things coming up in logs are:
>>>>
>>>> [2013-06-04 10:21:36.398072] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by 757297b4-5648-4e31-88f4-00fc167a43e4
>>>> [2013-06-04 10:21:36.398123] I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
>>>> [2013-06-04 10:21:36.398424] I [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
>>>> [2013-06-04 10:21:36.398448] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: 757297b4-5648-4e31-88f4-00fc167a43e4, lock held by: 757297b4-5648-4e31-88f4-00fc167a43e4
>>>> [2013-06-04 10:21:36.398483] I [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
>>>> [2013-06-04 10:21:36.398498] E [glusterd-op-sm.c:4624:glusterd_op_sm] 0-glusterd: handler returned: -1
>>>>
>>>> If you notice, the UUID holding the lock and the UUID requesting the lock are the same. So it seems like a lock was "forgotten" about?
>>>>
>>>> Any thoughts on clearing this?
>>>
>>> Does gluster peer status list the same UUID more than once?
>>>
>>> If not, restarting the glusterd which is the lock owner should address it.
>>>
>>> -Vijay
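A minimal sketch of that suggestion, using the lock-holder UUID from the logs above; it assumes a sysvinit-style glusterd service script, so the exact restart command may differ by distro and init system:

# map the lock-holder UUID from the log to a peer/hostname
gluster peer status | grep --context=3 0edce15e-0de2-4496-a520-58c65dbbc7da
# then, on that node, restart the management daemon to drop the stale cluster lock
service glusterd restart   # or: /etc/init.d/glusterd restart

Restarting glusterd only touches the management daemon; as noted above, the brick (glusterfs) and NFS server processes keep running when glusterd is stopped.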
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users