Hi again,
here is more information regarding the issue described earlier.
It looks like self-heal is stuck. According to "heal statistics" the
crawl began at Sat Jan 20 12:56:19 2018 and it is still going on (it is
around Sun Jan 21 20:30 as I write this). However, glustershd.log says
that the last heal was completed at "2018-01-20 11:00:13.090697" (which
is 13:00 UTC+2). Also, "heal info" has now been running for over 16
hours without printing any information. In the statedump I can see that
the storage nodes hold locks on files and that some of those locks are
blocked. For example, here it again shows ovirt8z2 holding an active
lock even though ovirt8z2 crashed after the lock was granted:
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
blocked at 2018-01-20 10:59:52
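For reference, the heal status and the statedump above were collected
with the standard commands, roughly like this (the statedump output
directory can vary per build; /var/run/gluster is the default on our
systems):

# gluster volume heal zone2-ssd1-vmstor1 statistics
# gluster volume heal zone2-ssd1-vmstor1 info
# gluster volume statedump zone2-ssd1-vmstor1
# grep -B5 BLOCKED /var/run/gluster/*.dump.*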
I'd also like to add that the volume had an arbiter brick before the
crash happened. We decided to remove it because we thought it was
causing issues, but now I think that was unnecessary (see the
remove-brick command after the log excerpt below). After the crash the
arbiter logs had lots of messages like this:
[2018-01-20 10:19:36.515717] I [MSGID: 115072]
[server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server:
37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
[Operation not permitted]
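For completeness, the arbiter was removed with a remove-brick along
these lines, if I recall the exact command correctly (the arbiter
hostname below is a placeholder, not our actual node name):

# gluster volume remove-brick zone2-ssd1-vmstor1 replica 2 \
    arbiternode.xxx:/ssd1/zone2-vmstor1/export force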
Is there any way to force self-heal to stop? Any help would be very
much appreciated :)
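The only workaround I have come up with so far would be to temporarily
disable the self-heal daemon and then turn it back on, i.e. something
like the commands below, but I'm not sure whether that is safe to do
while "heal info" is still hanging:

# gluster volume set zone2-ssd1-vmstor1 cluster.self-heal-daemon off
# gluster volume set zone2-ssd1-vmstor1 cluster.self-heal-daemon on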
Best regards,
Samuli Heinonen
Samuli Heinonen <[email protected]>
20 January 2018 at 21:57
Hi all!
One hypervisor in our virtualization environment crashed, and now some
of the VM images cannot be accessed. After investigating we found that
there were lots of images that still had an active lock held by the
crashed hypervisor. We were able to remove locks from "regular files"
(see the example command below), but it doesn't seem possible to
remove locks from shards.
We are running GlusterFS 3.8.15 on all nodes.
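For the regular files we used clear-locks in the usual way, e.g. like
this (the path below is just an example, not one of the actual images):

# gluster volume clear-locks zone2-ssd1-vmstor1 /path/to/image.img \
    kind all inode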
Here is the part of the statedump that shows a shard having an active
lock from the crashed node:
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3568, owner=14ce372c397f0000, client=0x7f3198388770,
connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
granted at 2018-01-20 08:57:24
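As a side note, in case it helps with mapping shards back to VM images:
the part of the shard name before the dot is the GFID of the base file,
so on a brick the owning image can be located with something like this
(the brick path is ours, adjust as needed):

# find /ssd1/zone2-vmstor1/export -samefile \
    /ssd1/zone2-vmstor1/export/.glusterfs/75/35/75353c17-d6b8-485d-9baf-fd6c700e39a1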
If we try to run clear-locks on a shard, we get the following error
message:
# gluster volume clear-locks zone2-ssd1-vmstor1
/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted
Gluster vol info, in case it's needed:
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
Options Reconfigured:
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
Any recommendations on how to proceed from here?
Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users