Hey all,

I recently had a k8s node failure in my homelab, and even though I powered
it off (and it's done for, so it won't come back up), it still shows up as a
watcher in rbd status:

```
root@node0:~# rbd status kubernetes/csi-vol-3e7af8ae-ceb6-4c94-8435-2f8dc29b313b
Watchers:
        watcher=10.0.0.103:0/1520114202 client.1697844 cookie=140289402510784
        watcher=10.0.0.103:0/39967552 client.1805496 cookie=140549449430704
root@node0:~# ceph osd blocklist ls
10.0.0.103:0/0 2023-04-15T13:15:39.061379+0200
listed 1 entries
```
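
For completeness, this is roughly how I added the blocklist entry (the :0/0
form should, as far as I understand, match every client instance coming
from that address; the expire time is optional):

```
# blocklist all client instances from the dead node's address,
# optionally with an expiry in seconds (here: 2 hours)
ceph osd blocklist add 10.0.0.103:0/0 7200
```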

Even though the node is down and I have blocklisted it multiple times for
hours at a time, the watcher won't disappear. As a result, ceph-csi-rbd
claims the image is already mounted and refuses to attach it (mapping the
image manually works fine, and I can cleanly unmap it as well, but of
course I can't unmap it from a node that doesn't exist anymore).
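
To illustrate what I mean by mapping manually (run from a healthy node):

```
# mapping the image by hand works fine...
rbd map kubernetes/csi-vol-3e7af8ae-ceb6-4c94-8435-2f8dc29b313b
# ...and it unmaps cleanly again as well
rbd unmap kubernetes/csi-vol-3e7af8ae-ceb6-4c94-8435-2f8dc29b313b
```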

Is there any way to force-kick an rbd client / watcher from ceph (e.g. by
switching the active mgr / mon), or to see why this watch is not timing
out?
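
In case it helps with debugging, I believe the watch can also be inspected
directly on the image's header object via rados (rbd_header.<image-id> is
my assumption based on the usual RBD naming scheme):

```
# look up the image's internal id
rbd info kubernetes/csi-vol-3e7af8ae-ceb6-4c94-8435-2f8dc29b313b | grep 'id:'
# list the watchers on the header object itself
rados -p kubernetes listwatchers rbd_header.<image-id>
```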

I found some historical mails & issues (related to Rook, which I don't use)
regarding a parameter `osd_client_watch_timeout`, but I can't find how it
relates to RBD images.
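
In case it matters, this is how I'd check the effective value of that
option (default should be 30 seconds, if I'm not mistaken, so it clearly
isn't kicking in here):

```
# show the watch timeout currently in effect on the OSDs
ceph config get osd osd_client_watch_timeout
```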

Cheers,
Max.