Hi,

We're hitting an odd issue on our ceph cluster:

- machine1 has mapped an RBD image with the exclusive-lock feature enabled.
- machine2 tries to take a snapshot of that image, but fails to acquire
  the lock.

An strace of the rbd snap process on machine2 shows it sending "lockget"
commands in a loop, without ever making progress.

In rbd status, we see that machine1 is a watcher on the image, which is
expected. What is not expected is that the rbd snap process can't get the
lock.
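
For anyone who wants to compare the same state on their own cluster, here
is a minimal sketch of how the watcher and lock information could be
gathered. It only wraps the standard rbd CLI subcommands (status, lock
list, info); the function names and the pool/image names are placeholders,
not something taken from our setup.

```python
import subprocess


def diagnostic_commands(pool, image):
    """Build the rbd CLI invocations that expose watchers, locks and features.

    `pool` and `image` are placeholders for the affected image.
    """
    spec = "{}/{}".format(pool, image)
    return [
        ["rbd", "status", spec],        # lists the watchers on the image
        ["rbd", "lock", "list", spec],  # lists locks held on the image
        ["rbd", "info", spec],          # shows enabled features
    ]


def collect(pool, image):
    """Run each command and return a dict mapping command line to its output."""
    results = {}
    for argv in diagnostic_commands(pool, image):
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        results[" ".join(argv)] = proc.stdout
    return results


# Example (needs the rbd CLI and cluster access on the node running it):
# for cmd, out in collect("rbd", "myimage").items():
#     print("$ " + cmd)
#     print(out)
```

Running this on both machine1 and machine2 would show whether the two
clients agree on who the watchers and lock holders are.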

This commit, which shipped in 10.2.10 (the version we are running), sounds
related:
https://github.com/ceph/ceph/commit/475dda114a7e25b43dc9066b9808a64fc0c6dc89

However, there is no equivalent change in ceph-client's code, where we
would expect one to be needed as well. That said, I don't fully understand
the code, so I may be off base there.

Am I wrong to expect an equivalent change in ceph-client (i.e. the Linux
kernel)? Am I completely off base about what is going wrong? Is there any
additional information I can provide to help with debugging?

Regards,
Florian
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com