Hi,
I'm currently investigating a case where a Ceph cluster ended up with
inconsistent clone information.
Here's what I did to quickly reproduce it:
* Created new cluster (tested in hammer 0.94.6 and jewel 10.2.3)
* Created two pools: test and rbd
* Created base image in pool test, created snapshot, protected it and created
clone of this snapshot in pool rbd:
# rbd -p test create --size 10 --image-format 2 base
# rbd -p test snap create base@base
# rbd -p test snap protect base@base
# rbd clone test/base@base rbd/destination
* Created new user called "test" with rwx permissions to rbd pool only:
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=rbd
* Using this newly created user I removed the cloned image in the rbd pool. The
command printed errors, but the image was removed in the end:
# rbd --id test -p rbd rm destination
2016-12-21 11:50:03.758221 7f32b7459700 -1 librbd::image::OpenRequest:
failed to retreive name: (1) Operation not permitted
2016-12-21 11:50:03.758288 7f32b6c58700 -1
librbd::image::RefreshParentRequest: failed to open parent image: (1) Operation
not permitted
2016-12-21 11:50:03.758312 7f32b6c58700 -1
librbd::image::RefreshRequest: failed to refresh parent image: (1) Operation
not permitted
2016-12-21 11:50:03.758333 7f32b6c58700 -1 librbd::image::OpenRequest:
failed to refresh image: (1) Operation not permitted
2016-12-21 11:50:03.759366 7f32b6c58700 -1 librbd::ImageState: failed
to open image: (1) Operation not permitted
Removing image: 100% complete...done.
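
For anyone who wants to replay this, the steps above can be collected into a small script. This is only a sketch: the pool-creation lines, the pg count of 8, the `ceph auth get-or-create` call and the keyring path are assumptions about a typical test setup, not commands taken from the session above. The `run` and `reproduce` helpers are hypothetical names used for illustration.

```shell
#!/bin/sh
# Reproduction sketch. DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 to execute them against a disposable test cluster.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Pools. The pg count is an arbitrary small value (assumption); the rbd
    # pool may already exist on a freshly created cluster.
    run ceph osd pool create test 8
    run ceph osd pool create rbd 8

    # Base image, protected snapshot, and a clone in the other pool
    # (these four commands are the ones from the steps above).
    run rbd -p test create --size 10 --image-format 2 base
    run rbd -p test snap create base@base
    run rbd -p test snap protect base@base
    run rbd clone test/base@base rbd/destination

    # User limited to the rbd pool. The keyring path is an assumption
    # (the default location searched for client.test).
    run ceph auth get-or-create client.test \
        mon 'allow r' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' \
        -o /etc/ceph/ceph.client.test.keyring

    # Remove the clone as the restricted user.
    run rbd --id test -p rbd rm destination
}

reproduce
```

With the default `DRY_RUN=1` the script just lists the eight commands, which is handy for checking them before touching a real cluster.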
At this point the cloned image is gone, but the original snapshot still has a
reference to it:
# rbd -p test snap unprotect base@base
2016-12-21 11:53:47.359060 7fee037fe700 -1 librbd::SnapshotUnprotectRequest:
cannot unprotect: at least 1 child(ren) [29b0238e1f29] in pool 'rbd'
2016-12-21 11:53:47.359678 7fee037fe700 -1 librbd::SnapshotUnprotectRequest:
encountered error: (16) Device or resource busy
2016-12-21 11:53:47.359691 7fee037fe700 -1 librbd::SnapshotUnprotectRequest:
0x7fee39ae9340 should_complete_error: ret_val=-16
2016-12-21 11:53:47.360627 7fee037fe700 -1 librbd::SnapshotUnprotectRequest:
0x7fee39ae9340 should_complete_error: ret_val=-16
rbd: unprotecting snap failed: (16) Device or resource busy
# rbd -p test children base@base
rbd: listing children failed: (2) No such file or directory
2016-12-21 11:53:08.716987 7ff2b2eaad80 -1 librbd: Error looking up name for image
id 29b0238e1f29 in pool rbd
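
In case it helps with diagnosis, the leftover reference can probably be inspected directly in the child pool. The sketch below assumes, as far as I understand the format 2 layout, that child registrations are kept in the omap of the `rbd_children` object and that the name/id mapping is kept in the omap of `rbd_directory`, both in the pool holding the clone. The `run` and `inspect_children` helpers and the `CHILD_POOL` variable are hypothetical names. It only reads data and does not change anything.

```shell
#!/bin/sh
# Diagnostic sketch. DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 to run them with an admin keyring.
DRY_RUN="${DRY_RUN:-1}"
CHILD_POOL="${CHILD_POOL:-rbd}"   # pool that held the clone

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

inspect_children() {
    # Images that still exist in the child pool.
    run rbd -p "$CHILD_POOL" ls

    # Parent -> child registrations (assumed to live in this object's omap).
    # A stale entry should still mention image id 29b0238e1f29.
    run rados -p "$CHILD_POOL" listomapvals rbd_children

    # Name <-> id directory of format 2 images (assumed location). The id
    # above is expected to be missing here, matching the lookup error.
    run rados -p "$CHILD_POOL" listomapvals rbd_directory
}

inspect_children
```

If the id shows up under `rbd_children` but not under `rbd_directory`, that would match the errors above: the image is gone but its child registration was left behind.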
Any ideas on how this could be fixed?
Thanks,
Bartek
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com