Quoting Ilya Dryomov:
> On Wed, Jun 3, 2026 at 12:42 PM Eugen Block via dev wrote:
>>
>> Now that's unexpected. I thought I had misremembered my previous
>> actions, so I hadn't mentioned it yet. Now I created 3 new VMs, created a
>> new group and took a group snapshot, all good. But a group snap rollback
>> is executed even though the RBD images have watchers:
>>
>> rbd --id openstack group snap create images/test-servers@snap1
>>
>> rbd --id openstack group image ls images/test-servers
>> images/3df08789-2be9-4e99-9746-9d2edc8c612a_disk
>> images/7a5d19eb-1034-489f-885a-0074fef59e89_disk
>> images/f79e323f-87a1-4cb7-ad9c-1108ce73efe3_disk
>>
>>
>> rbd status images/3df08789-2be9-4e99-9746-9d2edc8c612a_disk
>> Watchers:
>> watcher=X.X.X.18:0/4236769191 client.<client> cookie=<cookie>
>>
>> rbd --id openstack group snap rollback images/test-servers@snap1
>> Rolling back to group snapshot: 100% complete...done.
>>
>> This shouldn't be possible; I would expect a message like this:
>>
>> Rolling back to snapshot: 0% complete...failed.
>> rbd: rollback failed: (30) Read-only file system
>>
>> Is this a known bug?
>
> No, this behavior is expected. The reason why the "rbd snap rollback"
> command can, in some cases, deny the operation with EROFS when the image
> is mapped, while the "rbd group snap rollback" command doesn't, lies
> in implementation details (specifically the interaction with exclusive
> locks on member images).
>
> When it comes to the rollback operation, the mere presence of a watch isn't
> really an indicator of anything. That said, I'd recommend shutting down
> clients before issuing any rollbacks in both standalone image and group
> scenarios. That way all caches would get invalidated and there is no
> chance of confusing someone or something with now-stale data.
>
> Thanks,
>
> Ilya
>
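A minimal bash sketch of the "shut down clients before rolling back" advice above, assuming the `rbd status` output format shown earlier in this thread ("Watchers:" followed by `watcher=...` lines, or "Watchers: none"). As Ilya points out, librbd itself does not treat a watch as a blocker, so this check is only a coarse proxy for "a client still has the image open" and is no substitute for actually stopping the VMs. The function names `group_is_idle` and `safe_group_rollback` are hypothetical; add `--id <user>` to the `rbd` calls if you are not running as admin.

```shell
#!/usr/bin/env bash
# Sketch: refuse to roll back a group snapshot while any member image
# still reports a watcher (rough proxy for "a client is still running").
# Assumes the "rbd status" output format shown in this thread.

# Returns 0 only if every member image of the group reports no watchers.
group_is_idle() {
    local group_spec="$1" images image status
    images=$(rbd group image ls "$group_spec") || return 1
    for image in $images; do
        status=$(rbd status "$image") || return 1
        case "$status" in
            *watcher=*)
                echo "image $image still has watchers" >&2
                return 1 ;;
        esac
    done
    return 0
}

# Usage: safe_group_rollback <pool>/<group>@<snap>
safe_group_rollback() {
    local group_snap_spec="$1"
    local group_spec="${group_snap_spec%@*}"   # strip "@<snap>"
    if ! group_is_idle "$group_spec"; then
        echo "refusing to roll back $group_snap_spec; shut down the clients first" >&2
        return 1
    fi
    rbd group snap rollback "$group_snap_spec"
}
```

For example, `safe_group_rollback images/test-servers@snap1` would have refused the rollback in the scenario above, because `images/3df08789-2be9-4e99-9746-9d2edc8c612a_disk` still had a watcher.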
>>
>> Quoting Eugen Block:
>>
>> > But one more question on this: why is it allowed to remove an image
>> > from the group if there are existing snapshots? Shouldn't this be
>> > prevented to keep the group consistency?
>> >
>> > And just for my understanding: how are the group snapshots
>> > technically created? Is that one snapshot for all images or is it an
>> > individual snapshot per image?
>> >
>> > Quoting Eugen Block:
>> >
>> >> I understand, you're right about the group consistency of course. I
>> >> just thought that if you can remove an image from the group, that
>> >> would also remove the image's snapshot(s) from the list of snapshots.
>> >> My scenario is: initially I thought it would make sense to have
>> >> those servers in a group because if I wanted to roll back, it would
>> >> make sense to do it for all of them. But then I thought a bit more
>> >> about it and decided that one of the images doesn't actually belong
>> >> in that group. Re-adding it will cause more problems in case of a
>> >> rollback... I need to think about this...
>> >>
>> >> Thanks a lot for taking the time, I really appreciate it!
>> >>
>> >> Quoting Ilya Dryomov:
>> >>
>> >>> On Wed, Jun 3, 2026 at 10:27 AM Eugen Block wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> that is correct, log_to_stderr is false in our cluster. And with
>> >>>> --log-to-stderr true the result is as you expected:
>> >>>>
>> >>>> rbd: rollback group to snapshot failed: 2026-06-03T08:06:52.930+0000
>> >>>> 7f53896ae0c0 -1 librbd::api::Group: snap_rollback: group snapshot
>> >>>> membership does not match group membership
>> >>>>
>> >>>> But what's the conclusion here? So it's not allowed to roll back if
>> >>>> the memberships don't match. How would I correct the membership?
>> >>>
>> >>> Re-add the image back to the group if the image is still around.
>> >>>
>> >>>> Because I
>> >>>> wouldn't want to delete all snapshots from before I removed one image
>> >>>> from the group. Is there any workaround?
>> >>>
>> >>> I'm not sure I see what needs to be worked around here. The group is
>> >>> supposed to be a logical collection of images where some level of
>> >>> consistency between images is required, not a random "bag". This
>> >>> suggests that while images can come and go (i.e. be added to and
>> >>> removed from the group), the group can't always be meaningfully rolled
>> >>> back. For example, if a group snapshot captured images A, B and C but
>> >>> image C had since been removed from the group and potentially
>> >>> reformatted, repurposed for something else or removed altogether, the
>> >>> group's state exactly as of that snapshot just can't be restored.
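A minimal bash sketch of the "re-add the image back to the group" suggestion, assuming the removed image still exists and that its data from the group snapshot was kept when it left the group (as the reply implies). It uses the same `rbd group image add` and `rbd group snap rollback` forms used elsewhere in this thread, plus `rbd info` purely as an existence check. The function name `readd_and_rollback` is hypothetical. Keep in mind that the rollback also rewinds the re-added image, so its current contents are lost, and the clients should be shut down first.

```shell
#!/usr/bin/env bash
# Sketch: re-add a previously removed image to its group so that the group
# membership matches the snapshot again, then roll the group back.
# Assumes the image (and its snapshot data) still exists.

# Usage: readd_and_rollback <pool>/<group> <pool>/<image> <snap>
readd_and_rollback() {
    local group_spec="$1" image_spec="$2" snap="$3"
    # "rbd info" exits non-zero if the image no longer exists, in which case
    # the group's state as of the snapshot cannot be restored this way.
    if ! rbd info "$image_spec" >/dev/null; then
        echo "image $image_spec not found; cannot restore membership" >&2
        return 1
    fi
    rbd group image add "$group_spec" "$image_spec" || return 1
    rbd group snap rollback "$group_spec@$snap"
}
```

With the names from the earlier test, that would be `readd_and_rollback images/test-servers images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk snap1`.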
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Ilya
>> >>>
>> >>>>
>> >>>> Thanks,
>> >>>> Eugen
>> >>>>
>> >>>> Quoting Ilya Dryomov:
>> >>>>
>> >>>>> On Fri, May 29, 2026 at 6:41 PM Eugen Block wrote:
>> >>>>>>
>> >>>>>> The commands were:
>> >>>>>>
>> >>>>>> controller02:~# rbd --id user group create images/test-servers
>> >>>>>>
>> >>>>>> controller02:~# for i in 0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd
>> >>>>>> 72f5816c-c1db-44de-b0a2-19d661faa963
>> >>>>>> 47d6144e-0d5a-4dc7-82dd-5be3edf9f6cc; do rbd --id user group image add
>> >>>>>> images/test-servers images/${i}_disk; done
>> >>>>>>
>> >>>>>> controller02:~# rbd --id user group image ls images/test-servers
>> >>>>>> images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk
>> >>>>>> images/47d6144e-0d5a-4dc7-82dd-5be3edf9f6cc_disk
>> >>>>>> images/72f5816c-c1db-44de-b0a2-19d661faa963_disk
>> >>>>>>
>> >>>>>> controller02:~# rbd --id user group snap create
>> >>>>>> images/test-servers@snap1
>> >>>>>>
>> >>>>>> controller02:~# rbd --id user group snap ls images/test-servers
>> >>>>>> NAME STATUS
>> >>>>>> snap1 ok
>> >>>>>>
>> >>>>>>
>> >>>>>> # rollback works for all images
>> >>>>>> controller02:~# rbd --id user group snap rollback
>> >>>>>> images/test-servers@snap1
>> >>>>>> Rolling back to group snapshot: 100% complete...done.
>> >>>>>>
>> >>>>>> # removing one image from the group
>> >>>>>> controller02:~# rbd --id user group image rm images/test-servers
>> >>>>>> images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk
>> >>>>>>
>> >>>>>> # rollback fails
>> >>>>>> controller02:~# rbd --id user group snap rollback
>> >>>>>> images/test-servers@snap1
>> >>>>>> Rolling back to group snapshot: 0% complete...failed.
>> >>>>>> rbd: rollback group to snapshot failed: (22) Invalid argument
>> >>>>>>
>> >>>>>> I'll add the debug output later; I need to sanitize it first. But I
>> >>>>>> don't see anything obvious in there.
>> >>>>>
>> >>>>> Hi Eugen,
>> >>>>>
>> >>>>> Based on the above, it's https://tracker.ceph.com/issues/66300 and is
>> >>>>> therefore the intended behavior. The only fly in the ointment is that
>> >>>>> you aren't seeing the associated "group snapshot membership does not
>> >>>>> match group membership" error message.
>> >>>>>
>> >>>>> That you aren't seeing it is consistent with the attached debug output,
>> >>>>> where only very early messenger traffic is present and nothing beyond
>> >>>>> that. It suggests some unconventional settings in the cluster-wide
>> >>>>> config, such as log_to_stderr being set to false or similar.
>> >>>>>
>> >>>>> Can you try appending --log-to-stderr true to the "rbd group snap
>> >>>>> rollback" command?
>> >>>>>
>> >>>>> Thanks,
>> >>>>>
>> >>>>> Ilya
>> >>>>>
>> >>>>>>
>> >>>>>> Quoting Ilya Dryomov:
>> >>>>>>
>> >>>>>>> On Fri, May 29, 2026 at 4:05 PM Eugen Block wrote:
>> >>>>>>>>
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> thanks for your quick reply. No, I didn't see any output other
>> >>>>>>>> than the one I shared (invalid argument). I could raise the debug
>> >>>>>>>> log level if necessary.
>> >>>>>>>
>> >>>>>>> That error message should have been displayed no matter the log
>> >>>>>>> level, so something other than https://tracker.ceph.com/issues/66300
>> >>>>>>> might be involved.
>> >>>>>>>
>> >>>>>>> What exactly do you mean by "I removed an image from the group
>> >>>>>>> snapshot"? Which commands were run there and in what order?
>> >>>>>>>
>> >>>>>>>> But one more detail: I also tried the rollback directly within the
>> >>>>>>>> cephadm shell (so version 19.2.3) with the same result:
>> >>>>>>>>
>> >>>>>>>> ceph03:~ # cephadm shell
>> >>>>>>>> ...
>> >>>>>>>> [ceph: root@ceph03 /]# rbd group snap rollback
>> >>>>>>>> images/test-servers@20260430_start
>> >>>>>>>> Rolling back to group snapshot: 0% complete...failed.
>> >>>>>>>> rbd: rollback group to snapshot failed: (22) Invalid argument
>> >>>>>>>>
>> >>>>>>>> [ceph: root@ceph03 /]# ceph -v
>> >>>>>>>> ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62)
>> >>>>>>>> squid (stable)
>> >>>>>>>
>> >>>>>>> Can you try appending --debug-ms 1 --debug-rbd 20 to the command
>> >>>>>>> (let's stick to this cephadm shell) and attach the output?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>>
>> >>>>>>> Ilya
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>> Thanks!
>> >>>>>>>> Eugen
>> >>>>>>>>
>> >>>>>>>> Quoting Ilya Dryomov:
>> >>>>>>>>
>> >>>>>>>> > On Fri, May 29, 2026 at 2:33 PM Eugen Block via ceph-users wrote:
>> >>>>>>>> >>
>> >>>>>>>> >> Hi,
>> >>>>>>>> >>
>> >>>>>>>> >> I wanted to roll back a group snapshot on Ubuntu 24.04 (rbd client
>> >>>>>>>> >> version 19.2.1); the Ceph cluster version is 19.2.3. The client
>> >>>>>>>> >> fails with "invalid argument":
>> >>>>>>>> >>
>> >>>>>>>> >> controller02:~# rbd --id <user> group snap rollback --pool images
>> >>>>>>>> >> --group test-servers --snap 20260430_start
>> >>>>>>>> >> Rolling back to group snapshot: 0% complete...failed.
>> >>>>>>>> >> rbd: rollback group to snapshot failed: (22) Invalid argument
>> >>>>>>>> >>
>> >>>>>>>> >> controller02:~# ceph -v
>> >>>>>>>> >> ceph version 19.2.1 (9efac4a81335940925dd17dbf407bfd6d3860d28)
>> >>>>>>>> >> squid (stable)
>> >>>>>>>> >>
>> >>>>>>>> >> But running the same command (just as admin, not as <user>) on a
>> >>>>>>>> >> Ceph node works:
>> >>>>>>>> >>
>> >>>>>>>> >> ceph03:~ # rbd group snap rollback --pool images --group test-servers
>> >>>>>>>> >> --snap 20260430_start
>> >>>>>>>> >> Rolling back to group snapshot: 100% complete...done.
>> >>>>>>>> >>
>> >>>>>>>> >> ceph03:~ # ceph -v
>> >>>>>>>> >> ceph version 16.2.13-66-g54799ee0666
>> >>>>>>>> >> (54799ee06669271880ee5fc715f99202002aa371) pacific (stable)
>> >>>>>>>> >>
>> >>>>>>>> >>
>> >>>>>>>> >> What seems to be the issue here is that I removed an image from
>> >>>>>>>> >> the group snapshot. I wonder if it could be this bug [0], which is
>> >>>>>>>> >> supposed to be fixed in 19.2.0 according to the "Released In"
>> >>>>>>>> >> field of the Squid backport tracker [1].
>> >>>>>>>> >>
>> >>>>>>>> >> This seems a little inconsistent to me; could someone please
>> >>>>>>>> >> clarify?
>> >>>>>>>> >
>> >>>>>>>> > Hi Eugen,
>> >>>>>>>> >
>> >>>>>>>> > Did you see the "group snapshot membership does not match group
>> >>>>>>>> > membership" error message when the rollback command failed with
>> >>>>>>>> > the 19.2.1 client?
>> >>>>>>>> >
>> >>>>>>>> > Thanks,
>> >>>>>>>> >
>> >>>>>>>> > Ilya
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>>
>> >>>>
>>
>>