But one more question on this: why is it allowed to remove an image from the group if there are existing snapshots? Shouldn't this be prevented to keep the group consistency?

And just for my understanding: how are the group snapshots technically created? Is that one snapshot for all images or is it an individual snapshot per image?

Quoting Eugen Block <[email protected]>:

I understand, and you're right about group consistency, of course. I just thought that removing an image from the group would also remove that image's snapshot(s) from the list of snapshots. My scenario: initially I thought it would make sense to have those servers in a group, because if I wanted to roll back, it would make sense to do it for all of them. But after thinking about it a bit more, I decided that one of the images doesn't really belong in that group. Re-adding it will cause more problems in case of a rollback... I need to think about this...

Thanks a lot for taking the time, I really appreciate it!

Quoting Ilya Dryomov <[email protected]>:

On Wed, Jun 3, 2026 at 10:27 AM Eugen Block <[email protected]> wrote:

Hi,

that is correct, log_to_stderr is false in our cluster. And with
--log-to-stderr true the result is as you expected:

rbd: rollback group to snapshot failed: 2026-06-03T08:06:52.930+0000
7f53896ae0c0 -1 librbd::api::Group: snap_rollback: group snapshot
membership does not match group membership

But what's the conclusion here? So it's not allowed to roll back if
memberships don't match. How would I correct the membership?

Re-add the image back to the group if the image is still around.
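
A minimal sketch of that suggestion, using only the commands and names
already quoted in this thread. The function name readd_and_rollback is
hypothetical. Note that a rollback overwrites the current contents of
every image in the group, so this assumes that is really what is wanted.

```shell
# Sketch: re-add a removed image to the group, show the resulting
# membership, then retry the group snapshot rollback.
# $1 = group spec, $2 = image spec, $3 = group snapshot spec
readd_and_rollback() {
    rbd --id user group image add "$1" "$2" || return 1
    rbd --id user group image ls "$1" || return 1
    rbd --id user group snap rollback "$3"
}

# Usage with the names from this thread:
# readd_and_rollback images/test-servers \
#     images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk \
#     images/test-servers@snap1
```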

Because I
wouldn't want to delete all snapshots from before I removed one image
from the group. Is there any workaround?

I'm not sure I see what needs to be worked around here.  The group is
supposed to be a logical collection of images where some level of
consistency between images is required, not a random "bag".  This
suggests that while images can come and go (i.e. be added and removed
from the group), the group can't always be meaningfully rolled back.
For example, if a group snapshot captured images A, B and C but image
C had since been removed from the group and potentially reformatted,
repurposed for something else or removed altogether, the group's state
exactly as of that snapshot just can't be restored.
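
The A, B, C example can be illustrated with a small shell sketch. It
only compares two lists of image names, ignoring order; it is not how
librbd implements the check, and the function name same_membership and
the image names are made up for illustration.

```shell
# Illustrative only: succeed if two newline-separated lists of image
# names contain the same entries, regardless of order.
same_membership() {
    [ "$(printf '%s\n' "$1" | sort -u)" = "$(printf '%s\n' "$2" | sort -u)" ]
}

# The snapshot captured A, B and C, but C has since left the group.
snap_members=$(printf '%s\n' images/A images/B images/C)
group_members=$(printf '%s\n' images/A images/B)

if same_membership "$snap_members" "$group_members"; then
    echo "membership matches: rollback can proceed"
else
    echo "membership differs: rollback is refused"
fi
```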

Thanks,

               Ilya


Thanks,
Eugen

Quoting Ilya Dryomov <[email protected]>:

On Fri, May 29, 2026 at 6:41 PM Eugen Block <[email protected]> wrote:

The commands were:

controller02:~# rbd --id user group create images/test-servers

controller02:~# for i in 0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd
72f5816c-c1db-44de-b0a2-19d661faa963
47d6144e-0d5a-4dc7-82dd-5be3edf9f6cc; do rbd --id user group image add
images/test-servers images/${i}_disk; done

controller02:~# rbd --id user group image ls images/test-servers
images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk
images/47d6144e-0d5a-4dc7-82dd-5be3edf9f6cc_disk
images/72f5816c-c1db-44de-b0a2-19d661faa963_disk

controller02:~# rbd --id user group snap create images/test-servers@snap1

controller02:~# rbd --id user group snap ls images/test-servers
NAME   STATUS
snap1      ok


# rollback works for all images
controller02:~# rbd --id user group snap rollback images/test-servers@snap1
Rolling back to group snapshot: 100% complete...done.

# removing one image from the group
controller02:~# rbd --id user group image rm images/test-servers
images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk

# rollback fails
controller02:~# rbd --id user group snap rollback images/test-servers@snap1
Rolling back to group snapshot: 0% complete...failed.
rbd: rollback group to snapshot failed: (22) Invalid argument

I'll add the debug output later, I will need to sanitize it first. But I
don't see anything obvious in there.

Hi Eugen,

Based on the above, it's https://tracker.ceph.com/issues/66300 and is
therefore the intended behavior.  The only fly in the ointment is that
you aren't seeing the associated "group snapshot membership does not
match group membership" error message.

You not seeing it is consistent with the attached debug output where
only very early messenger traffic is present and nothing beyond that.
It suggests some non-conventional settings in the cluster-wide config
such as log_to_stderr being set to false or similar.

Can you try appending --log-to-stderr true to "rbd group snap rollback"
command?
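
As a sketch, the retry could be wrapped like this. The function name
rollback_reason is hypothetical, and filtering on the string
"librbd::api::Group" assumes the error is logged under that prefix.

```shell
# Sketch: rerun the rollback with client logging sent to stderr and
# keep only the librbd group lines, which should carry the reason.
# $1 = group snapshot spec
rollback_reason() {
    rbd --id user group snap rollback "$1" --log-to-stderr true 2>&1 |
        grep -F 'librbd::api::Group'
}

# Usage with the names from this thread:
# rollback_reason images/test-servers@snap1
```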

Thanks,

                Ilya


Quoting Ilya Dryomov <[email protected]>:

> On Fri, May 29, 2026 at 4:05 PM Eugen Block <[email protected]> wrote:
>>
>> Hi,
>>
>> thanks for your quick reply. No, I didn't see any output other than
>> what I shared (invalid argument). I could raise the debug log level
>> if necessary.
>
> That error message should have been displayed no matter the log level,
> so something other than https://tracker.ceph.com/issues/66300 might be
> involved.
>
> What exactly do you mean by "I removed an image from the group
> snapshot"?  Which commands were run there and in what order?
>
>> But one more detail, I also tried the rollback directly within the
>> cephadm shell (so version 19.2.3) with the same result:
>>
>> ceph03:~ # cephadm shell
>> ...
>> [ceph: root@ceph03 /]# rbd group snap rollback
>> images/test-servers@20260430_start
>> Rolling back to group snapshot: 0% complete...failed.
>> rbd: rollback group to snapshot failed: (22) Invalid argument
>>
>> [ceph: root@ceph03 /]# ceph -v
>> ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62)
>> squid (stable)
>
> Can you try appending --debug-ms 1 --debug-rbd 20 to the command
> (let's stick to this cephadm shell) and attach the output?
>
> Thanks,
>
>                 Ilya
>
>>
>> Thanks!
>> Eugen
>>
>> Quoting Ilya Dryomov <[email protected]>:
>>
>> > On Fri, May 29, 2026 at 2:33 PM Eugen Block via ceph-users
>> > <[email protected]> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I wanted to roll back a group snapshot on Ubuntu 24.04 (rbd client
>> >> version 19.2.1); the Ceph cluster version is 19.2.3. The client fails
>> >> with "invalid argument":
>> >>
>> >> controller02:~# rbd --id <user> group snap rollback --pool images
>> >> --group test-servers --snap 20260430_start
>> >> Rolling back to group snapshot: 0% complete...failed.
>> >> rbd: rollback group to snapshot failed: (22) Invalid argument
>> >>
>> >> controller02:~# ceph -v
>> >> ceph version 19.2.1 (9efac4a81335940925dd17dbf407bfd6d3860d28)
>> >> squid (stable)
>> >>
>> >> But running the same command (just as admin not as <user>) on a Ceph
>> >> node works:
>> >>
>> >> ceph03:~ # rbd group snap rollback --pool images --group test-servers
>> >> --snap 20260430_start
>> >> Rolling back to group snapshot: 100% complete...done.
>> >>
>> >> ceph03:~ # ceph -v
>> >> ceph version 16.2.13-66-g54799ee0666
>> >> (54799ee06669271880ee5fc715f99202002aa371) pacific (stable)
>> >>
>> >>
>> >> What seems to be the issue here is that I removed an image from the
>> >> group snapshot. I wonder if it could be this bug [0] which is supposed
>> >> to be fixed in 19.2.0 according to the "Released In" field of the
>> >> Squid backport tracker [1].
>> >>
>> >> This seems a little inconsistent to me, could someone please clarify?
>> >
>> > Hi Eugen,
>> >
>> > Did you see "group snapshot membership does not match group membership"
>> > error message when the rollback command failed with 19.2.1 client?
>> >
>> > Thanks,
>> >
>> >                 Ilya

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
