Here's the debug output:

ceph03:~ # cephadm shell
Inferring fsid 655cb05a-435a-41ba-83d9-8549f7c36167
Using recent ceph image registry/ceph-upstream@sha256:7c69e59beaeea61ca714e71cb84ff6d5e533db7f1fd84143dd9ba6649a5fd2ec
[ceph: root@ceph03 /]# rbd group snap rollback images/uke-servers@20260505_kolla-b4-deploy --debug-ms 1 --debug-rbd 20
2026-05-29T16:40:04.546+0000 7f20c04670c0  1  Processor -- start
2026-05-29T16:40:04.546+0000 7f20c04670c0  1 --  start start
2026-05-29T16:40:04.546+0000 7f20c04670c0 1 --2- >> [v2:X.X.X.23:3300/0,v1:X.X.X.23:6789/0] conn(0x55dc97d4ca50 0x55dc97d4ce30 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).connect
2026-05-29T16:40:04.546+0000 7f20c04670c0 1 --2- >> [v2:X.X.X.22:3300/0,v1:X.X.X.22:6789/0] conn(0x55dc97d4d370 0x55dc97d55ac0 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).connect
2026-05-29T16:40:04.546+0000 7f20c04670c0 1 --2- >> [v2:X.X.X.24:3300/0,v1:X.X.X.24:6789/0] conn(0x55dc97d56000 0x55dc97d583f0 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).connect
2026-05-29T16:40:04.546+0000 7f20c01b6640 1 --2- >> [v2:X.X.X.24:3300/0,v1:X.X.X.24:6789/0] conn(0x55dc97d56000 0x55dc97d583f0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=3 required=0
2026-05-29T16:40:04.546+0000 7f20bef2d640 1 --2- >> [v2:X.X.X.23:3300/0,v1:X.X.X.23:6789/0] conn(0x55dc97d4ca50 0x55dc97d4ce30 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=3 required=0
2026-05-29T16:40:04.546+0000 7f20bef2d640 1 --2- >> [v2:X.X.X.23:3300/0,v1:X.X.X.23:6789/0] conn(0x55dc97d4ca50 0x55dc97d4ce30 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_hello peer v2:X.X.X.23:3300/0 says I am v2:X.X.X.24:43450/0 (socket says X.X.X.24:43450)
2026-05-29T16:40:04.546+0000 7f20c01b6640 1 --2- >> [v2:X.X.X.24:3300/0,v1:X.X.X.24:6789/0] conn(0x55dc97d56000 0x55dc97d583f0 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_hello peer v2:X.X.X.24:3300/0 says I am v2:X.X.X.24:58436/0 (socket says X.X.X.24:58436)
2026-05-29T16:40:04.546+0000 7f20bef2d640 1 -- X.X.X.24:0/410333362 learned_addr learned my addr X.X.X.24:0/410333362 (peer_addr_for_me v2:X.X.X.24:0/0)
2026-05-29T16:40:04.546+0000 7f20be72c640 1 --2- X.X.X.24:0/410333362 >> [v2:X.X.X.22:3300/0,v1:X.X.X.22:6789/0] conn(0x55dc97d4d370 0x55dc97d55ac0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=3 required=0
2026-05-29T16:40:04.546+0000 7f20c04670c0 1 -- --> [v2:X.X.X.22:3300/0,v1:X.X.X.22:6789/0] -- mon_getmap magic: 0 -- 0x55dc97b5a1b0 con 0x55dc97d4d370
2026-05-29T16:40:04.546+0000 7f20c04670c0 1 -- X.X.X.24:0/410333362 --> [v2:X.X.X.23:3300/0,v1:X.X.X.23:6789/0] -- mon_getmap magic: 0 -- 0x55dc97bae2b0 con 0x55dc97d4ca50
2026-05-29T16:40:04.546+0000 7f20c04670c0 1 -- X.X.X.24:0/410333362 --> [v2:X.X.X.24:3300/0,v1:X.X.X.24:6789/0] -- mon_getmap magic: 0 -- 0x55dc97baa1e0 con 0x55dc97d56000
2026-05-29T16:40:04.546+0000 7f20bef2d640 1 -- X.X.X.24:0/410333362 >> [v2:X.X.X.24:3300/0,v1:X.X.X.24:6789/0] conn(0x55dc97d56000 msgr2=0x55dc97d583f0 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).mark_down
2026-05-29T16:40:04.546+0000 7f20bef2d640 1 --2- X.X.X.24:0/410333362 >> [v2:X.X.X.24:3300/0,v1:X.X.X.24:6789/0] conn(0x55dc97d56000 0x55dc97d583f0 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).stop
2026-05-29T16:40:04.546+0000 7f20bef2d640 1 -- X.X.X.24:0/410333362 >> [v2:X.X.X.22:3300/0,v1:X.X.X.22:6789/0] conn(0x55dc97d4d370 msgr2=0x55dc97d55ac0 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).mark_down
2026-05-29T16:40:04.546+0000 7f20bef2d640 1 --2- X.X.X.24:0/410333362 >> [v2:X.X.X.22:3300/0,v1:X.X.X.22:6789/0] conn(0x55dc97d4d370 0x55dc97d55ac0 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).stop
2026-05-29T16:40:04.546+0000 7f20bef2d640 1 -- X.X.X.24:0/410333362 --> [v2:X.X.X.23:3300/0,v1:X.X.X.23:6789/0] -- mon_subscribe({config=0+,monmap=0+}) -- 0x55dc97b76560 con 0x55dc97d4ca50
2026-05-29T16:40:04.546+0000 7f20c01b6640 1 --2- X.X.X.24:0/410333362 >> [v2:X.X.X.24:3300/0,v1:X.X.X.24:6789/0] conn(0x55dc97d56000 0x55dc97d583f0 unknown :-1 s=CLOSED pgs=0 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_auth_done state changed!
2026-05-29T16:40:04.550+0000 7f20bef2d640 1 --2- X.X.X.24:0/410333362 >> [v2:X.X.X.23:3300/0,v1:X.X.X.23:6789/0] conn(0x55dc97d4ca50 0x55dc97d4ce30 secure :-1 s=READY pgs=9780425 cs=0 l=1 rev1=1 crypto rx=0x7f20b400a600 tx=0x7f20b4058590 comp rx=0 tx=0).ready entity=mon.2 client_cookie=febb512c7cb313ca server_cookie=0 in_seq=0 out_seq=0
2026-05-29T16:40:04.550+0000 7f20bdf2b640 1 -- X.X.X.24:0/410333362 <== mon.2 v2:X.X.X.23:3300/0 1 ==== mon_map magic: 0 ==== 485+0+0 (secure 0 0 0) 0x7f20b406d070 con 0x55dc97d4ca50
2026-05-29T16:40:04.550+0000 7f20bdf2b640 1 -- X.X.X.24:0/410333362 <== mon.2 v2:X.X.X.23:3300/0 2 ==== config(19 keys) ==== 771+0+0 (secure 0 0 0) 0x7f20b4060cd0 con 0x55dc97d4ca50
2026-05-29T16:40:04.550+0000 7f20bdf2b640 1 -- X.X.X.24:0/410333362 <== mon.2 v2:X.X.X.23:3300/0 3 ==== mon_map magic: 0 ==== 485+0+0 (secure 0 0 0) 0x7f20b4068920 con 0x55dc97d4ca50
Rolling back to group snapshot: 0% complete...failed.
rbd: rollback group to snapshot failed: (22) Invalid argument

Quoting Eugen Block <[email protected]>:

The commands were:

controller02:~# rbd --id user group create images/test-servers

controller02:~# for i in 0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd 72f5816c-c1db-44de-b0a2-19d661faa963 47d6144e-0d5a-4dc7-82dd-5be3edf9f6cc; do rbd --id user group image add images/test-servers images/${i}_disk; done

controller02:~# rbd --id user group image ls images/test-servers
images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk
images/47d6144e-0d5a-4dc7-82dd-5be3edf9f6cc_disk
images/72f5816c-c1db-44de-b0a2-19d661faa963_disk

controller02:~# rbd --id user group snap create images/test-servers@snap1

controller02:~# rbd --id user group snap ls images/test-servers
NAME   STATUS
snap1      ok


# rollback works for all images
controller02:~# rbd --id user group snap rollback images/test-servers@snap1
Rolling back to group snapshot: 100% complete...done.

# removing one image from the group
controller02:~# rbd --id user group image rm images/test-servers images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk

# rollback fails
controller02:~# rbd --id user group snap rollback images/test-servers@snap1
Rolling back to group snapshot: 0% complete...failed.
rbd: rollback group to snapshot failed: (22) Invalid argument
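
In the steps above, the rollback starts failing once the group's current membership no longer matches the membership at the time snap1 was taken. A minimal sketch for making that difference visible, assuming the output of `rbd group image ls` was saved to a file before and after the `group image rm` step. The helper name `membership_diff` and the file names are hypothetical, not part of the rbd CLI:

```shell
# membership_diff is a hypothetical helper, not an rbd subcommand.
# $1: file with the image names recorded when the snapshot was taken
# $2: file with the image names currently in the group
# Prints every image listed in $1 that is missing from $2
# (grep exits non-zero when nothing is missing).
membership_diff() {
    grep -Fxv -f "$2" "$1"
}

# Hypothetical capture steps (file names are placeholders):
#   rbd --id user group image ls images/test-servers > at_snapshot.txt  # before 'group image rm'
#   rbd --id user group image ls images/test-servers > current.txt      # after 'group image rm'
#   membership_diff at_snapshot.txt current.txt
```

With the image lists from this thread, the helper would print only images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk, the image that was removed from the group.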

I'll add the debug output later, I need to sanitize it first. But I don't see anything obvious in there.
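
One possible way to do the sanitizing, as a rough sketch: mask the first three octets of every IPv4 address so that the output looks like the X.X.X.N form used in this thread. The helper name `mask_ips` is hypothetical, and the filter does not touch anything else that may need masking (fsid, hostnames, image names):

```shell
# mask_ips is a hypothetical helper; it rewrites only dotted-quad IPv4
# addresses and keeps the last octet, e.g. 10.1.2.23 becomes X.X.X.23.
mask_ips() {
    sed -E 's/([0-9]{1,3}\.){3}([0-9]{1,3})/X.X.X.\2/g'
}

# Possible usage with the command from this thread:
#   rbd group snap rollback images/test-servers@snap1 \
#       --debug-ms 1 --debug-rbd 20 2>&1 | mask_ips > rollback-debug.txt
```

Timestamps and version strings such as 19.2.3 have fewer than four dotted groups, so the pattern leaves them unchanged.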

Quoting Ilya Dryomov <[email protected]>:

On Fri, May 29, 2026 at 4:05 PM Eugen Block <[email protected]> wrote:

Hi,

thanks for your quick reply. No, I didn't see any output other than
what I shared (invalid argument). I could raise the debug log level
if necessary.

That error message should have been displayed no matter the log level,
so something other than https://tracker.ceph.com/issues/66300 might be
involved.

What exactly do you mean by "I removed an image from the group
snapshot"?  Which commands were run there and in what order?

But one more detail, I also tried the rollback directly within the
cephadm shell (so version 19.2.3) with the same result:

ceph03:~ # cephadm shell
...
[ceph: root@ceph03 /]# rbd group snap rollback images/test-servers@20260430_start
Rolling back to group snapshot: 0% complete...failed.
rbd: rollback group to snapshot failed: (22) Invalid argument

[ceph: root@ceph03 /]# ceph -v
ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable)

Can you try appending --debug-ms 1 --debug-rbd 20 to the command
(let's stick to this cephadm shell) and attach the output?

Thanks,

               Ilya


Thanks!
Eugen

Quoting Ilya Dryomov <[email protected]>:

On Fri, May 29, 2026 at 2:33 PM Eugen Block via ceph-users
<[email protected]> wrote:

Hi,

I wanted to roll back a group snapshot on Ubuntu 24.04 (rbd client
version 19.2.1); the Ceph cluster version is 19.2.3. The client fails
with "invalid argument":

controller02:~# rbd --id <user> group snap rollback --pool images --group test-servers --snap 20260430_start
Rolling back to group snapshot: 0% complete...failed.
rbd: rollback group to snapshot failed: (22) Invalid argument

controller02:~# ceph -v
ceph version 19.2.1 (9efac4a81335940925dd17dbf407bfd6d3860d28) squid (stable)

But running the same command (just as admin not as <user>) on a Ceph
node works:

ceph03:~ # rbd group snap rollback --pool images --group test-servers --snap 20260430_start
Rolling back to group snapshot: 100% complete...done.

ceph03:~ # ceph -v
ceph version 16.2.13-66-g54799ee0666 (54799ee06669271880ee5fc715f99202002aa371) pacific (stable)


What seems to be the issue here is that I removed an image from the
group snapshot. I wonder if it could be this bug [0] which is supposed
to be fixed in 19.2.0 according to the "Released In" field of the
Squid backport tracker [1].

This seems a little inconsistent to me, could someone please clarify?

Hi Eugen,

Did you see "group snapshot membership does not match group membership"
error message when the rollback command failed with 19.2.1 client?

Thanks,

                Ilya





_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
