Hi,

I second that. Exporting the RGW configuration periodically (rados -p .rgw.root export export.bin) would certainly help recover from situations where the RGW configuration has been altered and the radosgw-admin command can no longer fix it ('couldn't init storage provider').
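A minimal sketch of such a periodic export, written in Python around the same `rados -p .rgw.root export` call. The backup directory (passed as an argument), the retention count KEEP and the file-name pattern are hypothetical choices, not anything RGW requires; the script could be run from cron or a systemd timer on a host with an admin keyring.

```python
"""Sketch: timestamped, periodic exports of the .rgw.root pool.

Hypothetical choices (not from the thread): the backup directory passed on
the command line, KEEP, and the file-name pattern. Only the
`rados -p .rgw.root export <file>` call comes from the original advice.
"""
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

POOL = ".rgw.root"
KEEP = 14  # hypothetical retention: how many exports to keep
PREFIX = "rgw-root-"
SUFFIX = ".bin"


def backup_name(now: datetime) -> str:
    """Return a sortable name such as rgw-root-20250521T152000Z.bin."""
    return f"{PREFIX}{now.strftime('%Y%m%dT%H%M%SZ')}{SUFFIX}"


def files_to_prune(names: list[str], keep: int) -> list[str]:
    """Return the backups older than the newest `keep` ones.

    UTC timestamps in this fixed format sort lexically in chronological order.
    """
    backups = sorted(n for n in names if n.startswith(PREFIX) and n.endswith(SUFFIX))
    return backups[:-keep] if keep > 0 else backups


def main(backup_dir: Path) -> int:
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / backup_name(datetime.now(timezone.utc))
    # Dump every object of the pool into one binary file; raises on failure,
    # so old exports are only pruned after a successful new one.
    subprocess.run(["rados", "-p", POOL, "export", str(target)], check=True)
    for name in files_to_prune([p.name for p in backup_dir.iterdir()], KEEP):
        (backup_dir / name).unlink()
    print(f"exported {POOL} to {target}")
    return 0


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(main(Path(sys.argv[1])))
    print(f"usage: {sys.argv[0]} BACKUP_DIR", file=sys.stderr)
```

Restoring would go through the counterpart `rados -p .rgw.root import <file>`, which is worth rehearsing on a test cluster before it is ever needed in production.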
I've created this tracker [1]. Hopefully, adding verification steps to radosgw-admin will prevent such situations in the future.

Regards,
Frédéric.

[1] https://tracker.ceph.com/issues/71406

----- On 21 May 25, at 15:20, Tobias Urdin - Binero tobias.ur...@binero.com wrote:

> Just my random 2 cents.
>
> I feel like I hit the same issues when updating realm periods some years ago, when testing bucket replication and replication in general. It makes me sad that it’s still an issue; I remember having to recreate things multiple times when hitting this.
>
> I was hoping this would no longer be a problem this far in the future. If I needed to change our realms/zonegroups, add zones, etc. today, I would probably back up the .rgw.root pool so that I had some way to revert it, even though that might be a bad idea as well.
>
> /Tobias
>
>> On 21 May 2025, at 15:16, Michel Jouvin <michel.jou...@ijclab.in2p3.fr> wrote:
>>
>> Hi,
>>
>> An update on this issue. Thanks to suggestions from Frédéric Nass, I think I managed to clear the problem by deleting the realm and all its objects (zonegroup, zone, period) with radosgw-admin and deleting the pools associated with the deleted zone. I am sure it is not a general solution for this problem, which I was able to reproduce on a test cluster. I have the feeling that radosgw-admin should do a better job of avoiding such a mess when deleting zones, but that is another story. The reasons why deleting the realm and its objects worked for us include:
>>
>> - The realm/zonegroup/zone had just been created and held no useful content, so losing everything related to it was an option, as said previously (but deleting .rgw.root was not an option, as we have several realms in production).
>>
>> - We configure each realm/zonegroup/zone with a separate set of RGWs (which can be deployed on the same server by cephadm, but that is another story), so the only RGWs impacted are those related to the deleted realm.
>>
>> - Our realm was single-site. After deleting the realm, it is not possible to push (commit) the change to the other zonegroups/zones of the realm, as the realm must exist to commit a new period. I guess that in a multisite configuration, this means the cleanup must be done in all the clusters involved in the multisite configuration.
>>
>> Best regards,
>>
>> Michel
>>
>> On 14/05/2025 at 18:12, Michel Jouvin wrote:
>>> Hi,
>>>
>>> We are still stuck with this problem and I have not seen an answer to my previous emails. We found the explanation of the problem in the doc: https://docs.ceph.com/en/latest/radosgw/multisite/#deleting-a-zone. But the doc does not describe the way out of the problem... If we delete the realm, would it help? There is no content in this realm/zonegroup/zone, so removing everything is an option if it helps.
>>>
>>> Thanks in advance for any hint. Best regards,
>>>
>>> Michel
>>> Sent from my mobile
>>>
>>> On 7 May 2025 16:49:19, Michel Jouvin <michel.jou...@ijclab.in2p3.fr> wrote:
>>>
>>>> Hi,
>>>>
>>>> I managed to find what the zone and zonegroup IDs were before they were deleted, and I confirm that those referred to in the error messages are the IDs of the deleted zone and zonegroup. The new zone and zonegroup (which have the same names; again, I am not sure if that is a problem, as everything should be done by ID, right?) have been defined as the master zone and zonegroup, so the other ones should just be deleted, right? I really don't understand what the error means and what can be done to fix it.
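One way to confirm this kind of mismatch is to compare the IDs a period refers to with the zone_info.<id> and zonegroup_info.<id> objects actually present in .rgw.root (that object naming is taken from the error messages quoted below). A minimal sketch, assuming the JSON layout printed by `radosgw-admin period get` (period_map.zonegroups[].zones[], master_zonegroup, master_zone) matches your release; the script and file names are hypothetical. If `period get` fails with the same init error, a JSON dump saved earlier can be used instead.

```python
"""Sketch: find zone/zonegroup IDs a period references whose
zone_info.<id> / zonegroup_info.<id> objects are missing from .rgw.root.

Assumption: the period JSON layout (period_map.zonegroups[].zones[],
master_zonegroup, master_zone) as printed by `radosgw-admin period get`;
compare it with your own output before trusting the result.
"""
import json
import sys


def referenced_ids(period: dict) -> tuple[set[str], set[str]]:
    """Return (zone_ids, zonegroup_ids) found in the fields listed above."""
    zones: set[str] = set()
    zonegroups: set[str] = set()
    for zg in period.get("period_map", {}).get("zonegroups", []):
        zonegroups.add(zg["id"])
        if zg.get("master_zone"):
            zones.add(zg["master_zone"])
        zones.update(z["id"] for z in zg.get("zones", []))
    if period.get("master_zonegroup"):
        zonegroups.add(period["master_zonegroup"])
    if period.get("master_zone"):
        zones.add(period["master_zone"])
    return zones, zonegroups


def missing_objects(period: dict, pool_objects: list[str]) -> list[str]:
    """Return the .rgw.root object names the period needs but the pool lacks."""
    present = set(pool_objects)
    zones, zonegroups = referenced_ids(period)
    wanted = [f"zone_info.{i}" for i in zones]
    wanted += [f"zonegroup_info.{i}" for i in zonegroups]
    return sorted(name for name in wanted if name not in present)


def main(period_path: str, objects_path: str) -> int:
    with open(period_path) as f:
        period = json.load(f)
    with open(objects_path) as f:
        objects = f.read().split()  # one object name per line from `rados ls`
    missing = missing_objects(period, objects)
    for name in missing:
        print("missing:", name)
    return 1 if missing else 0


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(main(sys.argv[1], sys.argv[2]))
    print(f"usage: {sys.argv[0]} PERIOD_JSON RADOS_LS_OUTPUT", file=sys.stderr)
```

For example: `radosgw-admin period get > period.json`, `rados -p .rgw.root ls > objects.txt`, then `python3 check_period_refs.py period.json objects.txt`. A non-empty result would match the symptom described here: the period still points at the IDs of the deleted zone and zonegroup.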
>>>>
>>>> Best regards,
>>>>
>>>> Michel
>>>>
>>>> On 06/05/2025 at 21:29, Michel Jouvin wrote:
>>>>> Hi,
>>>>>
>>>>> It is not the first time that, after making configuration changes in RADOS for a realm/zonegroup/zone with radosgw-admin, we get errors when trying to do a "period update --commit". We never found good documentation on how to fix these problems. Up to now we have always managed, at some point, to restore a good configuration that can be committed, but it is probably time for us to take a more informed approach!
>>>>>
>>>>> The last occurrence of the problem happened today with a recently created realm/zonegroup/zone. While trying to fix a problem with the non-working haproxy associated with it, one of my colleagues decided to delete and recreate the zone and zonegroup (with the same names). The related commands worked, but since then any attempt to do "period update --commit" results in the following error:
>>>>>
>>>>> -------
>>>>> 2025-05-06T11:56:20.939+0200 7fdc7d41da80 0 failed reading obj info from .rgw.root:zone_info.93af6e0c-4552-4c2e-b167-36114a5a81e4: (2) No such file or directory
>>>>> 2025-05-06T11:56:20.945+0200 7fdc7d41da80 0 failed reading obj info from .rgw.root:zonegroup_info.d7221099-4e7d-43cb-a1e8-28a750de1cd5: (2) No such file or directory
>>>>> 2025-05-06T11:56:21.160+0200 7fdc7d41da80 0 failed reading obj info from .rgw.root:zone_info.93af6e0c-4552-4c2e-b167-36114a5a81e4: (2) No such file or directory
>>>>> 2025-05-06T11:56:21.160+0200 7fdc7d41da80 -1 Cannot find zone id=93af6e0c-4552-4c2e-b167-36114a5a81e4 (name=default)
>>>>> 2025-05-06T11:56:21.160+0200 7fdc7d41da80 0 ERROR: failed to start notify service ((22) Invalid argument
>>>>> 2025-05-06T11:56:21.160+0200 7fdc7d41da80 0 ERROR: failed to init services (ret=(22) Invalid argument)
>>>>> couldn't init storage provider
>>>>> -------
>>>>>
>>>>> I have the feeling that it is related to the deleted objects that are no longer found, but it is not completely clear what the way out is. Is the problem related to recreating the zone/zonegroup with the same names? There are several realms already in production, so we cannot do a .rgw.root reset, but this particular realm has never been put in production, so we can delete everything related to it.
>>>>>
>>>>> Thanks in advance for any hint or pointer. Best regards,
>>>>>
>>>>> Michel
>>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
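For a realm that holds nothing worth keeping, the cleanup Michel describes above (deleting the realm, its zonegroup, zone and periods with radosgw-admin, then the zone's pools) can be prepared as a dry run. A minimal sketch that only prints the commands for review and executes nothing. The command order, the script name, and the assumption that the zone's pools follow the default `<zone>.rgw.*` naming are assumptions, not taken from the thread or the docs. Pool deletion also requires `mon_allow_pool_delete` to be enabled.

```python
"""Dry-run sketch: print (never run) the commands that would remove one
unused realm with its zonegroup, zone, periods and the zone's pools.

Assumptions: the command order, and that the zone's pools are named with the
default `<zone>.rgw.` prefix. Review every line before running it by hand.
"""
import shlex
import sys


def zone_pools(all_pools: list[str], zone: str) -> list[str]:
    """Return the pools whose names start with `<zone>.rgw.`."""
    prefix = f"{zone}.rgw."
    return sorted(p for p in all_pools if p.startswith(prefix))


def cleanup_commands(realm: str, zonegroup: str, zone: str,
                     period_ids: list[str], pools: list[str]) -> list[list[str]]:
    """Build the radosgw-admin and ceph invocations as argument lists."""
    cmds = [
        ["radosgw-admin", "zone", "delete", f"--rgw-zone={zone}"],
        ["radosgw-admin", "zonegroup", "delete", f"--rgw-zonegroup={zonegroup}"],
    ]
    cmds += [["radosgw-admin", "period", "delete", f"--period={p}"] for p in period_ids]
    cmds.append(["radosgw-admin", "realm", "delete", f"--rgw-realm={realm}"])
    # Requires mon_allow_pool_delete=true; the pool name is given twice on purpose.
    cmds += [["ceph", "osd", "pool", "delete", p, p, "--yes-i-really-really-mean-it"]
             for p in pools]
    return cmds


def main(argv: list[str]) -> int:
    realm, zonegroup, zone, *period_ids = argv
    # Pool names are read from stdin, e.g. piped from `ceph osd pool ls`.
    pools = zone_pools(sys.stdin.read().split(), zone)
    for cmd in cleanup_commands(realm, zonegroup, zone, period_ids, pools):
        print(shlex.join(cmd))
    return 0


if __name__ == "__main__":
    if len(sys.argv) >= 4:
        sys.exit(main(sys.argv[1:]))
    print(f"usage: ceph osd pool ls | {sys.argv[0]} REALM ZONEGROUP ZONE [PERIOD_ID ...]",
          file=sys.stderr)
```

`radosgw-admin period list` lists period IDs; only those belonging to the realm being removed should be passed. As Michel points out, in a multisite setup the same cleanup would have to be repeated on every cluster involved.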