Hi,

I second that. Periodically exporting the RGW configuration (rados -p .rgw.root 
export export.bin) will certainly help to recover from situations where the RGW 
configuration has been altered and radosgw-admin itself can no longer fix it 
('couldn't init storage provider').
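
To make the periodic export concrete, here is a minimal sketch in Python. It 
only wraps the rados command quoted above and gives each export a dated file 
name. The function names, the file naming scheme and the backup directory are 
my own assumptions, not an existing tool.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_export_command(backup_dir, now, pool=".rgw.root"):
    """Return (argv, target) for a timestamped `rados export` of the pool."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    # Hypothetical naming scheme: <pool without leading dot>-<UTC timestamp>.bin
    target = Path(backup_dir) / f"{pool.lstrip('.')}-{stamp}.bin"
    # Same command as quoted above, with a dated file instead of export.bin.
    argv = ["rados", "-p", pool, "export", str(target)]
    return argv, target


def export_pool(backup_dir):
    """Run the export now; raises CalledProcessError if rados fails."""
    argv, target = build_export_command(backup_dir, datetime.now(timezone.utc))
    target.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(argv, check=True)
    return target
```

Calling export_pool("/var/backups/rgw-root") (a hypothetical path) from a cron 
job or a systemd timer, and in any case right before touching realms, 
zonegroups or zones, would give something to fall back on.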

I've created this tracker [1]. Hopefully, adding verification steps to 
radosgw-admin will prevent such situations in the future.
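
Until such checks exist, the dangling IDs can at least be pulled out of the 
radosgw-admin error output. The sketch below is not what the tracker 
proposes; it only assumes the "failed reading obj info" message format shown 
in Michel's log further down, and the function name is hypothetical.

```python
import re

# Matches messages such as:
#   failed reading obj info from .rgw.root:zone_info.<uuid>: (2) No such ...
MISSING_OBJ = re.compile(
    r"failed reading obj info\s+from\s+"
    r"(?P<pool>\S+?):(?P<kind>zone_info|zonegroup_info)\.(?P<id>[0-9a-f-]{36}):"
)


def missing_objects(log_text):
    """Return the sorted, de-duplicated (pool, kind, id) tuples that could not be read."""
    # Log lines may be wrapped (as in a mail), so collapse all whitespace first.
    flat = " ".join(log_text.split())
    found = {(m["pool"], m["kind"], m["id"]) for m in MISSING_OBJ.finditer(flat)}
    return sorted(found)
```

Fed with the log from this thread, it should report one zone_info and one 
zonegroup_info object in .rgw.root, which are the IDs of the deleted zone and 
zonegroup.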

Regards,
Frédéric.

[1] https://tracker.ceph.com/issues/71406

----- On 21 May 25, at 15:20, Tobias Urdin - Binero tobias.ur...@binero.com 
wrote:

> Just my random 2 cents.
> 
> I feel like I hit the same issues some years ago when updating realm periods
> while testing bucket replication and replication in general. It makes me sad
> that it's still an issue; I remember having to recreate things multiple times
> when hitting this.
> 
> I was hoping this would no longer be a problem by now. If I needed to change
> our realms/zonegroups or add zones today, I would probably back up the
> .rgw.root pool first to have some way to revert it, even though that might
> be bad as well.
> 
> /Tobias
> 
>> On 21 May 2025, at 15:16, Michel Jouvin <michel.jou...@ijclab.in2p3.fr> 
>> wrote:
>>
>> Hi,
>>
>> An update on this issue. Thanks to suggestions from Frédéric Nass, I think I
>> managed to clear the problem by deleting the realm and all its objects
>> (zonegroup, zone, period) with radosgw-admin and then deleting the pools
>> associated with the deleted zone. I am sure this is not a general solution
>> to the problem, which I was able to reproduce on a test cluster. I have the
>> feeling that radosgw-admin should do a better job of avoiding such a mess
>> when deleting zones, but that is another story. The reasons why deleting the
>> realm and its objects worked for us include:
>>
>> - The realm/zonegroup/zone had just been created and held no useful content,
>> so losing everything related to it was an option, as said previously (but
>> deleting .rgw.root was not, as we have several realms in production).
>>
>> - We configure each realm/zonegroup/zone with a separate set of RGWs (which
>> cephadm can deploy on the same server, but that is another story), so the
>> only RGWs impacted are those related to the deleted realm.
>>
>> - Our realm was single-site. After deleting the realm, it is not possible to
>> push (commit) the change to other zonegroups/zones of the realm, as the realm
>> must exist to be able to commit a new period. I guess that in a multisite
>> configuration, this means that the cleanup must be done in all the
>> clusters involved.
>>
>> Best regards,
>>
>> Michel
>>
>> On 14/05/2025 at 18:12, Michel Jouvin wrote:
>>> Hi,
>>>
>>> We are still stuck with this problem and I have not seen an answer to my
>>> previous emails. We found the explanation of the problem in the doc:
>>> https://docs.ceph.com/en/latest/radosgw/multisite/#deleting-a-zone
>>> But the doc does not mention a way out of the problem... Would it help if
>>> we deleted the realm? There is no content in this realm/zonegroup/zone, so
>>> removing everything is an option if it helps.
>>>
>>> Thanks in advance for any hint. Best regards,
>>>
>>> Michel
>>> Sent from my mobile
>>>
>>> On 7 May 2025 16:49:19 Michel Jouvin <michel.jou...@ijclab.in2p3.fr>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I managed to find what the zone and zonegroup IDs were before they were
>>>> deleted, and I confirm that the IDs referred to in the error messages are
>>>> those of the deleted zone and zonegroup. The new zone and zonegroup
>>>> (which have the same names; again, I am not sure whether that is a
>>>> problem, as everything should be done by ID, shouldn't it?) have been
>>>> defined as master zone and zonegroup, so the other ones should just be
>>>> deleted, shouldn't they? I really don't understand what the error means
>>>> and what can be done to fix it.
>>>>
>>>> Best regards,
>>>>
>>>> Michel
>>>>
>>>> On 06/05/2025 at 21:29, Michel Jouvin wrote:
>>>>> Hi,
>>>>>
>>>>> It is not the first time that, after making configuration changes in
>>>>> RADOS for a realm/zonegroup/zone with radosgw-admin, we get errors
>>>>> when trying to do a "period update --commit". We never found good
>>>>> documentation on how to fix these problems. Up to now we have always
>>>>> managed at some point to restore a good configuration that can be
>>>>> committed, but it is probably time for us to have a more informed
>>>>> approach!
>>>>>
>>>>> The latest occurrence of the problem happened today with a
>>>>> realm/zonegroup/zone created recently. While trying to fix a problem
>>>>> with the non-working haproxy associated with it, one of my colleagues
>>>>> decided to delete and recreate the zone and zonegroup (with the same
>>>>> names). The related commands worked, but since then any
>>>>> attempt to do "period update --commit" results in the following error:
>>>>>
>>>>> -------
>>>>>
>>>>> 2025-05-06T11:56:20.939+0200 7fdc7d41da80 0 failed reading obj info
>>>>> from .rgw.root:zone_info.93af6e0c-4552-4c2e-b167-36114a5a81e4: (2) No
>>>>> such file or directory
>>>>> 2025-05-06T11:56:20.945+0200 7fdc7d41da80 0 failed reading obj info
>>>>> from .rgw.root:zonegroup_info.d7221099-4e7d-43cb-a1e8-28a750de1cd5:
>>>>> (2) No such file or directory
>>>>> 2025-05-06T11:56:21.160+0200 7fdc7d41da80 0 failed reading obj info
>>>>> from .rgw.root:zone_info.93af6e0c-4552-4c2e-b167-36114a5a81e4: (2) No
>>>>> such file or directory
>>>>> 2025-05-06T11:56:21.160+0200 7fdc7d41da80 -1 Cannot find zone
>>>>> id=93af6e0c-4552-4c2e-b167-36114a5a81e4 (name=default)
>>>>> 2025-05-06T11:56:21.160+0200 7fdc7d41da80 0 ERROR: failed to start
>>>>> notify service ((22) Invalid argument
>>>>> 2025-05-06T11:56:21.160+0200 7fdc7d41da80 0 ERROR: failed to init
>>>>> services (ret=(22) Invalid argument)
>>>>> couldn't init storage provider
>>>>> -------
>>>>>
>>>>> I have the feeling that it is related to the deleted objects that are
>>>>> no longer found, but it is not completely clear what the way out of
>>>>> it is. Is the problem related to recreating the zone/zonegroup with the
>>>>> same names? There are several realms already in production so we
>>>>> cannot do a .rgw.root reset, but this particular realm has never been
>>>>> put in production, so we can delete everything related to it.
>>>>>
>>>>> Thanks in advance for any hint or pointer. Best regards,
>>>>>
>>>>> Michel
>>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>> To unsubscribe send an email to ceph-users-le...@ceph.io