On Fri, Aug 14, 2020 at 9:25 AM Alex Hussein-Kershaw
<[email protected]> wrote:
>
> Hi,
>
> I've previously discussed some issues I've had with the RGW lifecycle 
> processing. I've discovered that the root cause of my problem is that:
>
>   *   I'm running a multisite configuration
>      *   Lifecycle processing is done on the master site each night. 
> `radosgw-admin lc list` correctly returns all buckets with lc config.
>   *   I simulate the master site being destroyed from my VM host.
>   *   I promote the secondary site to master following the instructions here: 
>  https://docs.ceph.com/docs/master/radosgw/multisite/
>      *   The new master site isn't doing any lifecycle processing. 
> `radosgw-admin lc list` returns empty.
>   *   I recreate a cluster and pair it with the new master site to get back 
> to having multisite redundancy.
>      *   Neither site is doing any lifecycle processing. `radosgw-admin lc 
> list` returns empty.
> So in the process of failover/recovery I have gone from having two paired 
> clusters performing lifecycle processing, to two paired clusters NOT 
> performing lifecycle processing.
>
> Is this behaviour expected? I've found `radosgw-admin lc reshard fix` will 
> "remind" the cluster that I run it on that it needs to do lifecycle 
> processing. Although I found no mention of having to use this in the docs, 
> for that command the docs state it's only relevant on earlier Ceph versions. 
> I'm running Nautilus 14.2.9.
>
> In addition, if I have two healthy clusters paired in a multisite system, and 
> swap the master cluster by promoting the non-master, the demoted cluster 
> seems to still continue doing lifecycle processing, while the promoted one does 
> not. If I run `radosgw-admin lc reshard fix` on the promoted cluster, then 
> both clusters seem to claim they are doing the processing. Is this a happy 
> state to be in?
>
> Does anyone have any experience with this?
>
> Thanks,
> Alex
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>

There's a defect in metadata sync
(https://tracker.ceph.com/issues/44268) which prevents buckets with
lifecycle policies from being indexed for lifecycle processing on
non-master zones. It sounds like the 'lc reshard fix' command is
adding those buckets back to that index for processing.

The intent is for lifecycle processing to occur independently on every
zone. That's the only way to guarantee the correct result now that we
have PutBucketReplication (and specifically the Filter policy) where
any given zone may only hold a subset of the objects from its source.
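
Until that fix is available, the workaround described in this thread could
be scripted per zone. The following is only a minimal sketch, not an
official procedure: it assumes `radosgw-admin lc list` prints a JSON array
(empty when no buckets are indexed for lifecycle processing), and the helper
names `lc_index_is_empty` and `run_admin` are made up for illustration. It
only uses the two commands already mentioned above.

```python
import json
import subprocess


def lc_index_is_empty(lc_list_output: str) -> bool:
    """Return True if the `lc list` output names no buckets.

    Assumption: the output is a JSON array, one entry per bucket.
    Blank output is treated the same as an empty array.
    """
    text = lc_list_output.strip()
    if not text:
        return True
    return len(json.loads(text)) == 0


def run_admin(*args: str) -> str:
    """Run radosgw-admin with the given arguments and return its stdout."""
    result = subprocess.run(
        ["radosgw-admin", *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def main() -> None:
    # Run this on each zone, since lifecycle processing is per zone.
    if lc_index_is_empty(run_admin("lc", "list")):
        print("lc index is empty; running 'lc reshard fix'")
        run_admin("lc", "reshard", "fix")
        # Show the index again so the operator can confirm the result.
        print(run_admin("lc", "list"))
    else:
        print("lc index already lists buckets; nothing to do")


if __name__ == "__main__":
    main()
```

Note that this only checks for a completely empty index. A zone whose index
is missing some buckets but not all of them would not be detected by this
check.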