Hi Ben,

Are you compacting the relevant osds periodically? ceph tell osd.x
compact (for the three osds holding the bilog) would help reshape the
rocksdb levels so they at least perform better for a little while
until the next round of bilog trims.
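
For example, a rough sketch (the pool name, index object name, and
OSD IDs below are placeholders; find the real acting set with
ceph osd map first):

  # find which OSDs hold the PG for this bucket index object
  ceph osd map default.rgw.buckets.index .dir.<bucket-instance-id>
  # then compact each OSD in that acting set, one at a time
  ceph tell osd.11 compact
  ceph tell osd.42 compact
  ceph tell osd.87 compact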

Otherwise, I have deleted ~50M object indices in one step in the
past, probably back in the luminous days IIRC. It will likely lock up
the relevant osds for a while as the omap is removed. If you dare
take that step, it might help to set nodown; that might keep other
osds from flapping and creating more work.
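
If you do go that route, one possible sequence (only a sketch; the
bucket name is a placeholder, and this assumes you delete the
abandoned bucket outright with radosgw-admin):

  # keep OSDs from being marked down during the heavy omap removal
  ceph osd set nodown
  # remove the abandoned bucket and purge its remaining objects
  radosgw-admin bucket rm --bucket=<bucket-name> --purge-objects
  # clear the flag once the cluster settles
  ceph osd unset nodown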

Cheers, Dan

______________________________
Clyso GmbH | https://www.clyso.com


On Tue, Apr 25, 2023 at 2:45 PM Ben.Zieglmeier
<ben.zieglme...@target.com> wrote:
>
> Hi All,
>
> We have a RGW cluster running Luminous (12.2.11) that has one object with an 
> extremely large OMAP database in the index pool. Listomapkeys on the object 
> returned 390 Million keys to start. Through bilog trim commands, we’ve 
> whittled that down to about 360 Million. This is a bucket index for a 
> regrettably unsharded bucket. There are only about 37K objects actually in 
> the bucket, but through years of neglect, the bilog has grown completely out of 
> control. We’ve hit some major problems trying to deal with this particular 
> OMAP object. We just crashed 4 OSDs when a bilog trim caused enough churn to 
> knock one of the OSDs housing this PG out of the cluster temporarily. The OSD 
> disks are 6.4TB NVMe, but are split into 4 partitions, each housing their own 
> OSD daemon (collocated journal).
>
> We want to be rid of this large OMAP object, but are running out of options 
> to deal with it. Reshard outright does not seem like a viable option, as we 
> believe the deletion would deadlock OSDs and could cause impact. Continuing 
> to run `bilog trim` 1000 records at a time has been what we’ve done, but this 
> also seems to be impacting performance/stability. We are seeking 
> options to remove this problematic object without creating additional 
> problems. It is quite likely this bucket is abandoned, so we could remove the 
> data, but I fear even the deletion of such a large OMAP could bring OSDs down 
> and cause potential for metadata loss (the other bucket indexes on that same 
> PG).
>
> Any insight available would be highly appreciated.
>
> Thanks.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io