Hi Ben,

Are you compacting the relevant osds periodically? ceph tell osd.x compact (for the three osds holding the bilog) would help reshape the rocksdb levels so they at least perform better for a little while, until the next round of bilog trims.
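For example, something along these lines should find the acting OSDs for the index object and compact them (pool name, object name and OSD ids below are placeholders; adjust them for your cluster):

    # map the large index object to its PG and acting OSDs
    ceph osd map default.rgw.buckets.index .dir.<bucket-id>
    # -> prints the PG and an acting set, e.g. [12,45,101]

    # compact each of those OSDs, one at a time, ideally off-peak
    ceph tell osd.12 compact
    ceph tell osd.45 compact
    ceph tell osd.101 compact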
Otherwise, I have experience deleting ~50M object indices in one step in the past, probably back in the luminous days IIRC. It will likely lock up the relevant osds for a while while the omap is removed. If you dare take that step, it might help to set nodown first; that might prevent other osds from flapping and creating more work (a rough sketch follows after the quoted message below).

Cheers,
Dan

______________________________
Clyso GmbH | https://www.clyso.com

On Tue, Apr 25, 2023 at 2:45 PM Ben.Zieglmeier <ben.zieglme...@target.com> wrote:
>
> Hi All,
>
> We have an RGW cluster running Luminous (12.2.11) that has one object with an
> extremely large OMAP database in the index pool. Listomapkeys on the object
> returned 390 million keys to start. Through bilog trim commands, we've
> whittled that down to about 360 million. This is a bucket index for a
> regrettably unsharded bucket. There are only about 37K objects actually in
> the bucket, but through years of neglect, the bilog has grown completely out
> of control. We've hit some major problems trying to deal with this particular
> OMAP object. We just crashed 4 OSDs when a bilog trim caused enough churn to
> knock one of the OSDs housing this PG out of the cluster temporarily. The OSD
> disks are 6.4TB NVMe, but are split into 4 partitions, each housing their own
> OSD daemon (collocated journal).
>
> We want to be rid of this large OMAP object, but are running out of options
> to deal with it. Resharding outright does not seem like a viable option, as we
> believe the deletion would deadlock OSDs and could cause impact. Continuing
> to run `bilog trim` 1000 records at a time has been what we've done, but this
> also seems to be impacting performance/stability. We are seeking options to
> remove this problematic object without creating additional problems. It is
> quite likely this bucket is abandoned, so we could remove the data, but I
> fear even the deletion of such a large OMAP could bring OSDs down and risk
> losing metadata (the other bucket indexes on that same PG).
>
> Any insight available would be highly appreciated.
>
> Thanks.
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
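A rough, untested sketch of the sequence I mean (bucket name and marker are placeholders, and the exact radosgw-admin flags are worth double-checking against `radosgw-admin --help` on 12.2.x):

    # keep peers from marking the busy OSDs down while the omap work runs
    ceph osd set nodown

    # either keep trimming incrementally, taking markers from `bilog list` ...
    radosgw-admin bilog list --bucket=<bucket> --max-entries=1000
    radosgw-admin bilog trim --bucket=<bucket> --end-marker=<marker-from-list>

    # ... or remove the abandoned bucket/index outright, while watching
    # `ceph -s` and `ceph osd perf` for slow requests and flapping

    # clear the flag once the cluster has settled
    ceph osd unset nodown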