Hi,

Having spent some time on the issue below, here are the steps I took to resolve the "Large omap objects" warning. Hopefully this will help others who find themselves in this situation.
I got the object ID and OSD ID from the ceph cluster log on the mon. On the host containing that OSD, I identified the affected PG by checking which PG had started and completed a deep-scrub around the time the warning was logged:

grep -C 200 Large /var/log/ceph/ceph-osd.*.log | egrep '(Large omap|deep-scrub)'

If the bucket had not been sharded sufficiently (i.e. the cluster log showed a "Key count" or "Size" over the thresholds), I ran through the manual sharding procedure shown here: https://tracker.ceph.com/issues/24457#note-5

Once the bucket was successfully sharded (or if it had already been sufficiently sharded by Ceph before we disabled that functionality), I was able to purge the old index with the following command (seemingly undocumented for Luminous, though it appears in the Mimic man page: http://docs.ceph.com/docs/mimic/man/8/radosgw-admin/#commands):

radosgw-admin bi purge --bucket ${bucketname} --bucket-id ${old_bucket_id}

I then issued a ceph pg deep-scrub against the PG that had contained the large omap object. Once this procedure was complete, my large omap object warnings went away and the cluster returned to HEALTH_OK.

However, our radosgw bucket index pool now seems to be using substantially more space than previously. Having looked initially at this bug, and in particular the first comment: http://tracker.ceph.com/issues/34307#note-1 I was able to extract a number of bucket indexes that had apparently been resharded, and removed the legacy index using radosgw-admin bi purge --bucket ${bucket} --bucket-id ${marker}. I am still able to perform a radosgw-admin metadata get bucket.instance:${bucket}:${marker} successfully, however when I now run rados -p .rgw.buckets.index ls | grep ${marker} nothing is returned. Even after this, we were still seeing extremely high disk usage on the OSDs containing the bucket indexes (we have a dedicated pool for this).
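As an aside, pulling the object name, key count, and size out of the cluster log line can be scripted rather than done by eye. Here is a rough Python sketch; the regex is an assumption based on the Luminous-era log format quoted later in this thread, and the sample line is copied from it:

```python
import re

# Sample "Large omap object" cluster log line (Luminous 12.2.x format; assumption).
line = ("2018-10-01 13:46:24.427213 osd.477 osd.477 172.26.216.6:6804/2311858 8482 : "
        "cluster [WRN] Large omap object found. Object: "
        "15:333d5ad7:::.dir.default.5689810.107:head Key count: 17467251 "
        "Size (bytes): 4458647149")

# Pool id, PG hash, index object name, key count, and size in bytes.
m = re.search(r"Object: (\d+):[0-9a-f]+:::(\S+):head "
              r"Key count: (\d+) Size \(bytes\): (\d+)", line)
pool_id = m.group(1)
index_object = m.group(2)          # e.g. the .dir.<marker> index object
key_count = int(m.group(3))
size = int(m.group(4))
print(pool_id, index_object, key_count, size)
```

The `.dir.<marker>` name this yields is what maps the warning back to a bucket's index object.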
I then modified the one-liner referenced in the previous link as follows (reformatted here for readability):

grep -E '"bucket"|"id"|"marker"' bucket-stats.out \
  | awk -F ":" '{print $2}' | tr -d '",' \
  | while read -r bucket; do
      read -r id
      read -r marker
      if [ "$id" != "$marker" ]; then
        NEWID=$(radosgw-admin --id rgw.ceph-rgw-1 metadata get bucket.instance:${bucket}:${marker} \
          | python -c 'import sys, json; print json.load(sys.stdin)["data"]["bucket_info"]["new_bucket_instance_id"]')
      fi
      while [ "${NEWID}" ]; do
        if [ "${NEWID}" != "${marker}" ] && [ "${NEWID}" != "${bucket}" ]; then
          echo "$bucket $NEWID"
        fi
        NEWID=$(radosgw-admin --id rgw.ceph-rgw-1 metadata get bucket.instance:${bucket}:${NEWID} \
          | python -c 'import sys, json; print json.load(sys.stdin)["data"]["bucket_info"]["new_bucket_instance_id"]')
      done
    done > buckets_with_multiple_reindexes2.txt

This loops through the buckets whose marker and bucket_id differ, checks whether a new_bucket_instance_id is present, and if so follows the chain until there is no longer a "new_bucket_instance_id". After letting this complete, it suggests I have over 5000 indexes across 74 buckets; some of these buckets apparently have more than 100 indexes:

~# awk '{print $1}' buckets_with_multiple_reindexes2.txt | uniq | wc -l
74
~# wc -l buckets_with_multiple_reindexes2.txt
5813 buckets_with_multiple_reindexes2.txt

This is a single-realm, multiple-zone configuration with no multi-site sync; the closest thing I can find to this issue is this bug: https://tracker.ceph.com/issues/24603

Should I be OK to loop through these indexes and remove any with a reshard_status of 2 and a new_bucket_instance_id that does not match the bucket_instance_id returned by:

radosgw-admin bucket stats --bucket ${bucket}

I'd ideally like to get to a point where I can safely turn dynamic sharding back on for this cluster.
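For clarity, the chain-following logic of that loop can be sketched in Python. The dict below stands in for `radosgw-admin metadata get bucket.instance:...` output (the instance IDs are mock data, purely for illustration):

```python
# Mock metadata store: (bucket, instance_id) -> new_bucket_instance_id.
# In reality each lookup would be a radosgw-admin metadata get call.
metadata = {
    ("BUCKET1", "default.105206134.5"):   "default.200000000.1",
    ("BUCKET1", "default.200000000.1"):   "default.281853840.479",
    ("BUCKET1", "default.281853840.479"): "",  # current instance: chain ends
}

def stale_instances(bucket, marker):
    """Follow new_bucket_instance_id from the marker until the chain runs out,
    collecting every superseded index instance along the way."""
    stale = []
    new_id = metadata.get((bucket, marker), "")
    while new_id:
        if new_id != marker and new_id != bucket:
            stale.append(new_id)
        new_id = metadata.get((bucket, new_id), "")
    return stale

print(stale_instances("BUCKET1", "default.105206134.5"))
```

Each bucket that was resharded repeatedly yields one line per superseded instance, which is why the output file is so much longer than the bucket count.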
Thanks for any assistance, let me know if there's any more information I should provide.

Chris

On Thu, 4 Oct 2018 at 18:22 Chris Sarginson <[email protected]> wrote:

> Hi,
>
> Thanks for the response - I am still unsure as to what will happen to the
> "marker" reference in the bucket metadata, as this is the object that is
> being detected as large. Will the bucket generate a new "marker" reference
> in the bucket metadata?
>
> I've been reading this page to try and get a better understanding of this:
> http://docs.ceph.com/docs/luminous/radosgw/layout/
>
> However I'm no clearer on what the "marker" is used for, or why there are
> multiple separate "bucket_id" values (with different mtime stamps) that
> all show as having the same number of shards.
>
> If I were to remove the old bucket index, would I just be looking to
> execute:
>
> rados -p .rgw.buckets.index rm .dir.default.5689810.107
>
> Is the differing marker/bucket_id in the other buckets that was found also
> an indicator? As I say, there's a good number of these; here are some
> additional examples, though these aren't necessarily reporting as large
> omap objects:
>
> "BUCKET1", "default.281853840.479", "default.105206134.5",
> "BUCKET2", "default.364663174.1", "default.349712129.3674",
>
> Checking these other buckets, they are exhibiting the same sort of
> symptoms as the first (multiple instances of radosgw-admin metadata get
> showing what seem to be multiple resharding processes being run, with
> different mtimes recorded).
>
> Thanks
> Chris
>
> On Thu, 4 Oct 2018 at 16:21 Konstantin Shalygin <[email protected]> wrote:
>
>> Hi,
>>
>> Ceph version: Luminous 12.2.7
>>
>> Following upgrading to Luminous from Jewel we have been stuck with a
>> cluster in HEALTH_WARN state that is complaining about large omap
>> objects. These all seem to be located in our .rgw.buckets.index pool.
>> We've disabled auto resharding on bucket indexes due to seeming looping
>> issues after our upgrade.
>> We've reduced the number of reported large omap objects by initially
>> increasing the following value:
>>
>> ~# ceph daemon mon.ceph-mon-1 config get osd_deep_scrub_large_omap_object_value_sum_threshold
>> {
>>     "osd_deep_scrub_large_omap_object_value_sum_threshold": "2147483648"
>> }
>>
>> However we're still getting a warning about a single large omap object,
>> which I don't believe is related to an unsharded index - here's the log
>> entry:
>>
>> 2018-10-01 13:46:24.427213 osd.477 osd.477 172.26.216.6:6804/2311858 8482 :
>> cluster [WRN] Large omap object found. Object:
>> 15:333d5ad7:::.dir.default.5689810.107:head Key count: 17467251 Size
>> (bytes): 4458647149
>>
>> The object in the logs is the "marker" object, rather than the bucket_id -
>> I've put some details regarding the bucket here:
>> https://pastebin.com/hW53kTxL
>>
>> The bucket limit check shows that the index is sharded, so I think this
>> might be related to versioning, although I was unable to get confirmation
>> that the bucket in question has versioning enabled through the aws cli
>> (snipped debug output below):
>>
>> 2018-10-02 15:11:17,530 - MainThread - botocore.parsers - DEBUG - Response
>> headers: {'date': 'Tue, 02 Oct 2018 14:11:17 GMT', 'content-length': '137',
>> 'x-amz-request-id': 'tx0000000000000020e3b15-005bb37c85-15870fe0-default',
>> 'content-type': 'application/xml'}
>> 2018-10-02 15:11:17,530 - MainThread - botocore.parsers - DEBUG - Response
>> body:
>> <?xml version="1.0" encoding="UTF-8"?><VersioningConfiguration
>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/"></VersioningConfiguration>
>>
>> After dumping the contents of the large omap object mentioned above into
>> a file, it does seem to be a simple listing of the bucket contents,
>> potentially an old index:
>>
>> ~# wc -l omap_keys
>> 17467251 omap_keys
>>
>> This is approximately 5 million below the currently reported number of
>> objects in the bucket.
>>
>> When running the commands listed here:
>> http://tracker.ceph.com/issues/34307#note-1
>>
>> The problematic bucket is listed in the output (along with 72 other
>> buckets):
>>
>> "CLIENTBUCKET", "default.294495648.690", "default.5689810.107"
>>
>> As this tests for bucket_id and marker fields not matching to print out
>> the information, is the implication here that both of these should match
>> in order to fully migrate to the new sharded index?
>>
>> I was able to do a "metadata get" using what appears to be the old index
>> object ID, which seems to support this (there's a "new_bucket_instance_id"
>> field containing a newer "bucket_id", and reshard_status is 2, which seems
>> to suggest the reshard has completed).
>>
>> I am able to take the "new_bucket_instance_id" and get additional metadata
>> about the bucket; each time I do this I get a slightly newer
>> "new_bucket_instance_id", until it stops suggesting updated indexes.
>>
>> It's probably worth pointing out that when going through this process the
>> final "bucket_id" doesn't match the one that I currently get when running
>> 'radosgw-admin bucket stats --bucket "CLIENTBUCKET"', even though that
>> also suggests no further resharding has been done, as "reshard_status" = 0
>> and "new_bucket_instance_id" is blank. The output is available to view
>> here:
>> https://pastebin.com/g1TJfKLU
>>
>> It would be useful if anyone could offer some clarification on how to
>> proceed from this situation, identifying and removing any old/stale
>> indexes from the index pool (if that is the case), as I've not been able
>> to spot anything in the archives.
>>
>> If there's any further information that is needed for additional context,
>> please let me know.
>>
>>
>> Usually, when your bucket is automatically resharded, in some cases the
>> old big index is not deleted - this is your large omap object.
>>
>> This index is safe to delete. Also look at [1].
>>
>>
>> [1] https://tracker.ceph.com/issues/24457
>>
>>
>>
>> k
>>
>
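For what it's worth, the check I'm proposing above for identifying a stale index instance amounts to something like the following Python sketch. RESHARD_DONE = 2 matches what the metadata appears to show, but treat that as an assumption to verify against the sources; the instance dict is mock data shaped like the bucket_info section of a metadata get:

```python
# reshard_status value that appears to mean "reshard completed" (assumption).
RESHARD_DONE = 2

def is_stale_index(instance, live_bucket_id):
    """Return True if this bucket instance looks like a superseded index:
    reshard finished, it points at a newer instance, and it is not the
    instance that bucket stats currently reports as live."""
    return (instance.get("reshard_status") == RESHARD_DONE
            and instance.get("new_bucket_instance_id", "") != ""
            and instance.get("bucket_id") != live_bucket_id)

# Mock instance, for illustration only:
old = {"bucket_id": "default.5689810.107",
       "reshard_status": 2,
       "new_bucket_instance_id": "default.294495648.690"}
print(is_stale_index(old, live_bucket_id="default.294495648.690"))
```

The live_bucket_id here would come from `radosgw-admin bucket stats --bucket ${bucket}`; anything the function flags would be a candidate for removal, pending confirmation that this logic is sound.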
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
