Re: [ceph-users] Resolving Large omap objects in RGW index pool

Tomasz Płaza Wed, 17 Oct 2018 04:12:02 -0700

Hi,

I have a similar issue, and created a simple bash file to delete old indexes (it is a PoC and has not been tested in production):


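for bucket in $(radosgw-admin metadata list bucket | jq -r '.[]' | sort)
do
  # Instance ID the bucket currently uses; every other instance is treated as stale.
  actual_id=$(radosgw-admin bucket stats --bucket="${bucket}" | jq -r '.id')
  # If the lookup failed, skip the bucket rather than purge all of its instances.
  [ -n "$actual_id" ] && [ "$actual_id" != "null" ] || continue

  # Anchor the match so that bucket "foo" does not also match "myfoo:...".
  for instance in $(radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep "^${bucket}:" | cut -d ':' -f 2)
  do
    if [ "$actual_id" != "$instance" ]
    then
      radosgw-admin bi purge --bucket="${bucket}" --bucket-id="${instance}"
      radosgw-admin metadata rm "bucket.instance:${bucket}:${instance}"
    fi
  done
done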
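
Before letting a loop like this delete anything, it can help to see which instances it would pick. The following is a minimal sketch and not part of the original script: stale_instances is a hypothetical helper that pulls the selection step out of the loop. It assumes the metadata listing yields one "bucket:instance_id" entry per line (untenanted buckets only), and it only prints, it never purges.

```shell
# stale_instances BUCKET CURRENT_ID   (hypothetical helper)
# Reads "bucket:instance_id" entries on stdin and prints the instance ids
# that belong to BUCKET but differ from CURRENT_ID.
# The bucket name is compared exactly, so "foo" never matches "myfoo".
stale_instances() {
  awk -F ':' -v b="$1" -v cur="$2" '$1 == b && $2 != cur { print $2 }'
}
```

It could be fed from the same commands the script already uses, for example: radosgw-admin metadata list bucket.instance | jq -r '.[]' | stale_instances "$bucket" "$actual_id"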

I find it more readable than the mentioned one-liner. Any suggestions on this topic are greatly appreciated.

Tom

Hi,
Having spent some time on the below issue, here are the steps I took to resolve the "Large omap objects" warning. Hopefully this will help others who find themselves in this situation.
I got the object ID and OSD ID implicated from the ceph cluster logfile on the mon. I then proceeded to the implicated host containing the OSD, and extracted the implicated PG by running the following, and looking at which PG had started and completed a deep-scrub around the warning being logged:
grep -C 200 Large /var/log/ceph/ceph-osd.*.log | egrep '(Large omap|deep-scrub)'
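
Pulling the OSD and object name out of the cluster log can also be scripted. This is a rough sketch, assuming the warning has the same shape as the log entry quoted in the original message further down ("... osd.N ... Large omap object found. Object: <pool>:<hash>:::<name>:head Key count: N Size (bytes): N"). parse_large_omap is a hypothetical name, and other Ceph releases may word the line differently.

```shell
# parse_large_omap   (hypothetical helper)
# Reads cluster log lines on stdin and prints
#   <osd> <object name> <key count> <size in bytes>
# for every "Large omap object found" warning; other lines are ignored.
parse_large_omap() {
  sed -n -E 's/.* (osd\.[0-9]+) .*Large omap object found\. Object: [^ ]*:::([^ :]+):head Key count: ([0-9]+) Size \(bytes\): *([0-9]+).*/\1 \2 \3 \4/p'
}
```

For index objects the printed name is ".dir." followed by the bucket marker (and a shard number when the index is sharded), which gives the marker to compare against radosgw-admin bucket stats.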
If the bucket had not been sharded sufficiently (i.e. the cluster log showed a "Key Count" or "Size" over the thresholds), I ran through the manual sharding procedure (shown here: https://tracker.ceph.com/issues/24457#note-5)
Once this was successfully sharded, or if the bucket was previously sufficiently sharded by Ceph prior to disabling the functionality, I was able to use the following command (seemingly undocumented for Luminous http://docs.ceph.com/docs/mimic/man/8/radosgw-admin/#commands):
radosgw-admin bi purge --bucket ${bucketname} --bucket-id ${old_bucket_id}
I then issued a ceph pg deep-scrub against the PG that had contained the Large omap object.
Once I had completed this procedure, my Large omap object warnings went away and the cluster returned to HEALTH_OK.
However our radosgw bucket indexes pool now seems to be using substantially more space than previously. Having looked initially at this bug, and in particular the first comment:
http://tracker.ceph.com/issues/34307#note-1
I was able to extract a number of bucket indexes that had apparently been resharded, and removed the legacy index using the radosgw-admin bi purge --bucket ${bucket} ${marker}. I am still able to perform a radosgw-admin metadata get bucket.instance:${bucket}:${marker} successfully, however now when I run rados -p .rgw.buckets.index ls | grep ${marker} nothing is returned. Even after this, we were still seeing extremely high disk usage of our OSDs containing the bucket indexes (we have a dedicated pool for this). I then modified the one-liner referenced in the previous link as follows:
grep -E '"bucket"|"id"|"marker"' bucket-stats.out | awk -F ":" '{print $2}' | tr -d '",' | while read -r bucket; do read -r id; read -r marker; [ "$id" == "$marker" ] && true || NEWID=`radosgw-admin --id rgw.ceph-rgw-1 metadata get bucket.instance:${bucket}:${marker} | python -c 'import sys, json; print json.load(sys.stdin)["data"]["bucket_info"]["new_bucket_instance_id"]'`; while [ ${NEWID} ]; do if [ "${NEWID}" != "${marker}" ] && [ ${NEWID} != ${bucket} ] ; then echo "$bucket $NEWID"; fi; NEWID=`radosgw-admin --id rgw.ceph-rgw-1 metadata get bucket.instance:${bucket}:${NEWID} | python -c 'import sys, json; print json.load(sys.stdin)["data"]["bucket_info"]["new_bucket_instance_id"]'`; done; done > buckets_with_multiple_reindexes2.txt
This loops through the buckets that have a different marker/bucket_id, and looks to see if a new_bucket_instance_id is there, and if so will loop through until there is no longer a "new_bucket_instance_id". After letting this complete, this suggests that I have over 5000 indexes for 74 buckets, some of these buckets have > 100 indexes apparently.
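
The chain walk described here may be easier to follow when split into functions. What follows is a sketch under assumptions rather than the one-liner itself: next_instance_id and walk_chain are hypothetical names, jq stands in for the inline python, the --id argument is left out, and a repeat check plus a MAX_HOPS limit (both additions) keep a looping reshard history from hanging the script.

```shell
# next_instance_id BUCKET INSTANCE_ID   (hypothetical helper)
# Prints the instance the given one was resharded into, or nothing.
next_instance_id() {
  radosgw-admin metadata get "bucket.instance:${1}:${2}" \
    | jq -r '.data.bucket_info.new_bucket_instance_id // empty'
}

# walk_chain BUCKET START_ID   (hypothetical helper)
# Prints "BUCKET ID" for every successor reachable from START_ID.
# Stops on an empty successor, on an id already seen, or after
# MAX_HOPS lookups (default 1000).
walk_chain() {
  local bucket=$1 id=$2 next seen=" $2 " hops=0
  while [ "$hops" -lt "${MAX_HOPS:-1000}" ]; do
    next=$(next_instance_id "$bucket" "$id") || return 1
    [ -n "$next" ] || return 0
    case "$seen" in *" $next "*) return 0 ;; esac
    echo "$bucket $next"
    seen="$seen$next "
    id=$next
    hops=$((hops + 1))
  done
}
```

Calling walk_chain "$bucket" "$marker" for each bucket whose id and marker differ should produce lines in the same "bucket instance" layout as buckets_with_multiple_reindexes2.txt.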
~# awk '{print $1}' buckets_with_multiple_reindexes2.txt | uniq | wc -l
74
~# wc -l buckets_with_multiple_reindexes2.txt
5813 buckets_with_multiple_reindexes2.txt
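
To see which buckets account for most of those entries, the same file can be tallied per bucket. A small sketch, assuming the "bucket instance" layout written by the one-liner above; count_per_bucket is a hypothetical name. Unlike plain uniq, the awk tally does not depend on a bucket's lines being adjacent.

```shell
# count_per_bucket FILE   (hypothetical helper)
# Prints "<count> <bucket>" for each bucket in FILE, highest count first.
count_per_bucket() {
  awk '{ n[$1]++ } END { for (b in n) print n[b], b }' "$1" \
    | sort -k1,1nr -k2,2
}
```

For example, count_per_bucket buckets_with_multiple_reindexes2.txt | head would list the buckets with the most index instances.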
This is running a single realm, multiple zone configuration, and no multi site sync, but the closest I can find to this issue is this bug https://tracker.ceph.com/issues/24603
Should I be OK to loop through these indexes and remove any with a reshard_status of 2, a new_bucket_instance_id that does not match the bucket_instance_id returned by the command:
radosgw-admin bucket stats --bucket ${bucket}
I'd ideally like to get to a point where I can turn dynamic sharding back on safely for this cluster.
Thanks for any assistance, let me know if there's any more information I should provide
Chris
On Thu, 4 Oct 2018 at 18:22 Chris Sarginson <[email protected]> wrote:
    Hi,

    Thanks for the response - I am still unsure as to what will happen
    to the "marker" reference in the bucket metadata, as this is the
    object that is being detected as Large.  Will the bucket generate
    a new "marker" reference in the bucket metadata?

    I've been reading this page to try and get a better understanding
    of this
    http://docs.ceph.com/docs/luminous/radosgw/layout/

    However I'm no clearer on this (and what the "marker" is used
    for), or why there are multiple separate "bucket_id" values (with
    different mtime stamps) that all show as having the same number of
    shards.

    If I were to remove the old bucket would I just be looking to execute

    rados -p .rgw.buckets.index rm .dir.default.5689810.107

    Is the differing marker/bucket_id in the other buckets that was
    found also an indicator?  As I say, there's a good number of
    these, here's some additional examples, though these aren't
    necessarily reporting as large omap objects:

    "BUCKET1", "default.281853840.479", "default.105206134.5",
    "BUCKET2", "default.364663174.1", "default.349712129.3674",

    Checking these other buckets, they are exhibiting the same sort of
    symptoms as the first (multiple instances of radosgw-admin
    metadata get showing what seem to be multiple resharding processes
    being run, with different mtimes recorded).

    Thanks
    Chris

    On Thu, 4 Oct 2018 at 16:21 Konstantin Shalygin <[email protected]> wrote:
        Hi,

        Ceph version: Luminous 12.2.7

        Following upgrading to Luminous from Jewel we have been stuck with a
        cluster in HEALTH_WARN state that is complaining about large omap
        objects.
        These all seem to be located in our .rgw.buckets.index pool.  We've
        disabled auto resharding on bucket indexes due to seeming looping issues
        after our upgrade.  We've reduced the number of reported large
        omap objects by initially increasing the following value:

        ~# ceph daemon mon.ceph-mon-1 config get osd_deep_scrub_large_omap_object_value_sum_threshold
        {
             "osd_deep_scrub_large_omap_object_value_sum_threshold": "2147483648"
        }

        However we're still getting a warning about a single large OMAP object,
        though I don't believe this is related to an unsharded index - here's
        the log entry:

        2018-10-01 13:46:24.427213 osd.477 osd.477 172.26.216.6:6804/2311858 8482 :
        cluster [WRN] Large omap object found. Object:
        15:333d5ad7:::.dir.default.5689810.107:head Key count: 17467251 Size
        (bytes): 4458647149

        The object in the logs is the "marker" object, rather than the
        bucket_id - I've put some details regarding the bucket here:

        https://pastebin.com/hW53kTxL

        The bucket limit check shows that the index is sharded, so I think this
        might be related to versioning, although I was unable to get
        confirmation that the bucket in question has versioning enabled
        through the aws cli (snipped debug output below)

        2018-10-02 15:11:17,530 - MainThread - botocore.parsers - DEBUG - Response
        headers: {'date': 'Tue, 02 Oct 2018 14:11:17 GMT', 'content-length': '137',
        'x-amz-request-id': 'tx0000000000000020e3b15-005bb37c85-15870fe0-default',
        'content-type': 'application/xml'}
        2018-10-02 15:11:17,530 - MainThread - botocore.parsers - DEBUG - Response
        body:
        <?xml version="1.0" encoding="UTF-8"?><VersioningConfiguration
        xmlns="http://s3.amazonaws.com/doc/2006-03-01/"></VersioningConfiguration>

        After dumping the contents of the large omap object mentioned above
        into a file it does seem to be a simple listing of the bucket contents,
        potentially an old index:

        ~# wc -l omap_keys
        17467251 omap_keys

        This is approximately 5 million below the currently reported number of
        objects in the bucket.

        When running the commands listed here:
        http://tracker.ceph.com/issues/34307#note-1

        The problematic bucket is listed in the output (along with 72 other
        buckets):
        "CLIENTBUCKET", "default.294495648.690", "default.5689810.107"

        As this tests for bucket_id and marker fields not matching to print out
        the information, is the implication here that both of these should match
        in order to fully migrate to the new sharded index?

        I was able to do a "metadata get" using what appears to be the old index
        object ID, which seems to support this (there's a
        "new_bucket_instance_id" field, containing a newer "bucket_id" and
        reshard_status is 2, which seems to suggest it has completed).

        I am able to take the "new_bucket_instance_id" and get additional
        metadata about the bucket, each time I do this I get a slightly newer
        "new_bucket_instance_id", until it stops suggesting updated indexes.

        It's probably worth pointing out that when going through this process
        the final "bucket_id" doesn't match the one that I currently get when
        running 'radosgw-admin bucket stats --bucket "CLIENTBUCKET"', even
        though it also suggests that no further resharding has been done as
        "reshard_status" = 0 and "new_bucket_instance_id" is blank.  The output
        is available to view here:

        https://pastebin.com/g1TJfKLU

        It would be useful if anyone can offer some clarification on how to
        proceed from this situation, identifying and removing any old/stale
        indexes from the index pool (if that is the case), as I've not been
        able to spot anything in the archives.

        If there's any further information that is needed for additional context
        please let me know.
        Usually, when your bucket is automatically resharded, in some
        cases the old big index is not deleted - this is your large omap
        object.

        This index is safe to delete. Also look at [1].


        [1] https://tracker.ceph.com/issues/24457



        k



_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
