Matthew, first of all, let me say we very much appreciate your help!
So I don’t think we turned dynamic resharding on, nor did we manually reshard
buckets. It seems to default to on for luminous, but the mimic docs say it’s
not supported in multisite. So do we need to disable it manually, via ‘ceph tell’
and ceph.conf?
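I’m guessing that would look something like the following (just a sketch, assuming the stock rgw_dynamic_resharding option is what governs this; we’d verify against the docs for our version first):

# ceph.conf on each radosgw host, e.g. under [client.rgw.<name>]
rgw_dynamic_resharding = false

# and at runtime, via the admin socket on each gateway:
ceph daemon client.rgw.<name> config set rgw_dynamic_resharding false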
Also, after running the command you suggested, all the stale instances are
gone; the ones from my earlier examples were in the output:
"bucket_instance":
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.303",
"bucket_instance":
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299",
"bucket_instance":
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.301",
Though the rgw logs still show lots of messages like these:
2019-03-05 11:01:09.526120 7f64120ae700 0 ERROR: failed to get bucket instance info for .bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
2019-03-05 11:01:09.528664 7f63e5016700 1 civetweb: 0x55976f1c2000: 172.17.136.17 - - [05/Mar/2019:10:54:06 -0800] "GET /admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299&rgwx-zonegroup=de6af748-1a2f-44a1-9d44-30799cf1313e HTTP/1.1" 404 0 - -
2019-03-05 11:01:09.529648 7f64130b0700 0 meta sync: ERROR: can't remove key: bucket.instance:sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299 ret=-2
2019-03-05 11:01:09.530324 7f64138b1700 0 ERROR: failed to get bucket instance info for .bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
2019-03-05 11:01:09.530345 7f6405094700 0 data sync: ERROR: failed to retrieve bucket info for bucket=sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
2019-03-05 11:01:09.531774 7f6405094700 0 data sync: WARNING: skipping data log entry for missing bucket sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
2019-03-05 11:01:09.571680 7f6405094700 0 data sync: ERROR: init sync on sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302 failed, retcode=-2
2019-03-05 11:01:09.573179 7f6405094700 0 data sync: WARNING: skipping data log entry for missing bucket sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302
2019-03-05 11:01:13.504308 7f63f903e700 1 civetweb: 0x55976f0f2000: 10.105.18.20 - - [05/Mar/2019:11:00:57 -0800] "GET /admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299&rgwx-zonegroup=de6af748-1a2f-44a1-9d44-30799cf1313e HTTP/1.1" 404 0 - -
From: Matthew H <[email protected]>
Date: Tuesday, March 5, 2019 at 10:03 AM
To: Christian Rice <[email protected]>, ceph-users <[email protected]>
Subject: Re: radosgw sync falling behind regularly
Hi Christian,
You have stale bucket instances that need to be cleaned up, which is what
'radosgw-admin reshard stale-instances list' is showing you. Have you been
manually resharding your buckets? The errors you are seeing in the logs are
related to these stale instances being kept around.
This command, along with 'radosgw-admin reshard stale-instances rm', was
introduced in v12.2.11 [1].
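In practice the cleanup should just be the pair below (assuming v12.2.11; listing first as a dry run before removing anything):

radosgw-admin reshard stale-instances list
radosgw-admin reshard stale-instances rm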
Hopefully this helps.
[1]
https://ceph.com/releases/v12-2-11-luminous-released/
"There have been fixes to RGW dynamic and manual resharding, which no longer
leaves behind stale bucket instances to be removed manually. For finding and
cleaning up older instances from a reshard a radosgw-admin command reshard
stale-instances list and reshard stale-instances rm should do the necessary
cleanup."
________________________________
From: Christian Rice <[email protected]>
Sent: Tuesday, March 5, 2019 11:34 AM
To: Matthew H; ceph-users
Subject: Re: radosgw sync falling behind regularly
The output of “radosgw-admin reshard stale-instances list” shows 242 entries,
which might embed too much proprietary info for me to list, but here’s a tiny
sample:
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.303",
"sysad_task/sysad_task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.281",
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299",
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.301",
Some of these appear repeatedly in the radosgw error logs, like so:
2019-03-05 08:13:08.929206 7f6405094700 0 data sync: ERROR: init sync on sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302 failed, retcode=-2
2019-03-05 08:13:08.930581 7f6405094700 0 data sync: WARNING: skipping data log entry for missing bucket sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302
2019-03-05 08:13:08.972053 7f6405094700 0 data sync: ERROR: init sync on sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299 failed, retcode=-2
2019-03-05 08:13:08.973442 7f6405094700 0 data sync: WARNING: skipping data log entry for missing bucket sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
2019-03-05 08:13:19.528295 7f6406897700 0 data sync: ERROR: init sync on sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299 failed, retcode=-2
Notably, “Sync is disabled for bucket sysad-task.” We use “bucket sync
disable” A LOT. It’s the only way we’ve been able to use multisite with a
single namespace without replicating things to every zone that don’t need to
be there. Perhaps there’s a bug in that implementation that’s tripping us up
now, with the new multisite sync code from 12.2.9 onward?
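For reference, what we run is simply the standard per-bucket toggle, something like this (sysad-task being one of the buckets above):

radosgw-admin bucket sync disable --bucket=sysad-task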
What might we do with stale bucket instances, then?
Of note, our master zone endpoint, which was timing out health checks for most
of the day after the upgrade (it was running, but so overworked by cluster
confusion that we couldn’t create new buckets or do user ops), returned to
availability late last night. There’s a lot of data to look at, but my
estimation, given the lack of user complaints (or their unawareness of specific
issues), is that the zones are nominally available, even with all the errors
and warnings being logged. We’ve tested simple zone replication by creating a
few files in one zone and seeing them show up elsewhere…
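That smoke test is roughly the following (sketched with awscli configured with our RGW credentials; the bucket name here is made up):

# write a file through the master zone endpoint
aws --endpoint-url http://sv5-ceph-rgw1.savagebeast.com:8080 s3 cp ./testfile s3://repl-test/
# shortly after, confirm it shows up via another zone's endpoint
aws --endpoint-url http://dc11-ceph-rgw1:8080 s3 ls s3://repl-test/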
Here’s the “period get” output from each gateway:
sv5-ceph-rgw1
{
"id": "3d0d40ef-90de-40ea-8c44-caa20ea8dc53",
"epoch": 16,
"predecessor_uuid": "926c74c7-c1a7-46b1-9f25-eb5c392a7fbb",
"sync_status": [],
"period_map": {
"id": "3d0d40ef-90de-40ea-8c44-caa20ea8dc53",
"zonegroups": [
{
"id": "de6af748-1a2f-44a1-9d44-30799cf1313e",
"name": "us",
"api_name": "us",
"is_master": "true",
"endpoints": [
"http://sv5-ceph-rgw1.savagebeast.com:8080"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
"zones": [
{
"id": "107d29a0-b732-4bf1-a26e-1f64f820e839",
"name": "dc11-prod",
"endpoints": [
"http://dc11-ceph-rgw1:8080"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": []
},
{
"id": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
"name": "sv5-corp",
"endpoints": [
"http://sv5-ceph-rgw1.savagebeast.com:8080"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": []
},
{
"id": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
"name": "sv3-prod",
"endpoints": [
"http://sv3-ceph-rgw1:8080"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": []
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd"
}
],
"short_zone_ids": [
{
"key": "107d29a0-b732-4bf1-a26e-1f64f820e839",
"val": 1720993486
},
{
"key": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
"val": 2301637458
},
{
"key": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
"val": 1449486239
}
]
},
"master_zonegroup": "de6af748-1a2f-44a1-9d44-30799cf1313e",
"master_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
"period_config": {
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
},
"realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd",
"realm_name": "savagebucket",
"realm_epoch": 2
}
sv3-ceph-rgw1
(period get output identical to sv5-ceph-rgw1 above)
dc11-ceph-rgw1
(period get output identical to sv5-ceph-rgw1 above)
From: Matthew H <[email protected]>
Date: Tuesday, March 5, 2019 at 4:31 AM
To: Christian Rice <[email protected]>, ceph-users <[email protected]>
Subject: Re: radosgw sync falling behind regularly
Hi Christian,
You haven't resharded any of your buckets, have you? On v12.2.11 you can run
the command below to list stale bucket instances.
radosgw-admin reshard stale-instances list
Can you also send the output from the following command on each rgw?
radosgw-admin period get