Hi Casey,
I set up a completely fresh cluster on a new VM host; everything about it is
new. It appears to have installed cleanly, and because the peer VMs have
practically zero latency and unlimited bandwidth between them, this is a better
place to experiment. The behavior is the same as on the other cluster.
The realm is “example-test”, with a single zone group named “us” containing the
zones “left” and “right”. The master zone is “left” and I am trying to
replicate unidirectionally to “right”. “left” is a two-node cluster and “right”
is a single-node cluster. Both show “too few PGs per OSD” but are otherwise
100% active+clean. Both clusters have been completely restarted to make sure
there are no latent config issues, although only the RGW nodes should require
that.
The thread at [1] is the most involved engagement I’ve found with a staff
member on the subject, so I checked and believe I attached all the logs that
were requested there. They all appear to be consistent and are attached below.
To start:
> [root@right01 ~]# radosgw-admin sync status
> realm d5078dd2-6a6e-49f8-941e-55c02ad58af7 (example-test)
> zonegroup de533461-2593-45d2-8975-99072d860bb2 (us)
> zone 5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe (right)
> metadata sync syncing
> full sync: 0/64 shards
> incremental sync: 64/64 shards
> metadata is caught up with master
> data sync source: 479d3f20-d57d-4b37-995b-510ba10756bf (left)
> syncing
> full sync: 0/128 shards
> incremental sync: 128/128 shards
> data is caught up with source
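For anyone who wants to check this output mechanically, here is a minimal sketch that pulls the shard counters out of the status text. The sample is copied from the output above; the function names `shard_counts` and `looks_caught_up` are made up for illustration and are not part of any Ceph tooling.

```python
import re

# Text copied from the `radosgw-admin sync status` output above.
SAMPLE = """\
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: 479d3f20-d57d-4b37-995b-510ba10756bf (left)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
"""

SHARDS = re.compile(r"(full|incremental) sync: (\d+)/(\d+) shards")


def shard_counts(text):
    """Return (mode, done, total) tuples in the order they appear."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in SHARDS.finditer(text)]


def looks_caught_up(text):
    """True when both the metadata and the data section say 'caught up'."""
    return ("metadata is caught up with master" in text
            and "data is caught up with source" in text)


if __name__ == "__main__":
    for mode, done, total in shard_counts(SAMPLE):
        print(f"{mode}: {done}/{total}")
    print("caught up:", looks_caught_up(SAMPLE))
```

This only restates what the status command already reports; it does not verify that objects actually replicated.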
I tried the information at [2] and do not see any ops in progress, just
“linger_ops”. I don’t know what those are, but they probably explain the slow
stream of requests back and forth between the two RGW endpoints:
> [root@right01 ~]# ceph daemon client.rgw.right01.54395.94074682941968 objecter_requests
> {
> "ops": [],
> "linger_ops": [
> {
> "linger_id": 2,
> "pg": "2.16dafda0",
> "osd": 0,
> "object_id": "notify.1",
> "object_locator": "@2",
> "target_object_id": "notify.1",
> "target_object_locator": "@2",
> "paused": 0,
> "used_replica": 0,
> "precalc_pgid": 0,
> "snapid": "head",
> "registered": "1"
> },
> ...
> ],
> "pool_ops": [],
> "pool_stat_ops": [],
> "statfs_ops": [],
> "command_ops": []
> }
>
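A small sketch for summarizing a dump like the one above, assuming the JSON has been saved or piped in as text. The sample is a trimmed copy of the dump (the entries elided with "..." are simply left out), and `summarize` is a hypothetical helper name.

```python
import json

# Trimmed copy of the dump above; only the one linger op that was shown
# in full is included, and only some of its fields.
SAMPLE = """
{
  "ops": [],
  "linger_ops": [
    {"linger_id": 2, "pg": "2.16dafda0", "osd": 0,
     "object_id": "notify.1", "object_locator": "@2"}
  ],
  "pool_ops": [],
  "pool_stat_ops": [],
  "statfs_ops": [],
  "command_ops": []
}
"""


def summarize(dump):
    """Count entries per top-level list and list the linger op targets."""
    counts = {key: len(value) for key, value in dump.items()
              if isinstance(value, list)}
    targets = [(op["object_id"], op["osd"])
               for op in dump.get("linger_ops", [])]
    return counts, targets


if __name__ == "__main__":
    counts, targets = summarize(json.loads(SAMPLE))
    print(counts)
    print(targets)
```

An empty "ops" list with only "linger_ops" present is what the dump above shows; the sketch just makes that quick to confirm across several daemons.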
The next thing I tried is `radosgw-admin data sync run --source-zone=left` from
the right side. I get bursts of messages of the following form:
> 2019-04-19 21:46:34.281 7f1c006ad580 0 RGW-SYNC:data:sync:shard[1]: ERROR: failed to read remote data log info: ret=-2
> 2019-04-19 21:46:34.281 7f1c006ad580 0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
When I sorted and filtered the messages, each burst has one RGW-SYNC message
for each of the PGs on the left side identified by the number in “[]”. Since
left has 128 PGs, these are the numbers between 0-127. The bursts happen about
once every five seconds.
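The sorting and filtering described above can be sketched roughly as follows. The two sample lines are the ones quoted earlier; the regular expression and the helper names are assumptions for illustration. The sketch also assumes that `ret` is a negated errno value, in which case -2 corresponds to ENOENT ("No such file or directory").

```python
import errno
import re

# The two log lines quoted above, unwrapped.
SAMPLE = """\
2019-04-19 21:46:34.281 7f1c006ad580 0 RGW-SYNC:data:sync:shard[1]: ERROR: failed to read remote data log info: ret=-2
2019-04-19 21:46:34.281 7f1c006ad580 0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
"""

SHARD_ERR = re.compile(
    r"RGW-SYNC:data:sync:shard\[(\d+)\]: ERROR: .*ret=(-?\d+)")


def shard_errors(lines):
    """Map the number in '[]' to the set of return codes seen for it."""
    seen = {}
    for line in lines:
        match = SHARD_ERR.search(line)
        if match:
            seen.setdefault(int(match.group(1)), set()).add(int(match.group(2)))
    return seen


def describe(ret):
    """Translate a negated errno value into its symbolic name."""
    return errno.errorcode.get(-ret, "unknown")


if __name__ == "__main__":
    for shard, codes in sorted(shard_errors(SAMPLE.splitlines()).items()):
        names = ", ".join(describe(code) for code in sorted(codes))
        print(f"shard {shard}: {names}")
```

Run against a full burst, the keys of the returned mapping would show whether every number from 0 to 127 appears exactly once.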
The packet traces between the nodes during the `data sync run` are mostly
requests and responses of the following form:
> HTTP GET:
> http://right01.example.com:7480/admin/log/?type=data&id=7&marker&extra-info=true&rgwx-zonegroup=de533461-2593-45d2-8975-99072d860bb2
>
> HTTP 404 RESPONSE:
> {"Code":"NoSuchKey","RequestId":"tx000000000000000002a01-005cba9593-371d-right","HostId":"371d-right-us"}
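To make the request above easier to read, here is a minimal sketch that splits the URL into host, path, and query parameters using only the Python standard library. The URL is the one from the trace; `describe_request` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the packet trace above.
URL = ("http://right01.example.com:7480/admin/log/?type=data&id=7&marker"
       "&extra-info=true"
       "&rgwx-zonegroup=de533461-2593-45d2-8975-99072d860bb2")


def describe_request(url):
    """Break a request URL into host, port, path, and a flat parameter dict."""
    parts = urlsplit(url)
    # keep_blank_values keeps the bare 'marker' parameter, which has no value.
    query = parse_qs(parts.query, keep_blank_values=True)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "params": {key: values[0] for key, values in query.items()},
    }


if __name__ == "__main__":
    info = describe_request(URL)
    print(info["host"], info["port"], info["path"])
    for key, value in info["params"].items():
        print(f"  {key} = {value!r}")
```

Applied to every request in a capture, this makes it easy to tabulate which host and which `id` values are being queried.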
When I stop the `data sync run`, these 404s stop, so clearly the `data sync
run` isn’t changing state in the RGW daemon but is doing the work synchronously
itself. In the past I have done a `data sync init`, but it doesn’t seem like
repeating it will make a difference, so I haven’t done it again.
NEXT STEPS:
I am working on how to get better logging output from the daemons and hope to
find something in there that will help. If I am lucky, I will find something
and can report back so this thread is useful for others. If I have not written
back, I probably haven’t found anything, so I would be grateful for any leads.
Kind regards and thank you!
Brian
[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013188.html
[2] http://docs.ceph.com/docs/master/radosgw/troubleshooting/?highlight=linger_ops#blocked-radosgw-requests
CONFIG DUMPS:
> [root@left01 ~]# radosgw-admin period get-current
> {
> "current_period": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c"
> }
> [root@left01 ~]# radosgw-admin period get cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c
> {
> "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
> "epoch": 6,
> "predecessor_uuid": "1f87151a-a1e4-469b-9f90-c309d7b64d80",
> "sync_status": [],
> "period_map": {
> "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
> "zonegroups": [
> {
> "id": "de533461-2593-45d2-8975-99072d860bb2",
> "name": "us",
> "api_name": "us",
> "is_master": "true",
> "endpoints": [
> "http://left01.example.com:7480"
> ],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
> "zones": [
> {
> "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
> "name": "left",
> "endpoints": [
> "http://left01.example.com:7480"
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 0,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> },
> {
> "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
> "name": "right",
> "endpoints": [
> "http://right01.example.com:7480"
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 0,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": [],
> "storage_classes": [
> "STANDARD"
> ]
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
> }
> ],
> "short_zone_ids": [
> {
> "key": "479d3f20-d57d-4b37-995b-510ba10756bf",
> "val": 1817029288
> },
> {
> "key": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
> "val": 1573215025
> }
> ]
> },
> "master_zonegroup": "de533461-2593-45d2-8975-99072d860bb2",
> "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
> "period_config": {
> "bucket_quota": {
> "enabled": false,
> "check_on_raw": false,
> "max_size": -1,
> "max_size_kb": 0,
> "max_objects": -1
> },
> "user_quota": {
> "enabled": false,
> "check_on_raw": false,
> "max_size": -1,
> "max_size_kb": 0,
> "max_objects": -1
> }
> },
> "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7",
> "realm_name": "example-test",
> "realm_epoch": 2
> }
> [root@left01 ~]# radosgw-admin zonegroup get
> {
> "id": "de533461-2593-45d2-8975-99072d860bb2",
> "name": "us",
> "api_name": "us",
> "is_master": "true",
> "endpoints": [
> "http://left01.example.com:7480"
> ],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
> "zones": [
> {
> "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
> "name": "left",
> "endpoints": [
> "http://left01.example.com:7480"
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 0,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> },
> {
> "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
> "name": "right",
> "endpoints": [
> "http://right01.example.com:7480"
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 0,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": [],
> "storage_classes": [
> "STANDARD"
> ]
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
> }
> [root@left01 ~]# radosgw-admin period get
> {
> "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
> "epoch": 6,
> "predecessor_uuid": "1f87151a-a1e4-469b-9f90-c309d7b64d80",
> "sync_status": [],
> "period_map": {
> "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
> "zonegroups": [
> {
> "id": "de533461-2593-45d2-8975-99072d860bb2",
> "name": "us",
> "api_name": "us",
> "is_master": "true",
> "endpoints": [
> "http://left01.example.com:7480"
> ],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
> "zones": [
> {
> "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
> "name": "left",
> "endpoints": [
> "http://left01.example.com:7480"
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 0,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> },
> {
> "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
> "name": "right",
> "endpoints": [
> "http://right01.example.com:7480"
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 0,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": [],
> "storage_classes": [
> "STANDARD"
> ]
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
> }
> ],
> "short_zone_ids": [
> {
> "key": "479d3f20-d57d-4b37-995b-510ba10756bf",
> "val": 1817029288
> },
> {
> "key": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
> "val": 1573215025
> }
> ]
> },
> "master_zonegroup": "de533461-2593-45d2-8975-99072d860bb2",
> "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
> "period_config": {
> "bucket_quota": {
> "enabled": false,
> "check_on_raw": false,
> "max_size": -1,
> "max_size_kb": 0,
> "max_objects": -1
> },
> "user_quota": {
> "enabled": false,
> "check_on_raw": false,
> "max_size": -1,
> "max_size_kb": 0,
> "max_objects": -1
> }
> },
> "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7",
> "realm_name": "example-test",
> "realm_epoch": 2
> }
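Since the dumps above are long, here is a minimal sketch that condenses a zonegroup document into one row per zone. The sample is a trimmed copy of the `zonegroup get` output with only the fields the sketch reads; `summarize_zones` is a hypothetical helper name, and the string-to-boolean conversion assumes the "true"/"false" strings seen in the dump.

```python
import json

# Trimmed copy of the `radosgw-admin zonegroup get` output above.
SAMPLE = """
{
  "name": "us",
  "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
  "zones": [
    {"id": "479d3f20-d57d-4b37-995b-510ba10756bf", "name": "left",
     "endpoints": ["http://left01.example.com:7480"],
     "log_meta": "false", "log_data": "true",
     "sync_from_all": "true", "sync_from": []},
    {"id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe", "name": "right",
     "endpoints": ["http://right01.example.com:7480"],
     "log_meta": "false", "log_data": "true",
     "sync_from_all": "true", "sync_from": []}
  ]
}
"""


def summarize_zones(zonegroup):
    """Return one dict per zone with the settings relevant to sync."""
    rows = []
    for zone in zonegroup["zones"]:
        rows.append({
            "name": zone["name"],
            "is_master": zone["id"] == zonegroup["master_zone"],
            "endpoints": zone["endpoints"],
            # The dump stores these flags as the strings "true"/"false".
            "log_meta": zone["log_meta"] == "true",
            "log_data": zone["log_data"] == "true",
            "sync_from_all": zone["sync_from_all"] == "true",
        })
    return rows


if __name__ == "__main__":
    for row in summarize_zones(json.loads(SAMPLE)):
        print(row)
```

Comparing the rows produced on "left" and on "right" is a quick way to confirm both sides hold the same view of the zonegroup.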