Hi All,
We are running a multisite setup on Luminous with BlueStore. It has worked
perfectly since installation, around the time 12.2.2 came out. A few days ago
we upgraded the clusters from 12.2.5 to 12.2.7, and we have since noticed that
one of the clusters no longer syncs.
Primary site:
# radosgw-admin sync status
          realm 87f16146-0729-4b37-9462-dd5e6d97b427 (pro)
      zonegroup 9fad4a8d-9a7b-4649-a54a-856450635808 (be)
           zone 4ed07bb2-a80b-4c69-aa15-fdc17ae6f5f2 (bccm-pro)
  metadata sync no sync (zone is master)
      data sync source: ad420c46-3ef3-430a-afef-bff78e26d410 (bccl-pro)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
Secondary site:
# radosgw-admin sync status
          realm 87f16146-0729-4b37-9462-dd5e6d97b427 (pro)
      zonegroup 9fad4a8d-9a7b-4649-a54a-856450635808 (be)
           zone ad420c46-3ef3-430a-afef-bff78e26d410 (bccl-pro)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 4ed07bb2-a80b-4c69-aa15-fdc17ae6f5f2 (bccm-pro)
                        syncing
                        full sync: 80/128 shards
                        full sync: 4 buckets to sync
                        incremental sync: 48/128 shards
                        data is behind on 80 shards
                        behind shards: [0,6,25,44,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,115,116,117,118,119,120,121,122,123,124,125,126,127]
                        1 shards are recovering
                        recovering shards: [27]
When we first noticed the problem, only 3 shards were behind and shard 27 was
recovering. One of the things we did was run radosgw-admin data sync init in an
attempt to get everything syncing again. Since then, 48 shards seem to have
completed their full sync and are now syncing incrementally, but the rest just
stay in this state. Could it be that the recovery of shard 27 is blocking the
rest of the sync?
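For reference, the re-init was done roughly like the sketch below; the exact
flags and the restart step are from memory, so treat it as an approximation
rather than an exact transcript:
(on the secondary zone, bccl-pro)
# radosgw-admin data sync init --source-zone=bccm-pro
# systemctl restart ceph-radosgw.target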
radosgw-admin sync error list shows a number of errors from the time of the
upgrade, mostly "failed to sync object(5) Input/output error" and "failed to
sync bucket instance: (5) Input/output error". Does this mean radosgw was
unable to write to the pool?
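(In case the raw numbers are useful to anyone, they can be pulled from the same
output with plain grep, e.g.:
# radosgw-admin sync error list | grep -c "Input/output error"
)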
# radosgw-admin data sync status --shard-id 27 --source-zone bccm-pro
{
    "shard_id": 27,
    "marker": {
        "status": "incremental-sync",
        "marker": "1_1534494893.816775_131867195.1",
        "next_step_marker": "",
        "total_entries": 1,
        "pos": 0,
        "timestamp": "0.000000"
    },
    "pending_buckets": [],
    "recovering_buckets": [
        "pro-registry:4ed07bb2-a80b-4c69-aa15-fdc17ae6f5f2.314303.1:26"
    ]
}
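If it helps to narrow this down, the bucket instance listed under
recovering_buckets can also be inspected per bucket; something like the
following should show its sync markers (I am not certain of the exact
subcommand/flags on 12.2.7, so consider this a sketch):
# radosgw-admin bucket sync status --bucket=pro-registry --source-zone=bccm-pro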
How can we recover shard 27? Any ideas on how to get this multisite setup
healthy again? I wanted to create an issue in the tracker for this, but it
seems a normal user no longer has permission to do so?
Many Thanks
Dieter