Hi,
I have a problem with RGW in a multisite configuration on Nautilus 14.2.11. Both
zones run on SSDs with a 10 Gbps network. The master zone consists of 5x Dell
R740XD servers (each with 256 GB RAM, 8x 800 GB SSD for Ceph, 24 CPU cores). The
secondary zone (temporary, for testing) consists of 3x HPE DL360 Gen10 servers
(each with 256 GB RAM, 6x 800 GB SSD, 48 CPU cores).
We have 17 test buckets, each manually sharded to 101 shards and each holding
10M small objects (10-15 kB). The zonegroup configuration is attached below.
The initial replication of 150M objects from the master to the secondary zone
took almost 28 hours and completed successfully.
After deleting the objects from one bucket in the master zone, it is no longer
possible to sync the zones properly. I tried restarting both secondary RGWs,
but without success. The sync status on the secondary zone stays behind the
master, and the object counts in the buckets differ between the two zones.
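For completeness, these are the commands I have been using to inspect the sync state from the secondary zone (the bucket name below is just a placeholder):

```shell
# Overall multisite sync state as seen from the secondary zone
radosgw-admin sync status

# Per-bucket sync state (bucket name is a placeholder)
radosgw-admin bucket sync status --bucket=test-bucket-01

# Errors recorded by the sync machinery, if any
radosgw-admin sync error list
```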
Ceph health is HEALTH_WARN on both zones. On the master zone I have "146 large
omap objects found in pool 'prg2a-1.rgw.buckets.index'" and "16 large omap
objects found in pool 'prg2a-1.rgw.log'". On the secondary zone: "88 large omap
objects found in pool 'prg2a-2.rgw.log'" and "1584 large omap objects found in
pool 'prg2a-2.rgw.buckets.index'".
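As a sanity check on the index-pool warnings, the per-shard key count from the object data alone is already close to the warning threshold (200000 keys is, as I understand it, the Nautilus default for osd_deep_scrub_large_omap_object_key_threshold):

```shell
# Keys per bucket index shard: 10M objects spread over 101 shards.
# Multisite additionally stores bucket index log entries in the same
# shard objects, which is presumably what pushes some shards over the
# large-omap warning threshold.
objects=10000000
shards=101
echo $((objects / shards))   # -> 99009 keys per shard from objects alone
```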
During the sync, average OSD latencies on the secondary zone were read 0.158 ms,
write 1.897 ms, overwrite 1.634 ms. After the sync stalled (after about 12 hours
RGW requests, IOPS and throughput all dropped), average OSD latencies jumped to
read 125 ms, write 30 ms, overwrite 272 ms. After stopping both RGWs on the
secondary zone, average OSD latencies fall back to almost 0 ms, but as soon as I
start the RGWs again, they climb back to read 125 ms, write 30 ms, overwrite
272 ms, with spikes of up to 3 seconds.
We had already seen the same behaviour with a very large number of objects in a
single bucket (150M+ objects), which is why we switched to the strategy of many
smaller buckets, but the results are the same.
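One thing I am only guessing at: whether the data/metadata logs should be spread over more shards so the log-pool omap objects stay smaller. A ceph.conf sketch of what I mean (the values are assumptions on my part, and I understand the log shard counts must match across zones and should not be changed lightly on an existing deployment):

```
# ceph.conf sketch on the RGW hosts -- values are guesses, not verified
[client.rgw]
rgw_data_log_num_shards = 256
rgw_md_log_max_shards = 256
```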
I would appreciate any help or advice on how to tune or diagnose multisite
problems. Does anyone have ideas, or a similar use case? I do not know what is
wrong.
Thank you and best regards,
Miroslav
radosgw-admin zonegroup get
{
"id": "ac0005da-2e9f-4f38-835f-72b289c240d0",
"name": "prg2a",
"api_name": "prg2a",
"is_master": "true",
"endpoints": [
"http://s3.prg1a.sys.cz:80",
"http://s3.prg2a.sys.cz:80"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "d9ebbd1f-3312-4083-b4c2-843e1fb899ad",
"zones": [
{
"id": "d9ebbd1f-3312-4083-b4c2-843e1fb899ad",
"name": "prg2a-1",
"endpoints": [
"http://10.104.200.101:7480",
"http://10.104.200.102:7480"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
},
{
"id": "fdd76c02-c679-4ec7-8e7d-c14d2ac74fb4",
"name": "prg2a-2",
"endpoints": [
"http://10.104.200.221:7480",
"http://10.104.200.222:7480"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "cb831094-e219-44b8-89f3-fe25fc288c00"
ii radosgw 14.2.11-pve1 amd64
REST gateway for RADOS distributed object store
ii ceph 14.2.11-pve1 amd64
distributed storage and file system
ii ceph-base 14.2.11-pve1 amd64
common ceph daemon libraries and management tools
ii ceph-common 14.2.11-pve1 amd64
common utilities to mount and interact with a ceph storage cluster
ii ceph-fuse 14.2.11-pve1 amd64
FUSE-based client for the Ceph distributed file system
ii ceph-mds 14.2.11-pve1 amd64
metadata server for the ceph distributed file system
ii ceph-mgr 14.2.11-pve1 amd64
manager for the ceph distributed storage system
ii ceph-mon 14.2.11-pve1 amd64
monitor server for the ceph storage system
ii ceph-osd 14.2.11-pve1 amd64
OSD server for the ceph storage system
ii libcephfs2 14.2.11-pve1 amd64
Ceph distributed file system client library
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]