Hi Yuval,
Thanks for the info. So, this is a side effect of pubsub sitting on top of the
RGW sync mechanism? I've re-included the ceph-users mailing list on this email
in case anyone has ideas on how to alleviate this.
Some good news on my part is that I've managed to clear 16 of the large OMAP
objects with the instructions here [1]. That is, bilog trimming and running a
deep scrub on the affected PGs.
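For reference, the commands boiled down to something like this (the bucket name
and PG ID here are placeholders, and exact flags may differ by release):

$ # trim the bucket index log for an affected bucket
$ radosgw-admin bilog trim --bucket=<bucket-name>
$ # deep-scrub the PG that reported the large omap object
$ ceph pg deep-scrub 13.1
$ # re-check once the scrub completes
$ ceph health detail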
That leaves the large OMAP objects in the "siteApubsub.rgw.log" pool that I am
still hoping to find a way to clear. These are the objects of the form
"9:03d18f4d:::data_log.47:head". From [2] I gather that these are used for
multisite syncing. However, our pubsub zones don't sync to any other site. I
wonder if that makes this simply a misconfiguration, with the fix being just a
config correction.
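To double-check that it is the data log that's growing, the raw key counts are
easy to confirm directly, e.g. for shard 47:

$ rados -p siteApubsub.rgw.log listomapkeys data_log.47 | wc -l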
I've been doing some digging today and found that our pubsub zone has the
following config:
{
    "id": "4f442377-4b71-4c6a-aaa9-ba945d7694f8",
    "name": "siteApubsub",
    "endpoints": [
        "https://10.225.41.200:7481",
        "https://10.225.41.201:7481",
        "https://10.225.41.202:7481"
    ],
    "log_meta": "false",
    "log_data": "true",
    "bucket_index_max_shards": 11,
    "read_only": "false",
    "tier_type": "pubsub",
    "sync_from_all": "false",
    "sync_from": [
        "siteA"
    ],
    "redirect_zone": ""
}
And sync status shows...
source: 4f442377-4b71-4c6a-aaa9-ba945d7694f8 (siteApubsub)
not syncing from zone
If I set the "log_data" field to "false", I believe this simply stops RGW from
writing these data log entries, which aren't required in our deployment anyway.
Presumably they have been building up indefinitely because the normal trimming
never happens when there is no multisite sync to consume them.
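If that's right, I assume the change itself would look something like the below
(an untested sketch on my part - in particular I'm not certain whether the
period commit is required for a zone-level logging change):

$ # dump the pubsub zone config, flip "log_data" to "false" in the file, load it back
$ radosgw-admin zone get --rgw-zone=siteApubsub > zone.json
$ radosgw-admin zone set --rgw-zone=siteApubsub --infile=zone.json
$ # commit the period so all RGW daemons pick up the change
$ radosgw-admin period update --commit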
So my questions, for anyone who may be able to answer:
* Is the above analysis sound?
* Can I update the zone config and delete these data_log objects manually to
restore my cluster to HEALTH_OK? (A rough sketch of what I have in mind
follows.)
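Roughly (again untested - I believe "datalog trim" exists as a radosgw-admin
subcommand, but the supported flags seem to vary between releases):

$ # trim one data log shard; newer builds may require an end marker
$ radosgw-admin datalog trim --shard-id=47
$ # deep-scrub the owning PG so the large-omap warning is re-evaluated (PG ID is a placeholder)
$ ceph pg deep-scrub 9.3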
Thanks,
Alex
[1] https://access.redhat.com/solutions/6450561
[2] https://www.spinics.net/lists/ceph-users/msg54282.html
From: Yuval Lifshitz <[email protected]>
Sent: Thursday, October 27, 2022 5:35 PM
To: Alex Hussein-Kershaw (HE/HIM) <[email protected]>
Subject: Re: [EXTERNAL] Re: Fw: Large OMAP Objects & Pubsub
Hi Alex,
I checked with the RGW people working on multisite; they say they have observed
this in high-load tests (unrelated to pubsub).
This means that even if this is fixed, the fix is not going to be backported to
Octopus.
If they have some kind of workaround, I will let you know.
Yuval
On Thu, Oct 27, 2022 at 5:50 PM Alex Hussein-Kershaw (HE/HIM)
<[email protected]<mailto:[email protected]>> wrote:
Hi Yuval,
Thanks for your reply and consideration. It's much appreciated. We don't use
Kafka (nor do I know what it is - I had a quick Google), but I think the
concern is the same: if our client goes down and misses notifications from
Ceph, we need Ceph to resend them until they are acknowledged. Sounds like
bucket notifications with persistent notifications fit this requirement
perfectly. I'll flag to my team that this is available in Pacific, and that we
should take it when we move.
That said, we're still on Octopus for our main release so while that gives us a
direction for future, I'd still like to find a solution to the initial problem
as we have slow-moving customers who might stick with Octopus for several years
even after we offer a Pacific (and bucket notification) based solution.
Interestingly, we've not seen this on any customers' systems, only on our
heavily loaded test system. I suspect the high, sustained load this system
receives must be the cause. I've contemplated fully stopping the load for a
month or so and observing the effect. I wonder if we're out-pacing some
clean-up mechanism (I think we've seen similar things elsewhere in our Ceph
usage).
However, we're fairly limited on virtualisation rig space and don't want to sit
this system idle if we can avoid it.
Best wishes,
Alex
From: Yuval Lifshitz <[email protected]>
Sent: Thursday, October 27, 2022 10:05 AM
To: Alex Hussein-Kershaw (HE/HIM)
<[email protected]<mailto:[email protected]>>
Subject: [EXTERNAL] Re: Fw: Large OMAP Objects & Pubsub
Hi Alex,
Not sure I can help you here. We recommend using the "bucket notification" [1]
mechanism over "pubsub" [2], since the latter is not maintained, lacks much
functionality, and will be deprecated.
If you are concerned about Kafka outages, you can use persistent notifications
[3] (they will retry until the broker is up again), which have been available
since Ceph 16 (Pacific).
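For example, a persistent topic is just a topic created with the "persistent"
attribute set (the hosts and endpoint below are placeholders, and the exact
attribute syntax is from memory, so treat it as approximate):

aws --endpoint-url http://<rgw-host>:8000 sns create-topic --name=mytopic \
    --attributes='{"push-endpoint": "http://<endpoint-host>:9000", "persistent": "true"}'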
It looks like an issue with the site syncing process (which drives pubsub), so
I will try to figure out if there is a simple fix here.
Yuval
[1] https://docs.ceph.com/en/latest/radosgw/notifications/
[2] https://docs.ceph.com/en/latest/radosgw/pubsub-module/
[3] https://docs.ceph.com/en/latest/radosgw/notifications/#notification-reliability
On Wed, Oct 26, 2022 at 11:57 AM Alex Hussein-Kershaw (HE/HIM)
<[email protected]<mailto:[email protected]>> wrote:
Hi Yuval,
Hope you are well. I think pubsub is your area of expertise (we've briefly
discussed it in the past).
Would love to get your advice on the below email if possible.
Kindest regards,
Alex
________________________________
From: Alex Hussein-Kershaw (HE/HIM)
Sent: Tuesday, October 25, 2022 2:48 PM
To: Ceph Users <[email protected]>
Subject: Large OMAP Objects & Pubsub
Hi All,
Looking to get some advice on an issue my clusters have been suffering from. I
realize there is a lot of text below. Thanks in advance for your consideration.
The cluster has a health warning of "32 large omap objects". It's persisted for
several months.
It appears functional and there are no indications of a performance problem at
the client for now (no slow ops - everything seems to work fine). It is a
multisite cluster with CephFS & S3 in use, as well as pubsub. It is running
Ceph version 15.2.13.
We run automated client load tests against this system every day and have been
doing so for a year or longer. The key counts of the large OMAP objects in
question are growing; I've monitored this over a period of several months.
Intuitively, I gather this means I will hit performance problems at some point
in the future as a result.
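For what it's worth, the monitoring behind those numbers is nothing clever -
just counting omap keys per data_log shard on a schedule, roughly like this
(the shard count of 128 is a guess at the default):

$ for i in $(seq 0 127); do echo -n "data_log.$i: "; rados -p siteApubsub.rgw.log listomapkeys data_log.$i 2>/dev/null | wc -l; done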
Large OMAP objects are split across two pools: siteApubsub.rgw.log and
siteApubsub.rgw.buckets.index. My client is responsible for processing the
pubsub queue. It appears to be doing that successfully: there are no objects in
the pubsub data pool as shown in the details below.
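(This is easy to verify directly - listing the pubsub data pool comes back
empty, consistent with the zero object count in the ceph df output below:)

$ rados -p siteApubsub.rgw.buckets.data ls | wc -l
0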
I've been keeping a spreadsheet to track the growth of these. Since I assume I
can't attach a file to the mailing list, I've uploaded an image of it here:
https://imgur.com/a/gAtAcvp.
The data shows constant growth of all of these objects over the last couple of
months. It also includes the names of the objects, which fall into two
categories (more on decoding these after the list):
* 16 instances of objects with names like: 9:03d18f4d:::data_log.47:head
* 16 instances of objects with names like:
13:0118e6b8:::.dir.4f442377-4b71-4c6a-aaa9-ba945d7694f8.84778.1.15:head
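My understanding is that the ".dir.*" objects are bucket index shards, and that
the long ID in the name should match a bucket's marker, which I believe can be
cross-referenced with something like the below (I haven't fully confirmed this
mapping; the bucket name is a placeholder):

$ radosgw-admin metadata list bucket.instance
$ radosgw-admin bucket stats --bucket=<bucket-name> | grep -E '"(id|marker)"'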
Please find the output of a few Ceph commands below giving details of the
cluster.
* I'm really keen to understand this better and would be more than happy to
share additional diags.
* I'd like to understand what I need to do to remove these large OMAP
objects and prevent future build-ups, so I don't need to worry about the
stability of this system.
Thanks,
Alex
$ ceph -s
  cluster:
    id:     0b91b8be-3e01-4240-bea5-df01c7e53b7c
    health: HEALTH_WARN
            32 large omap objects

  services:
    mon: 3 daemons, quorum albans_sc0,albans_sc1,albans_sc2 (age 6w)
    mgr: albans_sc2(active, since 6w), standbys: albans_sc1, albans_sc0
    mds: cephfs:1 {0=albans_sc2=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 6w), 3 in (since 10M)
    rgw: 6 daemons active (albans_sc0.pubsub, albans_sc0.rgw0, albans_sc1.pubsub, albans_sc1.rgw0, albans_sc2.pubsub, albans_sc2.rgw0)

  task status:

  data:
    pools:   14 pools, 137 pgs
    objects: 4.52M objects, 160 GiB
    usage:   536 GiB used, 514 GiB / 1.0 TiB avail
    pgs:     137 active+clean

  io:
    client:   28 MiB/s rd, 1.2 MiB/s wr, 673 op/s rd, 189 op/s wr
$ ceph health detail
HEALTH_WARN 32 large omap objects
[WRN] LARGE_OMAP_OBJECTS: 32 large omap objects
    16 large objects found in pool 'siteApubsub.rgw.log'
    16 large objects found in pool 'siteApubsub.rgw.buckets.index'
    Search the cluster log for 'Large omap object found' for more details.
$ ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
ssd    1.0 TiB  514 GiB  496 GiB  536 GiB   51.07
TOTAL  1.0 TiB  514 GiB  496 GiB  536 GiB   51.07

--- POOLS ---
POOL                           ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics           1    1      0 B        0      0 B      0    153 GiB
cephfs_data                     2   32  135 GiB    1.99M  415 GiB  47.50    153 GiB
cephfs_metadata                 3   32  3.3 GiB    2.09M  9.8 GiB   2.09    153 GiB
siteA.rgw.buckets.data          4   32   24 GiB  438.62k   80 GiB  14.88    153 GiB
.rgw.root                       5    4   19 KiB       29  1.3 MiB      0    153 GiB
siteA.rgw.log                   6    4   79 MiB      799  247 MiB   0.05    153 GiB
siteA.rgw.control               7    4      0 B        8      0 B      0    153 GiB
siteA.rgw.meta                  8    4   13 KiB       37  1.6 MiB      0    153 GiB
siteApubsub.rgw.log             9    4  1.9 GiB      789  5.7 GiB   1.22    153 GiB
siteA.rgw.buckets.index        10    4  456 MiB       31  1.3 GiB   0.29    153 GiB
siteApubsub.rgw.control        11    4      0 B        8      0 B      0    153 GiB
siteApubsub.rgw.meta           12    4   11 KiB       40  1.7 MiB      0    153 GiB
siteApubsub.rgw.buckets.index  13    4  2.0 GiB       47  6.1 GiB   1.31    153 GiB
siteApubsub.rgw.buckets.data   14    4      0 B        0      0 B      0    153 GiB