Hi Alex,

Thank you for the script. We will monitor how the queue fills up to see 
whether this is the issue.


Cheers,
Florian

> On 5. Aug 2024, at 14:01, Alex Hussein-Kershaw (HE/HIM) 
> <[email protected]> wrote:
> 
> Hi Florian,
> 
> We are also gearing up to use persistent bucket notifications, but have not 
> got as far as you yet, so we are quite interested in this. As I understand 
> it, a bunch of new functionality is coming in Squid in the radosgw-admin 
> command to allow gathering metrics from the queues, but it is not yet 
> available in Reef.
> 
> I've used this script, parse-notifications.py 
> <https://gist.github.com/yuvalif/b44a67b6278fe811aa38dd81a91eb3ba>, to parse 
> all the objects in the queue; hopefully it helps you (credit to Yuval, who 
> wrote it). To me, the reservation failure does look like the queue is full. 
> It would certainly be interesting to see what is in the queue.
> 
> Best wishes,
> Alex
> 
> From: Florian Schwab <[email protected] 
> <mailto:[email protected]>>
> Sent: Monday, August 5, 2024 11:02 AM
> To: [email protected] <mailto:[email protected]> <[email protected] 
> <mailto:[email protected]>>
> Subject: [EXTERNAL] [ceph-users] RGW bucket notifications stop working after 
> a while and blocking requests
>  
> Hi,
> 
> We just set up two new Ceph clusters (using Rook). To do some processing of 
> the user activity, we configured a topic that sends events to Kafka.
> 
> After 5-12 hours this stops working with a 503 SlowDown response:
> debug 2024-08-02T09:17:58.205+0000 7ff4359ad700 1 req 13681579273117692719 
> 0.005000019s ERROR: failed to reserve notification on queue: private.rgw. 
> error: -28
> 
> My first thought was that the queue is full, but up to this point we see 
> messages coming into Kafka, and there is not much activity on the RGW itself 
> (only a few requests against the S3 API), so it can’t be a load issue.
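The -28 in the log line above looks like a negated errno value, which is how Ceph usually reports errors. A minimal sketch for decoding it, assuming a Linux host (the helper name describe_rgw_error is made up for illustration):

```python
import errno
import os


def describe_rgw_error(code: int) -> str:
    """Translate a negated errno value from an RGW log line into name and message."""
    num = abs(code)
    # errno.errorcode maps the numeric value to its symbolic name.
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name}: {os.strerror(num)}"


# On Linux, errno 28 is ENOSPC.
print(describe_rgw_error(-28))
```

ENOSPC would be consistent with the queue having no room left for a new reservation, although on its own it does not say why the queue is not draining.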
> 
> What helps is to remove the notification configuration on the buckets 
> (put-bucket-notification-configuration). If we directly re-add the previous 
> notification configuration, it continues working for a few hours before 
> failing again with the same error/behaviour.
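The remove/re-add workaround described above could be scripted roughly as follows. This is only a sketch: it assumes `s3` behaves like a boto3 S3 client, that an empty configuration removes the notifications (as the put-bucket-notification-configuration call above suggests), and the function name, endpoint URL and bucket name are placeholders.

```python
import time


def reapply_bucket_notifications(s3, bucket, pause_seconds=0.0):
    """Remove and then restore the notification configuration of one bucket.

    `s3` is expected to behave like a boto3 S3 client.
    """
    current = s3.get_bucket_notification_configuration(Bucket=bucket)
    # The GET response carries ResponseMetadata, which is not valid in a PUT.
    config = {k: v for k, v in current.items() if k != "ResponseMetadata"}

    # Assumption: an empty configuration removes all notifications.
    s3.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration={})
    time.sleep(pause_seconds)

    # Restore the configuration that was in place before.
    s3.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config)
    return config


# Usage (placeholders, not taken from the cluster above):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="http://rgw.example.com")
#   reapply_bucket_notifications(s3, "XXX")
```

This only automates the temporary workaround; it does not address whatever causes the reservation to fail in the first place.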
> 
> We haven’t been able to reproduce this when we disable persistence for the 
> topic, so it looks like it is related to the persistence option; without 
> persistence, events are not queued before being sent to Kafka.
> This also suggests that the issue is not with Kafka, which is what we 
> suspected first (e.g. that it can’t handle the number of messages).
> 
> Has anyone else had this issue and found the cause, or does anyone have a 
> suggestion on how best to continue debugging? Are there detailed metrics on 
> the size and usage of the event queue?
> 
> 
> Here is the configuration for the topic and for a bucket:
> 
> $ radosgw-admin topic list
> {
>    "topics": [
>        {
>            "user": "",
>            "name": "private.rgw",
>            "dest": {
>                "push_endpoint": 
> "kafka://rgw-sasl-kafka-user:[email protected]:9094/private.rgw?sasl.mechanism=SCRAM-SHA-512&mechanism=SCRAM-SHA-512",
>                "push_endpoint_args": 
> "OpaqueData=&Version=2010-03-31&kafka-ack-level=broker&persistent=false&push-endpoint=kafka://rgw-sasl-kafka-user:[email protected]:9094/private.rgw?sasl.mechanism=SCRAM-SHA-512&mechanism=SCRAM-SHA-512&use-ssl=true&verify-ssl=true",
>                "push_endpoint_topic": "private.rgw",
>                "stored_secret": true,
>                "persistent": true
>            },
>            "arn": "arn:aws:sns:ceph-objectstore::private.rgw",
>            "opaqueData": ""
>        }
>    ]
> }
> 
> $ aws s3api get-bucket-notification-configuration --bucket=XXX
> {
>    "TopicConfigurations": [
>        {
>            "Id": "my-id",
>            "TopicArn": "arn:aws:sns:ceph-objectstore::private.rgw",
>            "Events": [
>                "s3:ObjectCreated:*",
>                "s3:ObjectRemoved:*"
>            ]
>        }
>    ]
> }
> 
> 
> Thank you for any input to solve this!
> 
> 
> Cheers,
> Florian
> _______________________________________________
> ceph-users mailing list -- [email protected] <mailto:[email protected]>
> To unsubscribe send an email to [email protected] 
> <mailto:[email protected]>