Hi,

We just set up two new Ceph clusters (using Rook). To process user activity, we
configured a topic that sends bucket notification events to Kafka.

After 5 to 12 hours this stops working, and S3 requests get a 503 SlowDown
response with the following in the RGW log:
debug 2024-08-02T09:17:58.205+0000 7ff4359ad700 1 req 13681579273117692719 0.005000019s ERROR: failed to reserve notification on queue: private.rgw. error: -28

Our first thought was that the queue is full. However, up to that point we see
messages arriving in Kafka, and there is very little activity on the RGW itself
(only a few requests against the S3 API), so it doesn't look like a load issue.
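The `error: -28` in the RGW log line is, assuming it follows Ceph's usual convention of reporting failures as negated errno values, `ENOSPC` ("No space left on device" on Linux), which is consistent with the suspicion of a full queue. A small helper (hypothetical, not part of Ceph) to decode such values:

```python
import errno
import os
from typing import Tuple


def decode_rgw_error(code: int) -> Tuple[str, str]:
    """Translate an error code as printed by RGW (e.g. -28) into its errno
    name and message.

    Assumption: the code is a negated POSIX errno value, which is the usual
    Ceph convention; values are looked up in the local OS errno table.
    """
    positive = abs(code)
    name = errno.errorcode.get(positive, "UNKNOWN")
    return name, os.strerror(positive)


if __name__ == "__main__":
    name, message = decode_rgw_error(-28)
    # On Linux this prints: -28 -> ENOSPC: No space left on device
    print(f"-28 -> {name}: {message}")
```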

What helps is removing the notification configuration from the buckets (with
put-bucket-notification-configuration). If we then immediately re-add the
previous notification configuration, it keeps working for a few hours again
before failing with the same error and behaviour.
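The remove-and-re-add workaround described above can be scripted. A minimal sketch, assuming a boto3 S3 client created with `boto3.client("s3", endpoint_url=...)` against the RGW endpoint, and assuming that an empty `NotificationConfiguration` removes all notifications (as it does on AWS S3); the helper name `reapply_notification_configuration` is hypothetical:

```python
from typing import Any


def reapply_notification_configuration(s3: Any, bucket: str) -> dict:
    """Hypothetical helper: remove a bucket's notification configuration and
    immediately put the previous one back.

    `s3` is assumed to be a boto3 S3 client pointed at RGW; any object that
    provides the same two methods works.
    """
    current = s3.get_bucket_notification_configuration(Bucket=bucket)
    # boto3 responses carry transport metadata that must not be sent back.
    current.pop("ResponseMetadata", None)

    # Assumption: an empty configuration removes all notifications.
    s3.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration={}
    )
    # Restore the configuration that was in place before.
    s3.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=current
    )
    return current
```

This only automates the manual steps; it does not address whatever fills the queue.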

We haven't been able to reproduce this with persistence disabled for the topic,
so it looks related to the persistence option; without persistence, events are
not queued at all before being sent to Kafka. This also suggests that the
problem is not Kafka itself, which was our first suspicion (e.g. that it could
not handle the volume of messages).

Has anyone else run into this issue and found the cause, or does anyone have a
suggestion on how best to continue debugging? Are there detailed metrics on the
size and usage of the event queue?


Here is the configuration for the topic and for a bucket:

$ radosgw-admin topic list
{
   "topics": [
       {
           "user": "",
           "name": "private.rgw",
           "dest": {
               "push_endpoint": "kafka://rgw-sasl-kafka-user:[email protected]:9094/private.rgw?sasl.mechanism=SCRAM-SHA-512&mechanism=SCRAM-SHA-512",
               "push_endpoint_args": "OpaqueData=&Version=2010-03-31&kafka-ack-level=broker&persistent=false&push-endpoint=kafka://rgw-sasl-kafka-user:[email protected]:9094/private.rgw?sasl.mechanism=SCRAM-SHA-512&mechanism=SCRAM-SHA-512&use-ssl=true&verify-ssl=true",
               "push_endpoint_topic": "private.rgw",
               "stored_secret": true,
               "persistent": true
           },
           "arn": "arn:aws:sns:ceph-objectstore::private.rgw",
           "opaqueData": ""
       }
   ]
}
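In the output above, `dest.persistent` is `true` while `push_endpoint_args` contains `persistent=false`. When comparing persistent and non-persistent setups, it may help to print both values per topic. A minimal sketch, assuming the JSON layout shown above; the function name `persistence_flags` is hypothetical, and which of the two values RGW actually acts on is not something this check determines:

```python
import json
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs


def persistence_flags(topic_list_json: str) -> Dict[str, Tuple[bool, Optional[str]]]:
    """For each topic in `radosgw-admin topic list` output, return the
    dest.persistent flag and the `persistent=` value found in
    push_endpoint_args (None if absent)."""
    result = {}
    for topic in json.loads(topic_list_json)["topics"]:
        dest = topic["dest"]
        # push_endpoint_args is a query-string-like list of key=value pairs.
        # The embedded push-endpoint URL also contains '&', so only simple
        # keys such as "persistent" are parsed reliably here.
        args = parse_qs(dest.get("push_endpoint_args", ""))
        arg_value = args.get("persistent", [None])[0]
        result[topic["name"]] = (bool(dest.get("persistent", False)), arg_value)
    return result


if __name__ == "__main__":
    sample = (
        '{"topics": [{"name": "private.rgw", "dest": {'
        '"push_endpoint_args": "kafka-ack-level=broker&persistent=false", '
        '"persistent": true}}]}'
    )
    print(persistence_flags(sample))  # {'private.rgw': (True, 'false')}
```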

$ aws s3api get-bucket-notification-configuration --bucket=XXX
{
   "TopicConfigurations": [
       {
           "Id": "my-id",
           "TopicArn": "arn:aws:sns:ceph-objectstore::private.rgw",
           "Events": [
               "s3:ObjectCreated:*",
               "s3:ObjectRemoved:*"
           ]
       }
   ]
}


Thank you for any input on solving this!


Cheers,
Florian
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
