Hi,
we just set up two new Ceph clusters (using Rook). To process user activity,
we configured a topic that sends events to Kafka.
After 5-12 hours this stops working with a 503 SlowDown response, and the log
shows:
debug 2024-08-02T09:17:58.205+0000 7ff4359ad700 1 req 13681579273117692719 0.005000019s ERROR: failed to reserve notification on queue: private.rgw. error: -28
Our first thought was that the queue is full, but up to this point we see
messages arriving in Kafka, and there is little activity on the RGW itself
(only a few requests against the S3 API), so it can't be a load issue.
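For reference, the numeric code at the end of the log line can be decoded with
Python's standard errno table. This is only a small sketch: the helper name
decode_rgw_error is made up for illustration, and it assumes the RGW reports a
negated Linux errno value. On Linux, 28 is ENOSPC, which fits the "queue is
full" reading but does not by itself explain why the queue would fill up.

```python
import errno
import os
import re

# Log line copied from the message above, re-joined onto one line.
LOG_LINE = (
    "debug 2024-08-02T09:17:58.205+0000 7ff4359ad700 1 req 13681579273117692719 "
    "0.005000019s ERROR: failed to reserve notification on queue: private.rgw. "
    "error: -28"
)


def decode_rgw_error(line):
    """Return (code, errno name, description) for a trailing 'error: -N', or None.

    Hypothetical helper; assumes the number is a negated errno value.
    """
    match = re.search(r"error:\s*(-?\d+)\s*$", line)
    if match is None:
        return None
    code = abs(int(match.group(1)))
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    print(decode_rgw_error(LOG_LINE))
```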
What helps is removing the notification configuration from the buckets
(put-bucket-notification-configuration). If we re-add the previous
notification configuration right away, it works again for a few hours before
failing with the same error and behaviour.
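The remove-and-re-add workaround could be scripted roughly as follows. This is
a sketch, not a fix: the function name is made up, it expects a boto3-style S3
client to be passed in, and it assumes that putting an empty configuration
removes the notifications, as the put-bucket-notification-configuration call
above does for us.

```python
def reapply_bucket_notifications(s3_client, bucket):
    """Remove a bucket's notification configuration and restore it right away.

    Hypothetical helper. s3_client is expected to behave like a boto3 S3
    client; it is passed in so the function does not create connections itself.
    """
    response = s3_client.get_bucket_notification_configuration(Bucket=bucket)
    # Drop the response metadata and keep only the configuration keys.
    config = {k: v for k, v in response.items() if k != "ResponseMetadata"}

    # Assumption: an empty configuration removes all notifications.
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration={}
    )
    # Restore the configuration that was read before the removal.
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )
    return config
```

With boto3 installed, the client would be created with something like
boto3.client("s3", endpoint_url=...) pointing at the RGW endpoint, which is
specific to each setup and left out here.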
We haven't been able to reproduce this with persistence disabled for the
topic, so it looks related to the persistence option; without persistence,
events are not queued before being sent to Kafka.
This also suggests that the issue is not with Kafka, which is what we
suspected first (e.g. that it can't handle the number of messages).
Has anyone else seen this issue and found the cause, or does anyone have a
suggestion on how best to continue debugging? Are there detailed metrics on
the size and usage of the event queue?
Here is the configuration for the topic and for a bucket:
$ radosgw-admin topic list
{
    "topics": [
        {
            "user": "",
            "name": "private.rgw",
            "dest": {
                "push_endpoint": "kafka://rgw-sasl-kafka-user:[email protected]:9094/private.rgw?sasl.mechanism=SCRAM-SHA-512&mechanism=SCRAM-SHA-512",
                "push_endpoint_args": "OpaqueData=&Version=2010-03-31&kafka-ack-level=broker&persistent=false&push-endpoint=kafka://rgw-sasl-kafka-user:[email protected]:9094/private.rgw?sasl.mechanism=SCRAM-SHA-512&mechanism=SCRAM-SHA-512&use-ssl=true&verify-ssl=true",
                "push_endpoint_topic": "private.rgw",
                "stored_secret": true,
                "persistent": true
            },
            "arn": "arn:aws:sns:ceph-objectstore::private.rgw",
            "opaqueData": ""
        }
    ]
}
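One detail in this listing: push_endpoint_args contains persistent=false while
the persistent field of the same topic is true. Whether that matters for the
problem is not established. A small check like the one below could flag such
mismatches in the output of radosgw-admin topic list. It is a sketch only: the
function name and the sample endpoint are made up, and it assumes the JSON
layout shown above.

```python
import json
from urllib.parse import parse_qs


def persistence_mismatches(topic_list_json):
    """Return (name, value in args, stored flag) for topics where they disagree.

    Hypothetical helper; expects the JSON printed by `radosgw-admin topic list`.
    """
    mismatches = []
    for topic in json.loads(topic_list_json).get("topics", []):
        dest = topic.get("dest", {})
        args = parse_qs(dest.get("push_endpoint_args", ""), keep_blank_values=True)
        raw = args.get("persistent", [None])[0]
        if raw is None:
            continue  # the args string does not mention persistence at all
        stored = bool(dest.get("persistent", False))
        if (raw.lower() == "true") != stored:
            mismatches.append((topic.get("name"), raw, stored))
    return mismatches


# Trimmed sample in the shape of the listing above; the endpoint is a placeholder.
SAMPLE = json.dumps({
    "topics": [{
        "name": "private.rgw",
        "dest": {
            "push_endpoint_args": (
                "OpaqueData=&Version=2010-03-31&kafka-ack-level=broker"
                "&persistent=false"
                "&push-endpoint=kafka://user:secret@kafka.example:9094/private.rgw"
            ),
            "persistent": True,
        },
    }]
})

if __name__ == "__main__":
    print(persistence_mismatches(SAMPLE))
```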
$ aws s3api get-bucket-notification-configuration --bucket=XXX
{
    "TopicConfigurations": [
        {
            "Id": "my-id",
            "TopicArn": "arn:aws:sns:ceph-objectstore::private.rgw",
            "Events": [
                "s3:ObjectCreated:*",
                "s3:ObjectRemoved:*"
            ]
        }
    ]
}
Thank you for any input to solve this!
Cheers,
Florian