Hi,

We are running 3 clusters in multisite. All 3 were running Quincy 17.2.6 and 
using cephadm. We upgraded one of the secondary sites to Reef 18.2.1 a couple 
of weeks ago and were planning on doing the rest shortly afterwards.

We run 3 RGW daemons on separate physical hosts behind an external HAProxy HA 
pair for each cluster.

Since we upgraded to Reef we have had issues with the RGWs stopping processing 
requests. We can see that they don't crash as they still have entries in the 
logs about syncing, but as far as request processing goes, they just stop. 
While debugging this we have 1 of the 3 RGWs running a Quincy image, and this 
has never had an issue where it stops processing requests. Any Reef containers 
we deploy have always stopped within 48 hours of being deployed. We have tried 
Reef versions 18.2.1, 18.2.2 and 18.1.3 and all exhibit the same issue. We are 
running podman 4.6.1 on CentOS 8 with kernel 4.18.0-513.24.1.el8_9.x86_64.

We have enabled debug logs for the RGWs but we have been unable to find 
anything in them that would shed light on the cause.

Does anyone have any ideas on what could be causing this, or on how to debug 
it further?

Thanks
Iain

Iain Stott
OpenStack Engineer
iain.st...@thg.com
www.thg.com