Hi Ceph Users,

I am planning a major upgrade for our production cluster from Reef (18.2.7) to Squid (19.2.3) and would like to seek advice regarding stability and potential risks.

Infrastructure Overview:
+ Deployment: Cephadm.
+ Cluster Size: 7 Nodes total.
+ Hardware/Virtualization: Each node is a Virtual Machine hosted on Proxmox.
+ OSD Layout: 7 OSDs total (7 OSD per node).
+ Other Daemons: 5 MONs, 7 MDSs, 3 MGRs. Services include RGW, CephFS, and Block Devices.
+ Pool Type: Replicated.

The Concern:
We are currently running stable on Reef 18.2.7. However, I have been following recent discussions on the mailing list and tracker regarding critical failures when upgrading to Squid, specifically concerning OSD crashes and data corruption.

I am particularly worried about the issues reported in these threads, where users experienced failures during or after the upgrade:
+ https://www.mail-archive.com/[email protected]/msg30399.html
+ https://www.mail-archive.com/[email protected]/msg31238.html
+ https://tracker.ceph.com/issues/70390

Given our deployment topology and the jump to a new major version (Squid), I have a few questions:
+ Stability & Success Stories: Is Ceph Squid 19.2.3 considered safe regarding the OSD crash/corruption bugs mentioned in the links above? Has anyone in the community successfully completed the upgrade from 18.2.7 to 19.2.3 without issues? Confirmation of a clean upgrade path would be very reassuring.
+ Upgrade Path: Are there any known regressions or critical "gotchas" when moving from 18.2.7 directly to 19.2.3 in a virtualized environment?

Any experiences or warnings from those running Squid in similar environments would be greatly appreciated.

Thank you,
Van Tran
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
