All disks are 8 TB HDDs. We have some 16 TB HDDs, but those are all in the newest host, which updated just fine.
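In case it helps with ruling out a slow boot: a quick way to see whether such an OSD is genuinely still starting up or is stuck (rough sketch for a cephadm-managed cluster; UUID and N are placeholders for the cluster fsid and OSD id):

# Follow the OSD's unit/container log and look for boot progress:
journalctl -fu ceph-UUID@osd.N.service

# Ask the daemon itself via its admin socket; the reported "state"
# should eventually become "active" if it is making progress:
cephadm enter --name osd.N
ceph daemon osd.N status      # run inside the container

If the reported state never changes over a long stretch, it is probably not just a big disk taking its time.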
On Tue, 27 Jan 2026 at 12:39, Malte Stroem <[email protected]> wrote:

> From what I can see, the OSD is running. It's starting up and still
> needs time.
>
> How big is the disk?
>
> On 1/27/26 12:31, Boris via ceph-users wrote:
> > Sure: https://pastebin.com/9RLzyUQs
> >
> > I've trimmed the log a little bit (removed peering, epoch, trim and so on).
> > This is the last OSD that we tried that did not work.
> >
> > We tried another host, where the upgrade just went through. But this host
> > also got the newest hardware.
> > But we don't think it is a hardware issue, because the first 30 OSDs were
> > on the two oldest hosts and the first one that failed was on the same host
> > as the last OSD that did not fail.
> >
> > On Tue, 27 Jan 2026 at 12:05, Malte Stroem <[email protected]> wrote:
> >
> >> It could be the kind of hardware you are using. Is it different from the
> >> other clusters' hardware?
> >>
> >> Send us logs, so we can help you out.
> >>
> >> Example:
> >>
> >> journalctl -eu [email protected]
> >>
> >> Best,
> >> Malte
> >>
> >> On 1/27/26 11:55, Boris via ceph-users wrote:
> >>> Hi,
> >>> we are currently facing an issue where suddenly none of the OSDs will start
> >>> after the container has started with the new version.
> >>>
> >>> This seems to be an issue with some hosts/OSDs. The first 30 OSDs worked,
> >>> but took really long (around 5 hours), and every single OSD after that
> >>> needed a host reboot to bring the disk back up and continue the update.
> >>>
> >>> We stopped after 6 tries.
> >>>
> >>> One disk never came back up at all. We removed and zapped the OSD. The
> >>> orchestrator picked up the available disk and recreated it. It came up
> >>> within seconds.
> >>>
> >>> We have around 90 clusters and this happened on only a single one. All
> >>> the others updated within two hours without any issues.
> >>>
> >>> The cluster uses HDDs (8 TB) with the block.db on SSD (5 block.db per SSD).
> >>> The file /var/log/ceph/UUID/ceph-volume.log gets hammered with a lot of
> >>> output from udevadm, lsblk and nsenter.
> >>> The activation container (ceph-UUID-osd-N-activate) gets killed after a
> >>> couple of minutes.
> >>> It also looks like the block and block.db links
> >>> in /var/lib/ceph/UUID/osd.N/ are not set correctly.
> >>> When we restart the daemons that needed a host restart, the OSD doesn't
> >>> come up and needs another host restart.
> >>>
> >>> All OSDs are encrypted.
> >>>
> >>> Does anyone have ideas on how to debug this further?

--
The self-help group "UTF-8 Problems" will, as an exception, meet in the large hall this time.
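For the missing block/block.db links and the killed activation container described above, a rough checklist (sketch only, assuming cephadm-managed LVM/dm-crypt OSDs; UUID, N, HOST and DEVICE are placeholders for your cluster fsid, OSD id, hostname and disk path):

# Do the data-dir links point at the expected devices at all?
ls -l /var/lib/ceph/UUID/osd.N/block /var/lib/ceph/UUID/osd.N/block.db

# Cross-check with what ceph-volume believes the OSD's data and db volumes are:
cephadm ceph-volume lvm list

# Watch the activation attempt while restarting the daemon:
tail -f /var/log/ceph/UUID/ceph-volume.log
systemctl restart ceph-UUID@osd.N.service

# Last resort, as done for the one disk above: remove the OSD, zap the disk
# and let the orchestrator redeploy it (destroys that OSD's data, so only
# with enough redundancy in the cluster):
ceph orch osd rm N
ceph orch osd rm status
ceph orch device zap HOST DEVICE --force

If the OSD service spec for that host is still in place and not set to unmanaged, the orchestrator should pick the zapped disk up again on its own, which matches what happened with the one disk that never came back.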
