I finally got around to opening my first issue on this, cf. https://tracker.ceph.com/issues/73107
On Sun, Sep 14, 2025 at 2:22 PM Mikael Öhman <micket...@gmail.com> wrote:
> Hi all,
>
> Yes, there was also a similar O(n^2) bug caused by indentation (in
> lsblk_all, if I recall correctly). That time it took well over
> 15 minutes for me to run through it, so it was even worse.
> This time, it's not quite such a subtle bug.
>
> I plan to write up a bug report on this with the details; I just got my
> bug tracker account approved.
>
> I suspect only users with large JBODs + multipath would see this issue
> as badly as I do.
> If I didn't have multipath devices that cause the expensive
> "disk.get_devices()" to be re-triggered repeatedly, it would "only" have
> taken me an extra ~30 seconds to launch an OSD daemon. Not good, but that
> would be well within the systemd timeout and wouldn't break the daemon
> completely. It's slow because "ceph-volume activate" also attempts to
> find raw devices before proceeding to LVM.
>
> The change was introduced in https://github.com/ceph/ceph/pull/60395, and
> I can confirm from
> https://github.com/ceph/ceph/blob/v19.2.2/src/ceph-volume/ceph_volume/devices/raw/list.py
> that 19.2.2 does not have the specific problematic code. It's still
> technically O(n^2) there, but since it uses a local info_devices variable
> that is generated just once, it doesn't have the multipath issue that
> makes it 10x worse.
>
> I worked around this problem by setting up a little container mirror
> locally where I monkeypatched the raw part out of ceph-volume activate.
> Dockerfile:
>
> FROM quay.io/ceph/ceph:v19.2.3
> RUN sed -i '46,52d' /usr/lib/python3.9/site-packages/ceph_volume/activate/main.py
>
> This just deletes the "first try raw" section from ceph-volume activate:
> https://github.com/ceph/ceph/blob/50d6a3d454763cea76ca45a846cde9702364c773/src/ceph-volume/ceph_volume/activate/main.py#L46-L52
> since, like all recommended setups these days, I use LVM for all devices
> (I don't understand why one must try raw first).
> "ceph-volume raw list" still takes 5 minutes (and it correctly outputs 0
> devices, as I don't use raw), but I don't care about that since I will
> only use "ceph-volume lvm list". At least this way activation is fast.
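For anyone who wants to gauge how bad this is on their own hosts before
upgrading: the expensive call is a single disk.get_devices() sweep, which the
19.2.3 raw listing ends up repeating for every mapper/multipath alias. A rough
timing sketch of one sweep, run inside the container (e.g. via cephadm shell);
this is just my own illustration, and the ceph_volume.util.disk module path is
assumed from the stock package layout:

import time

# Rough sketch: time one full device scan, i.e. the call that raw/list.py
# re-runs for every mapper/multipath path it cannot find in the cached
# device map. Module path assumed; run inside the ceph container.
from ceph_volume.util import disk

start = time.monotonic()
devices = disk.get_devices()
elapsed = time.monotonic() - start

# Multiply this by a few hundred (308 scan items in my setup) to estimate
# the total "ceph-volume raw list" runtime on a multipath JBOD host.
print(f"get_devices() returned {len(devices)} devices in {elapsed:.1f}s")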
> On Sun, Sep 14, 2025 at 11:46 AM Michel Jouvin <michel.jou...@ijclab.in2p3.fr> wrote:
>> Hi Mikael,
>>
>> Thanks for the report. I was also considering upgrading from 19.2.2 to
>> 19.2.3. It should be related to a change between those two versions, as
>> I experienced no problem during the 18.2.7 to 19.2.2 upgrade... It
>> reminds me of a problem in one of the Quincy updates, if I'm right, with
>> something similar (but probably a different cause), where device
>> activation was running the same command far too many times (at that time
>> it was a trivial indentation issue in the code)... but at least it seems
>> that activation of many OSDs per node was insufficiently tested. I don't
>> know if testing was improved...
>>
>> Best regards,
>>
>> Michel
>>
>> On 14/09/2025 at 10:23, Eugen Block wrote:
>> > This is interesting, I was planning to upgrade our own cluster next
>> > week from 18.2.7 to 19.2.3 as well, and now I'm hesitating. We don't
>> > have that many OSDs per node, though, so we probably won't have the
>> > issue you're describing. But I can confirm that 'cephadm ceph-volume
>> > raw list' on my virtual test environment with only 3 OSDs per node
>> > takes around 11 seconds (and empty output). On Reef the output is not
>> > empty (probably because exclude_lvm_osd_devices is not present there,
>> > as I understand it) and it only takes 4 seconds to complete with
>> > around 10 OSDs per node.
>> > I'll have to check with my colleagues if we should still move forward
>> > with the upgrade...
>> >
>> > Thanks for reporting that! Did you check if there's a tracker issue
>> > for that?
>> >
>> > Thanks,
>> > Eugen
>> >
>> > Quoting Mikael Öhman <micket...@gmail.com>:
>> >
>> >> I'm fighting with a Ceph upgrade, going from 18.2.7 to 19.2.3.
>> >>
>> >> This time again the ceph-volume activate step is taking too long,
>> >> triggering failures because the systemd service times out, so the
>> >> orch daemon fails (though the OSD does eventually come up, the
>> >> daemon is still dead, and the upgrade halts).
>> >>
>> >> I can also reproduce the slow startup with
>> >>
>> >> cephadm ceph-volume raw list
>> >>
>> >> (I don't use raw devices, but the ceph-volume activation method
>> >> hardcodes checking raw first:
>> >> https://github.com/ceph/ceph/blob/4d5ad8c1ef04f38d14402f0d89f2df2b7d254c2c/src/ceph-volume/ceph_volume/activate/main.py#L46
>> >> )
>> >>
>> >> That takes 6s on 18.2.7, but 4m32s on 19.2.3!
>> >> I have 42 spinning drives per host (with multipath).
>> >>
>> >> It's spending all of its time in the new method
>> >> self.exclude_lvm_osd_devices(), and given all the duplication from
>> >> multipath and mapper names, the list of items to scan ends up with
>> >> 308 entries in my setup.
>> >>
>> >> With good old print debugging, I found that while the thread pool
>> >> speeds things up a bit, it simply takes too long to construct all
>> >> those Device() objects.
>> >> In fact, even creating a single Device() object has to call
>> >> disk.get_devices() at least once. That list does not include all
>> >> devices (it filters out things like "/dev/mapper/mpathxx"), but the
>> >> code always regenerates the (same) device list whenever a path isn't
>> >> found:
>> >>
>> >>     if not sys_info.devices.get(self.path):
>> >>         sys_info.devices = disk.get_devices()
>> >>
>> >> This now forces it to regenerate that list more than 400 times (an
>> >> initial 32 times in parallel, followed by about 400 more that will
>> >> never match the device name). In the end, it's again O(n^2)
>> >> computational time to list raw devices with ceph-volume.
>> >> So with 32 threads in the pool, it now also means running under
>> >> heavy load for 5 minutes to complete this trivial task, every time
>> >> the daemon needs to start.
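To make the failure mode quoted above more concrete, here is a small
self-contained toy model of the pattern (my own sketch, not the actual
ceph-volume code): every cache miss triggers a full rescan, and
mapper/multipath aliases always miss, so the scan is repeated once per path.
Filling the cache once and treating a later miss as "not a raw candidate"
turns the whole thing into a single scan:

import time

SCAN_SECONDS = 0.2  # stand-in for one expensive sysfs/lsblk/blkid sweep


def fake_get_devices() -> dict:
    """Stand-in for disk.get_devices(): slow, and mapper aliases are filtered out."""
    time.sleep(SCAN_SECONDS)
    return {f"/dev/sd{letter}": {} for letter in "abcd"}


class SysInfo:
    devices: dict = {}


sys_info = SysInfo()


def device_info_miss_rescans(path: str) -> dict:
    # The pattern quoted above: a cache miss regenerates the whole list, and
    # /dev/mapper/mpathXX paths are never in it, so every alias rescans.
    if not sys_info.devices.get(path):
        sys_info.devices = fake_get_devices()
    return sys_info.devices.get(path, {})


def device_info_scan_once(path: str) -> dict:
    # Sketch of a fix: populate the cache once; a later miss just means
    # "this path is not a raw device candidate".
    if not sys_info.devices:
        sys_info.devices = fake_get_devices()
    return sys_info.devices.get(path, {})


mapper_paths = [f"/dev/mapper/mpath{i}" for i in range(20)]

sys_info.devices = {}
t0 = time.monotonic()
for p in mapper_paths:
    device_info_miss_rescans(p)
print(f"rescan on every miss:  {time.monotonic() - t0:.1f}s for {len(mapper_paths)} paths")

sys_info.devices = {}
t0 = time.monotonic()
for p in mapper_paths:
    device_info_scan_once(p)
print(f"scan once, then reuse: {time.monotonic() - t0:.1f}s for {len(mapper_paths)} paths")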