I finally got around to opening my first issue on this, cf.
https://tracker.ceph.com/issues/73107

On Sun, Sep 14, 2025 at 2:22 PM Mikael Öhman <micket...@gmail.com> wrote:

> Hi all
>
> Yes, there was also a similar O(n^2) bug caused by indentation (in
> lsblk_all, if I recall correctly). That time it took well over
> 15 minutes for me to run through it, so it was even worse.
> This time, it's not quite such a subtle bug.
>
> I plan to write up a bug report on this with the details; I just got my
> bug tracker account approved.
>
> I suspect only users with large JBODs + multipath would see this issue as
> badly as I do.
> If I didn't have multipath devices that cause the expensive
> "disk.get_devices()" call to be re-triggered repeatedly, it would "only"
> have taken an extra ~30 seconds to launch an OSD daemon. Not good, but it
> would still be well within the systemd timeout and wouldn't break the
> daemon completely. It's slow because "ceph-volume activate" also attempts
> to find raw devices before proceeding to LVM.
>
> The change was introduced in https://github.com/ceph/ceph/pull/60395 and
> I can confirm from
> https://github.com/ceph/ceph/blob/v19.2.2/src/ceph-volume/ceph_volume/devices/raw/list.py
> that it does not contain the specific problematic code. In 19.2.2 it's
> still technically O(n^2), but since it uses a local info_devices variable
> that is generated just once, it doesn't have the multipath issue that
> makes it 10x worse.
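>
> For illustration, the 19.2.2 approach amounts to the toy sketch below (not
> the actual list.py code; get_devices here is just a stand-in for the
> expensive disk.get_devices() call):
>
> # Toy sketch of the 19.2.2-style pattern: one expensive scan, held in a
> # local variable and reused for every candidate path.
> def get_devices():
>     # stand-in for disk.get_devices(): the expensive full device scan
>     return {"/dev/sda": {}, "/dev/sdb": {}}
>
> def list_raw_candidates(paths):
>     info_devices = get_devices()  # generated once, reused below
>     return [p for p in paths if p in info_devices]
>
> print(list_raw_candidates(["/dev/sda", "/dev/mapper/mpatha"]))  # ['/dev/sda']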
>
> I worked around this problem by setting up a little container mirror
> locally where I monkeypatched the raw part out of ceph-volume activate.
> Dockerfile:
>
> FROM quay.io/ceph/ceph:v19.2.3
> RUN sed -i '46,52d' /usr/lib/python3.9/site-packages/ceph_volume/activate/main.py
>
> which just deletes the "first try raw" section from ceph-volume activate:
>
> https://github.com/ceph/ceph/blob/50d6a3d454763cea76ca45a846cde9702364c773/src/ceph-volume/ceph_volume/activate/main.py#L46-L52
>
> since, like all recommended setups these days, I use LVM for all devices
> (I don't understand why one must try raw first).
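>
> For context, that block boils down to roughly the control flow below. This
> is a paraphrase with placeholder function names, not the actual ceph-volume
> code; see the linked main.py for the real implementation:
>
> def raw_activate(osd_id, osd_uuid):
>     # placeholder for raw activation; in ceph-volume this is where the
>     # slow full device scan happens
>     raise RuntimeError("no raw OSD found for %s" % osd_id)
>
> def lvm_activate(osd_id, osd_uuid):
>     # placeholder for LVM activation
>     print("activated OSD %s via LVM" % osd_id)
>
> def activate(osd_id, osd_uuid):
>     try:
>         raw_activate(osd_id, osd_uuid)  # the "first try raw" block the sed deletes
>         return
>     except Exception:
>         pass                            # fall through to LVM
>     lvm_activate(osd_id, osd_uuid)
>
> activate("12", "some-osd-fsid")         # hypothetical IDs
>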
> ceph-volume raw list still takes 5 minutes (and correctly outputs 0
> devices, as I don't use raw), but I don't care about that since I will
> only use "ceph-volume lvm list". At least this way activation is fast.
>
> On Sun, Sep 14, 2025 at 11:46 AM Michel Jouvin <
> michel.jou...@ijclab.in2p3.fr> wrote:
>
>> Hi Mikael,
>>
>> Thanks for the report. I was also considering upgrading from 19.2.2 to
>> 19.2.3. It should be related to a change between those two versions, as I
>> experienced no problem during the 18.2.7 to 19.2.2 upgrade... It reminds
>> me of a problem in one of the Quincy updates, if I'm right, with something
>> similar (but probably a different cause), where the device activation was
>> running the same command far too many times (at that time it was a trivial
>> indentation issue in the code)... In any case, it seems that activation of
>> many OSDs per node was insufficiently tested. I don't know if testing has
>> been improved...
>>
>> Best regards,
>>
>> Michel
>>
>> On 14/09/2025 at 10:23, Eugen Block wrote:
>> > This is interesting; I was planning to upgrade our own cluster next
>> > week from 18.2.7 to 19.2.3 as well, but now I'm hesitating. We don't
>> > have that many OSDs per node, though, so we probably won't hit the
>> > issue you're describing. But I can confirm that 'cephadm ceph-volume
>> > raw list' on my virtual test environment with only 3 OSDs per node
>> > takes around 11 seconds (with empty output). On Reef the output is not
>> > empty (probably because exclude_lvm_osd_devices is not present there,
>> > as I understand it) and it only takes 4 seconds to complete with
>> > around 10 OSDs per node.
>> > I'll have to check with my colleagues if we should still move forward
>> > with the upgrade...
>> >
>> > Thanks for reporting this! Did you check if there's a tracker issue
>> > for it?
>> >
>> > Thanks,
>> > Eugen
>> >
>> > Quoting Mikael Öhman <micket...@gmail.com>:
>> >
>> >> I'm fighting with a Ceph upgrade, going from 18.2.7 to 19.2.3.
>> >>
>> >> This time again the ceph-volume activate step is taking too long,
>> >> triggering failures due to the systemd service timing out, so the orch
>> >> daemon fails (though the OSD does eventually come up, the daemon is
>> >> still dead, and the upgrade halts).
>> >>
>> >> I can also reproduce the slowdown of startup with
>> >> cephadm ceph-volume raw list
>> >>
>> >> (I don't use raw devices, but the ceph-volume activation method
>> >> hardcodes checking raw first:
>> >> https://github.com/ceph/ceph/blob/4d5ad8c1ef04f38d14402f0d89f2df2b7d254c2c/src/ceph-volume/ceph_volume/activate/main.py#L46
>> >> )
>> >>
>> >> That takes 6s on 18.2.7, but 4m32s on 19.2.3!
>> >> I have 42 spinning drives per host (with multipath).
>> >>
>> >> It's spending all of its time in the new method
>> >> self.exclude_lvm_osd_devices(), and the list of items to scan, given
>> >> all the duplication from multipath and mapper names, ends up with 308
>> >> items in my setup.
>> >>
>> >> With good old print debugging, I found that while the threadpool
>> >> speeds things up a bit, it simply takes too long to construct all
>> >> those Device() objects.
>> >> In fact, just creating a single Device() object needs to call
>> >> disk.get_devices() at least once. That list does not include all
>> >> devices (it filters out things like "/dev/mapper/mpathxx"), but the
>> >> code always regenerates the (same) device list whenever the path isn't
>> >> found:
>> >>
>> >>        if not sys_info.devices.get(self.path):
>> >>            sys_info.devices = disk.get_devices()
>> >>
>> >> which forces it to re-generate this list >400 times (the initial 32 in
>> >> parallel, followed by about 400 more that will never match the device
>> >> name).
>> >> In the end, it's again O(n^2) computational time to list these raw
>> >> devices with ceph-volume. And with 32 threads in the pool, it now also
>> >> requires running a heavy load for 5 minutes before completing this
>> >> trivial task, every time the daemon needs to start.
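>> >>
>> >> To make that concrete, here is a toy model of the check quoted above
>> >> (not the ceph-volume source; get_devices and make_device are just
>> >> stand-ins):
>> >>
>> >> # Toy model: mapper paths are filtered out of the scan result, so the
>> >> # lookup misses every time and the full scan repeats per Device().
>> >> sys_devices = {}     # plays the role of sys_info.devices
>> >> full_scans = 0
>> >>
>> >> def get_devices():
>> >>     # stand-in for disk.get_devices(); never returns /dev/mapper/* entries
>> >>     global full_scans
>> >>     full_scans += 1
>> >>     return {"/dev/sda": {}, "/dev/sdb": {}}
>> >>
>> >> def make_device(path):
>> >>     # mirrors the quoted snippet: a miss regenerates the whole listing
>> >>     global sys_devices
>> >>     if not sys_devices.get(path):
>> >>         sys_devices = get_devices()
>> >>     return sys_devices.get(path, {})
>> >>
>> >> for i in range(400):
>> >>     make_device("/dev/mapper/mpath%d" % i)
>> >> print("full device scans:", full_scans)   # 400: one per multipath alias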
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
