[ceph-users] Re: Why does recovering objects take much longer than the outage that caused them?

Boris Fri, 19 Sep 2025 04:58:29 -0700

If you have misplaced objects, the OSDs were marked out and Ceph started to 
move the PGs to other nodes. This usually happens after 5 to 10 minutes of 
OSD downtime, depending on mon_osd_down_out_interval.
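
For a sense of scale, here is a rough back-of-the-envelope sketch in Python 
using the figures from the status output quoted below. It assumes the 
reported objects/s rate stays constant, which real recovery rarely does, so 
treat the result as an order of magnitude only. The variable names are mine, 
not anything Ceph provides.

```python
# Figures copied from the status output quoted further down in this thread.
misplaced_objects = 5_586_988
total_objects = 366_684_639
recovery_rate = 676  # objects per second, as reported at that moment

# Share of object copies that are not where CRUSH wants them.
misplaced_percent = 100 * misplaced_objects / total_objects

# Assumption: the recovery rate stays constant for the whole run.
eta_seconds = misplaced_objects / recovery_rate
eta_hours = eta_seconds / 3600

print(f"{misplaced_percent:.3f}% misplaced, "
      f"about {eta_hours:.1f} hours at {recovery_rate} objects/s")
```

That works out to a bit over two hours, which is consistent with whole PG 
copies being moved rather than only the writes missed during the outage.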


If you run "ceph osd set noout" before restarting a node, recovery should be 
much faster, because the OSDs are not marked out and only need to catch up 
on the changes they missed.
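
One thing that is easy to forget is clearing the flag again afterwards. A 
minimal sketch of how that could be wrapped in Python, assuming the "ceph" 
CLI is on the PATH of an admin host. The helper names "ceph" and "noout" and 
the "run" parameter are hypothetical, only "ceph osd set noout" and 
"ceph osd unset noout" are real commands.

```python
import subprocess
from contextlib import contextmanager


def ceph(*args, run=subprocess.run):
    """Run one ceph CLI command and raise if it exits non-zero."""
    return run(["ceph", *args], check=True)


@contextmanager
def noout(run=subprocess.run):
    """Set the cluster-wide noout flag for the duration of the block."""
    ceph("osd", "set", "noout", run=run)
    try:
        yield
    finally:
        # Always clear the flag, even if the maintenance step raised,
        # so that real failures get marked out again later on.
        ceph("osd", "unset", "noout", run=run)
```

You would put the reboot (and the wait for the OSDs to come back up) inside 
a "with noout():" block. The "run" parameter is only there so the wrapper 
can be exercised without a cluster.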


Kind regards
 - Boris Behrens

> On 19.09.2025 at 13:25, Niklas Hambüchen <m...@nh2.me> wrote:
> 
> I noticed that for my clusters, even a short 5-minute network outage or 
> single-host reboot can cause
> 
>    pgs:     5586988/366684639 objects misplaced (1.524%)
> 
> which at the speed of
> 
>    recovery: 2.2 GiB/s, 676 objects/s
> 
> can take hours to recover.
> 
> I don't understand how this can be. If the outage is so short, how can 
> rebalancing take this long?
> 
> I'm using Ceph 19.2.2 on HDDs with SSDs as BlueStore "db" device.
> Is it perhaps that new files are written linearly to the HDDs (fast) 
> but recovery seeks around on my HDDs in random order (slow)?
> 
> In any case, this asymmetry is quite annoying.
> Can anything be done about it?
> 
> Thanks!
> Niklas
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
