[ceph-users] Re: Removing an OSD node the right way

Dan van der Ster Fri, 03 Dec 2021 04:14:35 -0800

Hi,

This is indeed the expected behaviour.


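The in/out state is applied as a second weight factor (the reweight value)
in the OSD placement algorithm, on top of the crush weight.
So an OSD with crush weight 1 and reweight 0 (out) is not equivalent to an
OSD with crush weight 0: the out OSD still contributes its crush weight to
its host bucket, so removing it from the CRUSH map later changes the bucket
weights and moves data a second time.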

The correct way to decommission OSDs or hosts is to decrease their crush
weight (to 0) and let the data drain before removing them from the CRUSH map.
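
As a rough sketch of that approach (not tested against your cluster): the
helper names drain_osd and remove_osd are made up for illustration, the OSD
IDs you pass in are placeholders, and DRY_RUN=1 only prints the commands
instead of running them. I'm assuming a release that has "ceph osd
safe-to-destroy" and "ceph osd purge" (Luminous or later), and that the OSD
daemons on the host are already stopped.

```shell
#!/bin/sh
# Sketch: drain OSDs via their CRUSH weight before removing them.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: set the CRUSH weight to 0 so the data moves off the OSD once.
# $1 is a numeric OSD id (placeholder).
drain_osd() {
    run ceph osd crush reweight "osd.$1" 0
}

# Step 2: once backfilling has finished, remove the OSD.
# Assumes the OSD daemon is already down (e.g. the node has failed).
remove_osd() {
    # Wait until Ceph reports that the OSD holds no needed data.
    while ! run ceph osd safe-to-destroy "osd.$1"; do
        sleep 60
    done
    run ceph osd out "$1"
    # purge removes the OSD from the CRUSH map, its auth key and its id.
    run ceph osd purge "$1" --yes-i-really-mean-it
}
```

Call drain_osd for every OSD on the host, wait for backfilling to finish,
then call remove_osd for each of them. Since the crush weight is already 0
at that point, the final removal should not trigger another round of
remapping.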

Cheers, Dan



On Fri, Dec 3, 2021 at 1:08 PM [email protected]
<[email protected]> wrote:
>
> Dear Cephers,
>
> I had to remove a failed OSD server node, and this is what I did:
> 1) First, I marked all OSDs on that (to be removed) server down and out.
> 2) Second, I let Ceph do the backfilling and rebalancing, and waited for it
> to complete.
> 3) Now that I had full redundancy, I deleted those removed OSDs from the
> cluster, e.g. ceph osd crush remove osd.${OSD_NUM}
> 4) To my surprise, after removing those already-out OSDs from the cluster, I
> was seeing tons of PGs remapped and, once again, BACKFILLING/REBALANCING.
>
> What is the major problem with the above procedure that caused the double
> BACKFILLING/REBALANCING? Could the root cause be those OSDs that were
> "already out" but "not yet removed" from CRUSH? I previously thought those
> "out" OSDs would not impact CRUSH, but it seems I was wrong.
>
> Any suggestions, comments, or explanations are highly appreciated.
>
> Best regards,
>
> Samuel
>
>
>
> [email protected]
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
