Hi Kristof,

Setting an OSD "out" doesn't change the CRUSH weight of that OSD, but removing it from the CRUSH tree does, which is why the cluster started to rebalance.
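
For illustration, a rough sketch (osd.42 is just a placeholder for one
of the OSDs on that host): "out" only sets the override reweight to 0,
which you can see in the REWEIGHT column of "ceph osd df tree", while
the CRUSH weight column stays as it was. If you want to avoid that
second round of data movement, you can drain the OSDs via their CRUSH
weight first, so removing them from the tree afterwards doesn't
trigger any further backfill:

  # inspect CRUSH weight vs. override reweight per OSD
  ceph osd df tree

  # drain the OSD by setting its CRUSH weight to 0
  ceph osd crush reweight osd.42 0

  # once backfill has finished, take it out and remove it
  ceph osd out osd.42
  ceph osd crush remove osd.42
  ceph auth del osd.42
  ceph osd rm osd.42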

Regards,
Eugen


Quoting Kristof Coucke <kristof.cou...@gmail.com>:

Hi all,

We are facing a strange symptom here.
We're testing our recovery procedures. Short description of our environment:
1. 10 OSD host nodes, each with 13 disks + 2 NVMes
2. 3 monitor nodes
3. 1 management node
4. 2 RGWs
5. 1 client

Ceph version: Nautilus 14.2.4

=> We are testing how to "nicely" eliminate one OSD host.
As a first step, we marked its OSDs out by running "ceph osd out
osd.<id>".
The cluster went into an error state with a few messages that backfill
was too full, but this was more or less expected.

However, after leaving the system to recover, everything went back to
normal. Health did not indicate any warnings or errors.
Running "ceph osd safe-to-destroy" indicated the disks could be
safely removed.

So far so good, no problem...
Then we decided to properly remove the disks from the CRUSH map, and now
the whole story starts again: backfill_toofull errors and recovery is
running again.

Why?
The disks were already marked out and no PGs were left on them.

Is this caused by the fact that the CRUSH map is modified and the
resulting recalculation maps the PGs to different OSDs? It does seem
like strange behaviour, to be honest.

Any feedback is greatly appreciated!

Regards,

Kristof Coucke



