Marking them OUT first is the way to go.  As long as the OSDs stay UP, they can
and will participate in the recovery.  How many you can mark out at one time
will depend on how sensitive your client I/O is to background recovery, and on
the related tunings (osd_max_backfills, osd_recovery_max_active, etc.).  If you
have the hours/days to spare, it is definitely easier on the cluster to do them
one at a time.
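As a rough sketch (the OSD ID 12 below is a placeholder, and the tuning values are illustrative, not recommendations), marking a single OSD out and keeping an eye on recovery might look like:

```shell
# Mark one failing OSD out; it stays UP and still serves/recovers data
ceph osd out 12

# Watch recovery/backfill progress before marking out the next one
ceph -s
ceph pg stat

# Optionally throttle backfill to protect client I/O (values are examples)
ceph tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
```

Repeating this one OSD at a time, waiting for the cluster to settle in between, is the gentlest approach Josh describes above.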

Thank you,
Josh Beaman

From: Dave Hall <kdh...@binghamton.edu>
Date: Friday, August 4, 2023 at 8:45 AM
To: ceph-users <ceph-users@ceph.io>
Cc: anthony.datri <anthony.da...@gmail.com>
Subject: [EXTERNAL] [ceph-users] Nautilus: Taking out OSDs that are 'Failure 
Pending'
Hello.  It's been a while.  I have a Nautilus cluster with 72 x 12GB HDD
OSDs (BlueStore), mostly EC 8+2 pools/PGs.  It's been working great -
some nodes went nearly 900 days without a reboot.

As of yesterday I found that I have 3 OSDs with a SMART status of 'Pending
Failure'.  New drives are ordered and will be here next week.  There is a
procedure in the documentation for replacing an OSD, but I can't do that
directly until I receive the drives.

My inclination is to mark these 3 OSDs 'OUT' before they crash completely,
but I want to confirm my understanding of Ceph's response to this.  Mainly,
given my EC pools (or replicated pools, for that matter), if I mark all 3
OSDs out at once, will I risk data loss?

If I have it right, marking an OSD out will simply cause Ceph to move all
of the PG shards from that OSD to other OSDs, so no major risk of data
loss.  However, if it would be better to do them one per day or something,
I'd rather be safe.

I also assume that I should wait for the rebalance to complete before I
initiate the replacement procedure.

Your thoughts?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io