It seems that I have been able to work around my issues.
I’ve attempted to reproduce the problem by rebooting nodes, and by stopping all
OSDs, waiting a bit, and starting them again.
At this time, no OSDs are crashing like before, and OSDs have no problems
starting either.
What I did was remove the OSDs completely, one at a time, and redeploy them,
allowing Ceph 14.2.1 to rebuild them.
Remove a disk:
1.) see which OSD is on which disk: sudo ceph-volume lvm list
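(If you only care about one drive, ceph-volume should also take the device path as an
argument, which keeps the output short; /dev/sdd here is just an example device:
EX: sudo ceph-volume lvm list /dev/sdd )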
2.) ceph osd out X
EX:
synergy@synergy3:~$ ceph osd out 21
marked out osd.21.
2.a) ceph osd down osd.X
EX:
ceph osd down osd.21
2.b) Stop the OSD daemon: sudo systemctl stop ceph-osd@X
EX:
sudo systemctl stop ceph-osd@21
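(I find it worth confirming the daemon actually stopped before removing the OSD;
assuming the standard systemd unit names, something like this does it:
EX: sudo systemctl status ceph-osd@21 )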
2.c) ceph osd rm osd.X
EX:
ceph osd rm osd.21
3.) check status: ceph -s
4.) Observe data migration: ceph -w
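(While it rebalances I also keep an eye on the OSD’s state and weight with ceph osd tree,
which shows up/down and in/out per OSD. That’s just how I watch it, not part of the
official procedure:
EX: ceph osd tree )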
5.) remove from CRUSH: ceph osd crush remove {name}
EX: ceph osd crush remove osd.21
5.a) delete auth: ceph auth del osd.21
6.) find info on disk:
sudo hdparm -I /dev/sdd
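(To match the physical drive, the serial number is usually what I need, so I grep it
out of the hdparm output; again, /dev/sdd is just my example device:
EX: sudo hdparm -I /dev/sdd | grep -i serial )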
7.) see SATA ports: lsscsi --verbose
8.) Go pull the disk and replace it, or leave it in place and do the following steps to
re-use it.
Additional steps to remove and re-use a disk without ejecting it (ejecting and
replacing the disk takes care of this for us):
(do this last, after following the Ceph docs for removing a disk.)
9.) sudo gdisk /dev/sdX (x, z, Y, Y: expert menu, zap GPT data structures, confirm, and blank the MBR)
9.a) find and remove the leftover device-mapper entry for the old OSD:
lsblk
dmsetup remove ceph--e36dc03d--bf0d--462a--b4e6--8e49819bec0b-osd--block--d5574ac1--f72f--4942--8f4a--ac24891b2ee6
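(The long ceph--...--osd--block--... name above is specific to my OSD; on another node
you would look up the right one first. I believe dmsetup ls lists the mappings, and it
generally needs root:
EX: sudo dmsetup ls | grep ceph
then: sudo dmsetup remove <the ceph-*-osd-block-* name for that OSD> )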
10.) deploy a /dev/sdX disk: run ceph-deploy from 216.106.44.209 (ceph-mon0); you must
be in the "my_cluster" folder:
EX: Synergy@Ceph-Mon0:~/my_cluster$ ceph-deploy osd create --data /dev/sdd synergy1
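(If you are not using ceph-deploy, I believe the equivalent on Nautilus is to run
ceph-volume directly on the OSD node; same idea, just without the admin-node wrapper.
The device path is whatever disk you just wiped:
EX: sudo ceph-volume lvm create --data /dev/sdd )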
I have attached the doc I use to accomplish this. Before I do it, I mark the
OSD as “out” via the GUI or CLI and allow it to reweight to 0%; this is
monitored via ceph -s. I do this so that the rebuild itself doesn’t act like a disk
failure; otherwise an actual disk failure while I’m rebuilding an OSD would put me
into a dual-disk failure.
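(One extra check that I believe is worth doing on Nautilus before actually pulling a
drive: ceph osd safe-to-destroy reports whether the data that was on that OSD is fully
healthy elsewhere, so you’re not one real disk failure away from trouble while rebuilding:
EX: ceph osd safe-to-destroy osd.21 )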
-Edward Kalk