It seems that I have been able to work around my issues.
I’ve attempted to reproduce the problem by rebooting nodes, and by stopping all
OSDs, waiting a bit, and starting them again.
At this time, no OSDs are crashing like before, and OSDs have no problems
starting either.
What I did was remove the OSDs completely, one at a time, and redeploy them,
allowing Ceph 14.2.1 to rebuild them.
Remove a disk:
1.) see which OSD is on which disk: sudo ceph-volume lvm list
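(If you only care about one drive, ceph-volume should also take the device path as an
argument, which keeps the output short; /dev/sdd here is just an example device:
EX: sudo ceph-volume lvm list /dev/sdd )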
2.) ceph osd out X
EX:
synergy@synergy3:~$ ceph osd out 21
marked out osd.21.
2.a) ceph osd down osd.X
EX:
ceph osd down osd.21
2.b) Stop the OSD daemon: sudo systemctl stop ceph-osd@X
EX:
sudo systemctl stop ceph-osd@21
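(I find it worth confirming the daemon actually stopped before removing the OSD;
assuming the standard systemd unit names, something like this does it:
EX: sudo systemctl status ceph-osd@21 )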
2.c) ceph osd rm osd.X
EX:
ceph osd rm osd.21
3.) check status: ceph -s
4.) Observe data migration: ceph -w
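(While it rebalances I also keep an eye on the OSD’s state and weight with ceph osd tree,
which shows up/down and in/out per OSD. That’s just how I watch it, not part of the
official procedure:
EX: ceph osd tree )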
5.) remove from CRUSH: ceph osd crush remove {name}
EX: ceph osd crush remove osd.21
5.a) delete auth: ceph auth del osd.21
6.) find info on disk:
sudo hdparm -I /dev/sdd
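(To match the physical drive, the serial number is usually what I need, so I grep it
out of the hdparm output; again, /dev/sdd is just my example device:
EX: sudo hdparm -I /dev/sdd | grep -i serial )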
7.) see SATA ports: lsscsi --verbose
8.) Go pull the disk and replace it, or leave it in place and do the following steps to
re-use it.
Additional steps to remove and re-use a disk without ejecting it (ejecting and
replacing the disk takes care of this for us):
(do this last, after following the Ceph docs for removing a disk.)
9.) sudo gdisk /dev/sdX (x, z, Y, Y: expert menu, zap GPT data structures, confirm, and blank the MBR)
9.a) find and remove the leftover device-mapper entry for the old OSD:
lsblk
dmsetup remove ceph--e36dc03d--bf0d--462a--b4e6--8e49819bec0b-osd--block--d5574ac1--f72f--4942--8f4a--ac24891b2ee6
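(The long ceph--...--osd--block--... name above is specific to my OSD; on another node
you would look up the right one first. I believe dmsetup ls lists the mappings, and it
generally needs root:
EX: sudo dmsetup ls | grep ceph
then: sudo dmsetup remove <the ceph-*-osd-block-* name for that OSD> )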
10.) deploy a /dev/sdX disk: run ceph-deploy from 216.106.44.209 (ceph-mon0); you must
be in the "my_cluster" folder:
EX: Synergy@Ceph-Mon0:~/my_cluster$ ceph-deploy osd create --data /dev/sdd synergy1
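(If you are not using ceph-deploy, I believe the equivalent on Nautilus is to run
ceph-volume directly on the OSD node; same idea, just without the admin-node wrapper.
The device path is whatever disk you just wiped:
EX: sudo ceph-volume lvm create --data /dev/sdd )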
I have attached the doc I use to accomplish this. Before I do it, I mark the
OSD as “out” via the GUI or CLI and allow it to reweight to 0%; this is
monitored via ceph -s. I do this so that the rebuild itself doesn’t act like a disk
failure; otherwise an actual disk failure while I’m rebuilding an OSD would put me
into a dual-disk failure.
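(One extra check that I believe is worth doing on Nautilus before actually pulling a
drive: ceph osd safe-to-destroy reports whether the data that was on that OSD is fully
healthy elsewhere, so you’re not one real disk failure away from trouble while rebuilding:
EX: ceph osd safe-to-destroy osd.21 )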
-Edward Kalk