On 26.02.2018 at 13:02, Alfredo Deza wrote:
> On Sat, Feb 24, 2018 at 1:26 PM, Oliver Freyermuth
> <[email protected]> wrote:
>> Dear Cephalopodians,
>>
>> when purging a single OSD on a host (created via ceph-deploy 2.0, i.e. using
>> ceph-volume lvm), I currently proceed as follows:
>>
>> On the OSD-host:
>> $ systemctl stop [email protected]
>> $ ls -la /var/lib/ceph/osd/ceph-4
>> # Check block and block.db links:
>> lrwxrwxrwx. 1 ceph ceph 93 23. Feb 01:28 block ->
>> /dev/ceph-69b1fbe5-f084-4410-a99a-ab57417e7846/osd-block-cd273506-e805-40ac-b23d-c7b9ff45d874
>> lrwxrwxrwx. 1 root root 43 23. Feb 01:28 block.db ->
>> /dev/ceph-osd-blockdb-ssd-1/db-for-disk-sda
>> # resolve actual underlying device:
>> $ pvs | grep ceph-69b1fbe5-f084-4410-a99a-ab57417e7846
>> /dev/sda ceph-69b1fbe5-f084-4410-a99a-ab57417e7846 lvm2 a-- <3,64t
>> 0
>> # Zap the device:
>> $ ceph-volume lvm zap --destroy /dev/sda
>>
>> Now, on the mon:
>> # purge the OSD:
>> $ ceph osd purge osd.4 --yes-i-really-mean-it
>>
>> Then I re-deploy using:
>> $ ceph-deploy --overwrite-conf osd create --bluestore --block-db
>> ceph-osd-blockdb-ssd-1/db-for-disk-sda --data /dev/sda osd001
>>
>> from the admin-machine.
>>
>> This works just fine, however, it leaves a stray ceph-volume service behind:
>> $ ls -la /etc/systemd/system/multi-user.target.wants/ -1 | grep
>> ceph-volume@lvm-4
>> lrwxrwxrwx. 1 root root 44 24. Feb 18:30
>> [email protected] ->
>> /usr/lib/systemd/system/[email protected]
>> lrwxrwxrwx. 1 root root 44 23. Feb 01:28
>> [email protected] ->
>> /usr/lib/systemd/system/[email protected]
>>
>> This stray service then, after reboot of the machine, stays in activating
>> state (since the disk will of course never come back):
>> -----------------------------------
>> $ systemctl status
>> [email protected]
>> ● [email protected] - Ceph
>> Volume activation: lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874
>> Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled;
>> vendor preset: disabled)
>> Active: activating (start) since Sa 2018-02-24 19:21:47 CET; 1min 12s ago
>> Main PID: 1866 (timeout)
>> CGroup:
>> /system.slice/system-ceph\x2dvolume.slice/[email protected]
>> ├─1866 timeout 10000 /usr/sbin/ceph-volume-systemd
>> lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874
>> └─1872 /usr/bin/python2.7 /usr/sbin/ceph-volume-systemd
>> lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874
>>
>> Feb 24 19:21:47 osd001.baf.physik.uni-bonn.de systemd[1]: Starting Ceph
>> Volume activation: lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874...
>> -----------------------------------
>> Manually, I can fix this by running:
>> $ systemctl disable
>> [email protected]
>>
>> My question is: Should I really remove that manually?
>> Should "ceph-volume lvm zap --destroy" have taken care of it (bug)?
>
> You should remove it manually. The problem with zapping is that we
> might not have the information we need to remove the systemd unit.
> Since an OSD can be made out of different devices, ceph-volume might
> be asked to "zap" a device which it can't compute to what OSD it
> belongs. The systemd units are tied to the ID and UUID of the OSD.

Understood, thanks for the reply!
Could this be added to the documentation at some point for all the other users
operating the cluster manually / with ceph-deploy?
That would probably be the best way to keep others from falling into this trap ;-).
Should I open a ticket asking for this?
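
For anyone who finds this thread later, the manual cleanup could be sketched roughly as below. This is only a sketch, not something ceph-volume ships: the helper names ceph_volume_unit_name and disable_stray_unit and the DRY_RUN switch are made up for illustration, and it assumes the activation units follow the pattern ceph-volume@lvm-<osd-id>-<osd-uuid>.service, since the units are tied to the ID and UUID of the OSD. By default it only prints the command, so it can be checked before anything is changed.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of ceph-volume.

# Build the name of the ceph-volume activation unit for an lvm-based OSD.
# Assumption: units are named ceph-volume@lvm-<osd-id>-<osd-uuid>.service.
ceph_volume_unit_name() {
    local osd_id="$1" osd_uuid="$2"
    printf 'ceph-volume@lvm-%s-%s.service\n' "$osd_id" "$osd_uuid"
}

# Disable the stray unit of a purged OSD.
# With DRY_RUN=1 (the default) the command is only printed, not executed.
disable_stray_unit() {
    local unit
    unit="$(ceph_volume_unit_name "$1" "$2")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "systemctl disable $unit"
    else
        systemctl disable "$unit"
    fi
}

# Example: OSD ID and old UUID of the purged OSD from this thread.
disable_stray_unit 4 cd273506-e805-40ac-b23d-c7b9ff45d874
```

The UUID has to be the one of the old, purged OSD (noted down before zapping, e.g. from the osd-block-<uuid> part of the block link), not the one of the re-deployed OSD. Setting DRY_RUN=0 runs the disable for real.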
Cheers,
Oliver
>
>
>> Am I missing a step?
>>
>> Cheers,
>> Oliver
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
