Dear ceph users,
I've been experimenting with setting up a new node with ceph-volume and
bluestore. Most of the setup works fine, but I'm running into a
strange interaction between ceph-volume and systemd when starting OSDs.
After preparing/activating the OSD, a systemd unit instance is created
with a symlink in /etc/systemd/system/multi-user.target.wants:
  ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service ->
  /usr/lib/systemd/system/ceph-volume@.service
I've moved this dependency to ceph-osd.target.wants, since I'd like to
be able to start/stop all OSDs on the same node with one command (let me
know if there is a better way). Stopping works without this, since
ceph-osd@.service is marked as part of ceph-osd.target, but starting
does not, since these new ceph-volume units aren't grouped together
under a separate target.
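For anyone who wants to script the symlink move described above, here is a rough sketch in Python 3. The function name move_wants_link and its systemd_dir parameter are made up for illustration and are not part of ceph-volume. It only moves the link, so I assume a 'systemctl daemon-reload' is still needed afterwards.

```python
import os


def move_wants_link(unit, src_target, dst_target,
                    systemd_dir="/etc/systemd/system"):
    """Move the .wants symlink for `unit` from src_target to dst_target.

    Hypothetical helper: it mirrors the manual 'mv' of the symlink and
    does not talk to systemd itself.
    """
    src = os.path.join(systemd_dir, src_target + ".wants", unit)
    dst_dir = os.path.join(systemd_dir, dst_target + ".wants")
    if not os.path.islink(src):
        raise FileNotFoundError(src)

    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, unit)

    # Recreate the link with the same destination, then drop the old one.
    link_target = os.readlink(src)
    if os.path.lexists(dst):
        os.remove(dst)
    os.symlink(link_target, dst)
    os.remove(src)
    return dst
```

It would be called as move_wants_link('ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service', 'multi-user.target', 'ceph-osd.target').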
However, when I run 'systemctl start ceph-osd.target' multiple times,
the systemctl command hangs, even though the OSD starts up fine.
Interestingly, 'systemctl start
ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service' does
not hang.
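To tell a hang apart from a slow start when reproducing this, one option is to run the command with a timeout. This is a small sketch in Python 3. The helper name finishes_within is made up, and the timeout only kills the systemctl client process, so whatever systemd itself is waiting on may well still be pending.

```python
import subprocess


def finishes_within(cmd, timeout_s):
    """Return True if `cmd` exits within timeout_s seconds, else False."""
    try:
        subprocess.run(cmd,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL,
                       timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return False
    return True
```

For example, finishes_within(['systemctl', 'start', 'ceph-osd.target'], 60) would return False in the hanging case.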
Troubleshooting further, I see that the ceph-volume@.service unit calls
'ceph-volume lvm trigger 121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb',
which in turn calls 'Activate', running the following commands:
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/H900D00/H900D00 --path /var/lib/ceph/osd/ceph-121
Running command: ln -snf /dev/H900D00/H900D00 /var/lib/ceph/osd/ceph-121/block
Running command: chown -R ceph:ceph /dev/dm-0
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-121
Running command: systemctl enable ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb
Running command: systemctl start ceph-osd@121
--> ceph-volume lvm activate successful for osd ID: 121
The problem seems to be the 'systemctl enable' command, which
essentially tries to enable the unit that is currently being executed
(in the case where it was started via 'systemctl start ceph-osd.target').
Somehow systemd (on CentOS) isn't very happy with that. If I edit the
Python scripts to check whether the unit is already enabled before
enabling it, the hangs stop.
For example, replacing in
/usr/lib/python2.7/site-packages/ceph_volume/systemd/systemd.py

    def enable(unit):
        process.run(['systemctl', 'enable', unit])

with

    def enable(unit):
        stdout, stderr, retcode = process.call(
            ['systemctl', 'is-enabled', unit], show_command=True)
        if retcode != 0:
            process.run(['systemctl', 'enable', unit])

fixes the issue.
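The same check-before-enable idea can be tried outside ceph-volume with plain subprocess calls. Below is a minimal sketch in Python 3, not the ceph_volume code: the names is_enabled and enable_if_needed and the run parameter are made up for illustration. It relies on 'systemctl is-enabled' exiting with status 0 when the unit is enabled.

```python
import subprocess


def is_enabled(unit, run=subprocess.call):
    """Return True if 'systemctl is-enabled' exits with status 0."""
    # --quiet suppresses the printed state; only the exit code is used.
    return run(["systemctl", "is-enabled", "--quiet", unit]) == 0


def enable_if_needed(unit, run=subprocess.call):
    """Enable `unit` only if it is not enabled yet.

    Returns True if 'systemctl enable' was invoked, False otherwise.
    `run` is injectable (hypothetical parameter) so the logic can be
    exercised without touching a real systemd.
    """
    if is_enabled(unit, run):
        return False
    run(["systemctl", "enable", unit])
    return True
```

With the default runner, enable_if_needed('ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb') skips the 'systemctl enable' call whenever the unit is already enabled, which is the case that appears to trigger the hang.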
Has anyone run into this, or does anyone have ideas on how to proceed?
Andras