Hi Folks,
I have found similar reports of this problem in the past but can't seem to find a solution to it. We have a Ceph filesystem running Mimic version 13.2.5. The OSDs run on AWS EC2 instances with CentOS 7. Each OSD disk is an AWS NVMe device.
The problem: sometimes, when rebooting an OSD instance, the OSD volume fails to mount and the OSD cannot start.
ceph-volume.log repeats the following:
[2019-08-28 09:10:42,061][ceph_volume.main][INFO ] Running command: ceph-volume lvm trigger 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,063][ceph_volume.process][INFO ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2019-08-28 09:10:42,074][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 148, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 40, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
    Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", line 339, in main
    self.activate(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", line 249, in activate
    raise RuntimeError('could not find osd.%s with fsid %s' % (osd_id, osd_fsid))
RuntimeError: could not find osd.0 with fsid fcaffe93-4c03-403c-9702-7f1ec694a578
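For anyone trying to narrow this down: the log shows `lvs` being queried for `lv_tags` right before the error. One possible reading (an assumption, not something the log confirms) is that no logical volume carrying this OSD's tags was visible at that moment. The sketch below checks captured `lvs` output for an LV tagged with the OSD id and fsid. The function name `find_osd_lv` and the sample output are made up for illustration, and this is not ceph-volume's own code; it only relies on the `ceph.osd_id` and `ceph.osd_fsid` LV tags that ceph-volume sets.

```python
def find_osd_lv(lvs_output, osd_id, osd_fsid, separator=";"):
    """Return the lv_path of the LV tagged for this OSD, or None if absent.

    Expects text in the field order used in the log above:
    lv_tags;lv_path;lv_name;vg_name;lv_uuid;lv_size
    """
    for line in lvs_output.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(separator)
        if len(fields) < 2:
            continue
        raw_tags, lv_path = fields[0], fields[1]
        # Tags are a comma-separated list of key=value pairs.
        tags = dict(t.split("=", 1) for t in raw_tags.split(",") if "=" in t)
        if (tags.get("ceph.osd_id") == str(osd_id)
                and tags.get("ceph.osd_fsid") == osd_fsid):
            return lv_path
    return None


# Hypothetical lvs output, invented for this example (not taken from the logs).
sample = """
  ceph.osd_fsid=fcaffe93-4c03-403c-9702-7f1ec694a578,ceph.osd_id=0,ceph.type=block;/dev/ceph-vg/osd-block-0;osd-block-0;ceph-vg;abc123;100.00g
  ;/dev/centos/root;root;centos;def456;20.00g
"""

fsid = "fcaffe93-4c03-403c-9702-7f1ec694a578"
print(find_osd_lv(sample, 0, fsid))  # LV present in the sample
print(find_osd_lv("", 0, fsid))      # empty lvs output: nothing found
```

Feeding it the real output of the `lvs` command from the log, captured on a node where activation failed, would show whether the tagged LV was missing or merely not yet visible when the trigger ran.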
ceph-volume-systemd.log repeats:
[2019-08-28 09:10:41,877][systemd][INFO ] raw systemd input received: lvm-0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:41,877][systemd][INFO ] parsed sub-command: lvm, extra data: 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:41,926][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,077][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.0 with fsid fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,084][systemd][WARNING] command returned non-zero exit status: 1
[2019-08-28 09:10:42,084][systemd][WARNING] failed activating OSD, retries left: 30
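To make the log lines above easier to read, here is a small sketch of how the systemd input string breaks down into the sub-command, the OSD id and the OSD fsid. The function `parse_systemd_input` is a hypothetical helper written for this illustration; it mirrors what the log reports ("parsed sub-command: lvm, extra data: ...") and is not ceph-volume's actual parsing code.

```python
def parse_systemd_input(raw):
    """Split '<sub-command>-<osd id>-<osd fsid>' into its three parts."""
    # The first dash separates the sub-command from the extra data.
    sub_command, _, extra = raw.partition("-")
    # The next dash separates the OSD id from the fsid; the fsid itself
    # contains dashes, so only the first one is used as a delimiter.
    osd_id, _, osd_fsid = extra.partition("-")
    return sub_command, osd_id, osd_fsid


print(parse_systemd_input("lvm-0-fcaffe93-4c03-403c-9702-7f1ec694a578"))
# ('lvm', '0', 'fcaffe93-4c03-403c-9702-7f1ec694a578')
```

So the unit is asking for osd.0 with fsid fcaffe93-4c03-403c-9702-7f1ec694a578, which matches the id and fsid in the RuntimeError.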
To recover, I destroy the OSD, zap the disk and create it again:
# ceph osd destroy 0 --yes-i-really-mean-it
# ceph-volume lvm zap /dev/nvme1n1 --destroy
# ceph-volume lvm create --osd-id 0 --data /dev/nvme1n1
# systemctl start ceph-osd@0
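The same recovery sequence, written as a dry-run sketch that only prints the commands for a given OSD id and device and executes nothing. `recovery_commands` is a hypothetical helper for illustration; the commands are the ones listed above, with the zap option spelled `--destroy` (double dash).

```python
import shlex


def recovery_commands(osd_id, device):
    """Build the destroy / zap / create / start sequence as argv lists."""
    osd_id = str(osd_id)
    return [
        ["ceph", "osd", "destroy", osd_id, "--yes-i-really-mean-it"],
        ["ceph-volume", "lvm", "zap", device, "--destroy"],
        ["ceph-volume", "lvm", "create", "--osd-id", osd_id, "--data", device],
        ["systemctl", "start", "ceph-osd@" + osd_id],
    ]


# Dry run: print the commands for review instead of running them.
for cmd in recovery_commands(0, "/dev/nvme1n1"):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Note that this wipes the disk, so the OSD has to backfill all of its data afterwards. It is a workaround, not a fix for the activation failure.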
Is there something I need to do so that the OSD can boot without these problems?
Thank you!
Tom
Attachments: ceph-volume.log, ceph-volume-systemd.log
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
