Hi All,
We are seeing the same problem here at Rutherford Appleton Laboratory:
During our patching against Stack Clash on our large physics data cluster, only
about 8 of the 36 OSD disks on each storage node remount when the node is
rebooted. We coaxed the rest to mount manually during the reboot campaign (see
method below), but obviously we want a longer-term solution.
I believe this problem occurs because many of the OSD daemons are started
before their OSD disks are mounted.
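As an illustration of the kind of ordering fix that might help (a sketch only,
not something we have deployed; the drop-in path, the 60-second timeout and the
default /var/lib/ceph/osd/ceph-<id> data directory are assumptions), a systemd
drop-in could make each ceph-osd@ instance wait until its data directory is
actually a mountpoint before the daemon starts:

  # /etc/systemd/system/[email protected]/wait-for-mount.conf  (hypothetical drop-in)
  [Service]
  # Before starting osd.%i, wait up to 60s for its data directory to be
  # mounted; fail the start if the mount never appears.
  ExecStartPre=/bin/sh -c 'timeout 60 sh -c "until mountpoint -q /var/lib/ceph/osd/ceph-%i; do sleep 1; done"'

A "systemctl daemon-reload" would be needed after adding the drop-in.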
From: [ceph-users] erratic startup of OSDs at reboot time, 2017-07-12, Graham
Allan
We tried running: “udevadm trigger --subsystem-match=block --action=add” with
occasional success but this wasn’t reliable.
From: [ceph-users] CentOS7 Mounting Problem, 2017-04-10, Jake Young
Interesting that running partprobe causes the OSD disk to mount and the OSD to
start automatically. However, I don’t know why this would fix the problem for
subsequent reboots.
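For what it is worth, a minimal sketch of what that workaround might look like
across a whole storage node (my own illustration, not taken from the referenced
post; the /dev/sd{b..z} range is an assumption about the disk naming behind the
HBA):

  # Re-read the partition tables of the candidate OSD data disks and wait for
  # udev to finish processing the resulting events, which is what appears to
  # kick the ceph-disk activation.
  for dev in /dev/sd{b..z}; do
      [ -b "$dev" ] && partprobe "$dev"
  done
  udevadm settle --timeout=30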
Note: Interestingly, on the one example of this model of storage node (36 OSDs
per host) in our development cluster (78 OSDs in total), all OSD disks mounted
and the OSD processes started over 5 reboots, so I am unable to reproduce the
problem at small scale.
Best wishes,
Bruno
--------
Cluster:
5 MONs
1404 OSDs
39 storage nodes, 36 OSD disks per node connected via a PCI HBA
Software:
OS: SL7x
Ceph Release: kraken
Ceph Version: 11.2.0-0
Ceph Deploy Release: kraken
Ceph Deploy Version: 1.5.37-0
OSDs created as follows:
ceph-deploy disk zap $sn_fqdn:sdb
ceph-deploy --overwrite-conf config pull $sn_fqdn
ceph-deploy osd prepare $sn_fqdn:sdb
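(As an aside, an illustrative way to confirm that a prepared disk was mounted
and its OSD registered; these are generic checks, not commands from our
procedure:)

  ceph-deploy disk list $sn_fqdn                   # partitions as seen by ceph-disk on the node
  ssh $sn_fqdn 'mount | grep /var/lib/ceph/osd'    # confirm the data partitions are mounted
  ceph osd tree                                    # confirm the OSDs appear and are up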
Coaxing method:
# Start each ceph-disk activation service in turn, one second apart,
# timestamping the output (ts is from moreutils).
for srv in $(systemctl list-units -t service --full --no-pager -n0 | grep ceph-disk | awk '{print $2}'); do
    echo "Starting $srv" | ts
    systemctl start $srv
    sleep 1
done
From: ceph-users [mailto:[email protected]] On Behalf Of Willem
Jan Withagen
Sent: 20 July 2017 19:06
To: Roger Brown; ceph-users
Subject: Re: [ceph-users] ceph-disk activate-block: not a block device
Hi Roger,
Device detection has recently changed (because FreeBSD does not have block
devices), so it could very well be that this is an actual problem where
something is still wrong.
Please keep an eye out, and let me know if it comes back.
--WjW
On 20-7-2017 at 19:29, Roger Brown wrote:
So I disabled ceph-disk and will chalk it up as a red herring to ignore.
On Thu, Jul 20, 2017 at 11:02 AM Roger Brown <[email protected]> wrote:
Also I'm just noticing osd1 is my only OSD host that even has an enabled target
for ceph-disk ([email protected]).
roger@osd1:~$ systemctl list-units ceph*
UNIT                  LOAD   ACTIVE SUB     DESCRIPTION
● [email protected]   loaded failed failed  Ceph disk activation: /dev/sdb2
[email protected]     loaded active running Ceph object storage daemon osd.3
ceph-mds.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-mgr.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-mon.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-osd.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-radosgw.target   loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph.target           loaded active active  ceph target allowing to start/stop all ceph*@.service instances at once
roger@osd2:~$ systemctl list-units ceph*
UNIT                  LOAD   ACTIVE SUB     DESCRIPTION
[email protected]     loaded active running Ceph object storage daemon osd.4
ceph-mds.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-mgr.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-mon.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-osd.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-radosgw.target   loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph.target           loaded active active  ceph target allowing to start/stop all ceph*@.service instances at once
roger@osd3:~$ systemctl list-units ceph*
UNIT                  LOAD   ACTIVE SUB     DESCRIPTION
[email protected]     loaded active running Ceph object storage daemon osd.0
ceph-mds.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-mgr.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-mon.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-osd.target       loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph-radosgw.target   loaded active active  ceph target allowing to start/stop all [email protected] instances at once
ceph.target           loaded active active  ceph target allowing to start/stop all ceph*@.service instances at once
On Thu, Jul 20, 2017 at 10:23 AM Roger Brown <[email protected]> wrote:
I think I need help with some OSD trouble. The OSD daemons on two hosts started
flapping. At length, I rebooted host osd1 (osd.3), but the OSD daemon still
fails to start. Upon closer inspection, [email protected] is failing to
start due to "Error: /dev/sdb2 is not a block device".
This is the command I see it failing to run:
roger@osd1:~$ sudo /usr/sbin/ceph-disk --verbose activate-block /dev/sdb2
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5731, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5682, in main
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5438, in <lambda>
    func=lambda args: main_activate_space(name, args),
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4160, in main_activate_space
    osd_uuid = get_space_osd_uuid(name, dev)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4115, in get_space_osd_uuid
    raise Error('%s is not a block device' % path)
ceph_disk.main.Error: Error: /dev/sdb2 is not a block device
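(A hedged aside, not from the original message: one way to check by hand what
ceph-disk is complaining about, i.e. whether the kernel currently exposes
/dev/sdb2 as a block device:)

  ls -l /dev/sdb2           # the mode field should start with 'b' for a block device
  test -b /dev/sdb2 && echo "block device" || echo "not a block device"
  lsblk /dev/sdb            # partitions the kernel currently knows about
  sudo partprobe /dev/sdb   # if the partition node is missing or stale, re-read the partition table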
osd1 environment:
$ ceph -v
ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
$ uname -r
4.4.0-83-generic
$ lsb_release -sc
xenial
Please advise.