On Mon, Nov 5, 2018 at 4:04 PM Hayashida, Mami <mami.hayash...@uky.edu> wrote:
>
> WOW. With you two guiding me through every step, the 10 OSDs in question
> are now added back to the cluster as Bluestore disks!!! Here are my
> responses to the last email from Hector:
>
> 1. I first checked the permissions and they looked like this:
>
> root@osd1:/var/lib/ceph/osd/ceph-60# ls -l
> total 56
> -rw-r--r-- 1 ceph ceph         384 Nov  2 16:20 activate.monmap
> -rw-r--r-- 1 ceph ceph 10737418240 Nov  2 16:20 block
> lrwxrwxrwx 1 ceph ceph          14 Nov  2 16:20 block.db -> /dev/ssd0/db60
>
> root@osd1:~# ls -l /dev/ssd0/
> ...
> lrwxrwxrwx 1 root root 7 Nov  5 12:38 db60 -> ../dm-2
>
> root@osd1:~# ls -la /dev/
> ...
> brw-rw---- 1 root disk 252, 2 Nov  5 12:38 dm-2
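That listing pinpoints the failure: block.db resolves to /dev/dm-2, which is
owned by root:disk, while ceph-osd drops privileges to ceph:ceph (as the
startup log further down shows), so opening the db device fails with EACCES.
Purely as a sanity check, here is a sketch of a one-off manual workaround
(not a fix, since a bare chown does not survive a reboot or an LV
re-activation):

    # resolve the block.db symlink to the real device-mapper node
    ls -l "$(readlink -f /var/lib/ceph/osd/ceph-60/block.db)"
    # hand the node to the ceph user so ceph-osd can open it (temporary!)
    chown ceph:ceph /dev/dm-2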
This looks like a bug. You mentioned you are running 12.2.9, and we haven't
seen ceph-volume fail to update the permissions on OSD devices. No one should
need a udev rule to set device permissions; that is a ceph-volume task. When
the system starts and OSD activation happens, ceph-volume ensures that the
permissions are set correctly. Could you find the section of the logs in
/var/log/ceph/ceph-volume.log that shows the activation process for
ssd0/db60? Hopefully you still have those around; they would help us
determine why the permissions aren't being set correctly.

> ...
>
> 2. I then ran "ceph-volume lvm activate --all" again and saw the same
> error for osd.67 that I described many emails ago. None of the permissions
> changed. I then tried restarting ceph-osd@60, but got the same error as
> before:
>
> 2018-11-05 15:34:52.001782 7f5a15744e00  0 set uid:gid to 64045:64045 (ceph:ceph)
> 2018-11-05 15:34:52.001808 7f5a15744e00  0 ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable), process ceph-osd, pid 36506
> 2018-11-05 15:34:52.021717 7f5a15744e00  0 pidfile_write: ignore empty --pid-file
> 2018-11-05 15:34:52.033478 7f5a15744e00  0 load: jerasure load: lrc load: isa
> 2018-11-05 15:34:52.033557 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
> 2018-11-05 15:34:52.033572 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
> 2018-11-05 15:34:52.033888 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
> 2018-11-05 15:34:52.033958 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
> 2018-11-05 15:34:52.033984 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) close
> 2018-11-05 15:34:52.318993 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _mount path /var/lib/ceph/osd/ceph-60
> 2018-11-05 15:34:52.319064 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
> 2018-11-05 15:34:52.319073 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
> 2018-11-05 15:34:52.319356 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
> 2018-11-05 15:34:52.319415 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
> 2018-11-05 15:34:52.319491 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block.db type kernel
> 2018-11-05 15:34:52.319499 7f5a15744e00  1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open path /var/lib/ceph/osd/ceph-60/block.db
> 2018-11-05 15:34:52.319514 7f5a15744e00 -1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open open got: (13) Permission denied
> 2018-11-05 15:34:52.319648 7f5a15744e00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
> 2018-11-05 15:34:52.319666 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) close
> 2018-11-05 15:34:52.598249 7f5a15744e00 -1 osd.60 0 OSD:init: unable to mount object store
> 2018-11-05 15:34:52.598269 7f5a15744e00 -1 ** ERROR: osd init failed: (13) Permission denied
>
> 3. Finally, I literally copied and pasted the udev rule Hector wrote out
> for me, then rebooted the server.
>
> 4. I tried restarting ceph-osd@60 -- this time it came right up!!! I was
> able to start all the rest, including ceph-osd@67, which I had thought did
> not get activated by lvm.
>
> 5. I checked from the admin node and verified that osd.60-69 are all in
> the cluster as Bluestore OSDs, and they indeed are.
>
> ********************
> Thank you SO MUCH, both of you, for putting up with my novice questions
> all the way. I am planning to convert the rest of the cluster the same
> way, reviewing this entire thread to trace which steps need to be taken.
>
> Mami
>
> On Mon, Nov 5, 2018 at 3:00 PM, Hector Martin <hec...@marcansoft.com> wrote:
>>
>> On 11/6/18 3:31 AM, Hayashida, Mami wrote:
>> > 2018-11-05 12:47:01.075573 7f1f2775ae00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
>>
>> Looks like the permissions on the block.db device are wrong. As far as I
>> know, ceph-volume is responsible for setting these at activation time.
>>
>> > I already ran the "ceph-volume lvm activate --all" command right after
>> > I prepared (using "lvm prepare") those OSDs. Do I need to run the
>> > "activate" command again?
>>
>> Activation is required on every boot to create the /var/lib/ceph/osd/*
>> directory, but it should be done automatically by systemd units (since
>> you didn't run it after the reboot and yet the directories exist, it
>> seems to have worked).
>>
>> Can you ls -l the OSD directory (/var/lib/ceph/osd/ceph-60/) and also
>> any devices symlinked from there, to see the permissions?
>>
>> Then run the activate command again and list the permissions once more
>> to see if they have changed; if they have, try to start the OSD again.
>>
>> I found one Ubuntu bug that suggests there may be a race condition:
>>
>> https://bugs.launchpad.net/bugs/1767087
>>
>> I get the feeling the ceph-osd activation may be happening before the
>> block.db device is ready, so by the time the device gets created by LVM
>> it is already too late and doesn't have the right permissions. You could
>> fix this with a udev rule (like Ubuntu did), but if this is indeed your
>> issue then it sounds like something that should be fixed in Ceph.
>> Perhaps all you need is a systemd unit override to make sure the
>> ceph-volume@* services only start after LVM is ready.
>>
>> A usable udev rule could look like this (e.g. put it in
>> /etc/udev/rules.d/90-lvm-permissions.rules):
>>
>> ACTION=="change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
>>   ENV{DM_LV_NAME}=="db*", ENV{DM_VG_NAME}=="ssd0", \
>>   OWNER="ceph", GROUP="ceph", MODE="660"
>>
>> Reboot after that and see if the OSDs come up without further action.
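As noted above, a udev rule shouldn't be necessary once the underlying
ceph-volume issue is fixed, but for anyone testing Hector's rule: a full
reboot isn't strictly required. A sketch, assuming a udev-based system with
udevadm available:

    # re-read rule files, then replay "change" events for block devices
    udevadm control --reload-rules
    udevadm trigger --subsystem-match=block --action=change

And for the systemd-override idea, a drop-in along these lines could enforce
the ordering. Treat the unit names as assumptions: the lvm2-activation
services only exist on distros that generate them, so check what actually
activates LVM on your system first:

    # hypothetical drop-in: /etc/systemd/system/ceph-volume@.service.d/after-lvm.conf
    [Unit]
    # start OSD activation only after LVM volumes are up (unit names assumed)
    After=lvm2-activation-early.service lvm2-activation.service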
>>
>> --
>> Hector Martin (hec...@marcansoft.com)
>> Public Key: https://mrcn.st/pub
>
>
> --
> Mami Hayashida
> Research Computing Associate
>
> Research Computing Infrastructure
> University of Kentucky Information Technology Services
> 301 Rose Street | 102 James F. Hardymon Building
> Lexington, KY 40506-0495
> mami.hayash...@uky.edu
> (859)323-7521

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com