On Mon, Nov 5, 2018 at 4:21 PM Hayashida, Mami <mami.hayash...@uky.edu> wrote:
>
> Yes, I still have the volume log showing the activation process for
> ssd0/db60 (and 61-69 as well). I will email it to you directly as an
> attachment.
In the logs, I see that ceph-volume does set the permissions correctly:

[2018-11-02 16:20:07,238][ceph_volume.process][INFO ] Running command: chown -h ceph:ceph /dev/hdd60/data60
[2018-11-02 16:20:07,242][ceph_volume.process][INFO ] Running command: chown -R ceph:ceph /dev/dm-10
[2018-11-02 16:20:07,246][ceph_volume.process][INFO ] Running command: ln -s /dev/hdd60/data60 /var/lib/ceph/osd/ceph-60/block
[2018-11-02 16:20:07,249][ceph_volume.process][INFO ] Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-60/activate.monmap
[2018-11-02 16:20:07,530][ceph_volume.process][INFO ] stderr got monmap epoch 2
[2018-11-02 16:20:07,547][ceph_volume.process][INFO ] Running command: ceph-authtool /var/lib/ceph/osd/ceph-60/keyring --create-keyring --name osd.60 --add-key AQBysdxbNgdBNhAA6NQ/UWDHqGAZfFuryCWfxQ==
[2018-11-02 16:20:07,579][ceph_volume.process][INFO ] stdout creating /var/lib/ceph/osd/ceph-60/keyring
added entity osd.60 auth auth(auid = 18446744073709551615 key=AQBysdxbNgdBNhAA6NQ/UWDHqGAZfFuryCWfxQ== with 0 caps)
[2018-11-02 16:20:07,583][ceph_volume.process][INFO ] Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-60/keyring
[2018-11-02 16:20:07,587][ceph_volume.process][INFO ] Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-60/
[2018-11-02 16:20:07,591][ceph_volume.process][INFO ] Running command: chown -h ceph:ceph /dev/ssd0/db60
[2018-11-02 16:20:07,594][ceph_volume.process][INFO ] Running command: chown -R ceph:ceph /dev/dm-0

And the failures from osd.60 are *before* those successful chown calls
(15:39:00). I wonder if somehow a step was missed in the process and then it
all got corrected. I am certain that the UDEV rule should *not* be needed
for this to work. The changes in the path for /dev/dm-* are expected, as
that is created every time the system boots.
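For anyone digging through their own logs, a filter like the following pulls
out just the permission/symlink steps of an activation. This is only a
sketch: the here-doc sample stands in for a real log so the pipeline can be
run anywhere; on an OSD node you would point LOG at
/var/log/ceph/ceph-volume.log instead.

```shell
# Sketch: list the chown/ln commands ceph-volume ran during activation.
# The here-doc sample below stands in for a real /var/log/ceph/ceph-volume.log;
# on an OSD node, set LOG=/var/log/ceph/ceph-volume.log instead.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[2018-11-02 16:20:07,238][ceph_volume.process][INFO ] Running command: chown -h ceph:ceph /dev/hdd60/data60
[2018-11-02 16:20:07,242][ceph_volume.process][INFO ] Running command: chown -R ceph:ceph /dev/dm-10
[2018-11-02 16:20:07,246][ceph_volume.process][INFO ] Running command: ln -s /dev/hdd60/data60 /var/lib/ceph/osd/ceph-60/block
EOF
# Show, then count, the permission/symlink commands for this activation.
grep -E 'Running command: (chown|ln) ' "$LOG"
grep -cE 'Running command: (chown|ln) ' "$LOG"
```

Comparing those timestamps against the osd.60 startup failures is what shows
the chowns happened after the daemon had already tried to open block.db.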
>
>
> On Mon, Nov 5, 2018 at 4:14 PM, Alfredo Deza <ad...@redhat.com> wrote:
>>
>> On Mon, Nov 5, 2018 at 4:04 PM Hayashida, Mami <mami.hayash...@uky.edu>
>> wrote:
>> >
>> > WOW. With you two guiding me through every step, the 10 OSDs in question
>> > are now added back to the cluster as Bluestore disks!!! Here are my
>> > responses to the last email from Hector:
>> >
>> > 1. I first checked the permissions and they looked like this:
>> >
>> > root@osd1:/var/lib/ceph/osd/ceph-60# ls -l
>> > total 56
>> > -rw-r--r-- 1 ceph ceph         384 Nov  2 16:20 activate.monmap
>> > -rw-r--r-- 1 ceph ceph 10737418240 Nov  2 16:20 block
>> > lrwxrwxrwx 1 ceph ceph          14 Nov  2 16:20 block.db -> /dev/ssd0/db60
>> >
>> > root@osd1:~# ls -l /dev/ssd0/
>> > ...
>> > lrwxrwxrwx 1 root root 7 Nov  5 12:38 db60 -> ../dm-2
>> >
>> > root@osd1:~# ls -la /dev/
>> > ...
>> > brw-rw---- 1 root disk 252, 2 Nov  5 12:38 dm-2
>>
>> This looks like a bug. You mentioned you are running 12.2.9, and we
>> haven't seen problems where ceph-volume fails to update the
>> permissions on OSD devices. No one should need a UDEV rule to set the
>> permissions for devices; this is a ceph-volume task.
>>
>> When a system starts and the OSD activation happens, it always ensures
>> that the permissions are set correctly. Could you find the section of
>> the logs in /var/log/ceph/ceph-volume.log that shows the activation
>> process for ssd0/db60?
>>
>> Hopefully you still have those around; it would help us determine why
>> the permissions aren't being set correctly.
>>
>> > ...
>> >
>> > 2. I then ran ceph-volume activate --all again. Saw the same error for
>> > osd.67 I described many emails ago. None of the permissions changed.
>> > I tried restarting ceph-osd@60, but got the same error as before:
>> >
>> > 2018-11-05 15:34:52.001782 7f5a15744e00  0 set uid:gid to 64045:64045 (ceph:ceph)
>> > 2018-11-05 15:34:52.001808 7f5a15744e00  0 ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable), process ceph-osd, pid 36506
>> > 2018-11-05 15:34:52.021717 7f5a15744e00  0 pidfile_write: ignore empty --pid-file
>> > 2018-11-05 15:34:52.033478 7f5a15744e00  0 load: jerasure load: lrc load: isa
>> > 2018-11-05 15:34:52.033557 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
>> > 2018-11-05 15:34:52.033572 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
>> > 2018-11-05 15:34:52.033888 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
>> > 2018-11-05 15:34:52.033958 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
>> > 2018-11-05 15:34:52.033984 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) close
>> > 2018-11-05 15:34:52.318993 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _mount path /var/lib/ceph/osd/ceph-60
>> > 2018-11-05 15:34:52.319064 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
>> > 2018-11-05 15:34:52.319073 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
>> > 2018-11-05 15:34:52.319356 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
>> > 2018-11-05 15:34:52.319415 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
>> > 2018-11-05 15:34:52.319491 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block.db type kernel
>> > 2018-11-05 15:34:52.319499 7f5a15744e00  1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open path /var/lib/ceph/osd/ceph-60/block.db
>> > 2018-11-05 15:34:52.319514 7f5a15744e00 -1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open open got: (13) Permission denied
>> > 2018-11-05 15:34:52.319648 7f5a15744e00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
>> > 2018-11-05 15:34:52.319666 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) close
>> > 2018-11-05 15:34:52.598249 7f5a15744e00 -1 osd.60 0 OSD:init: unable to mount object store
>> > 2018-11-05 15:34:52.598269 7f5a15744e00 -1 ** ERROR: osd init failed: (13) Permission denied
>> >
>> > 3. Finally, I literally copied and pasted the udev rule Hector wrote out
>> > for me, then rebooted the server.
>> >
>> > 4. I tried restarting ceph-osd@60 -- this time it came right up!!! I was
>> > able to start all the rest, including ceph-osd@67, which I thought did
>> > not get activated by lvm.
>> >
>> > 5. I checked from the admin node and verified that osd.60-69 are all in
>> > the cluster as Bluestore OSDs, and they indeed are.
>> >
>> > ********************
>> > Thank you SO MUCH, both of you, for putting up with my novice questions
>> > all the way. I am planning to convert the rest of the cluster the same
>> > way, reviewing this entire thread to trace what steps need to be taken.
>> >
>> > Mami
>> >
>> > On Mon, Nov 5, 2018 at 3:00 PM, Hector Martin <hec...@marcansoft.com>
>> > wrote:
>> >>
>> >> On 11/6/18 3:31 AM, Hayashida, Mami wrote:
>> >> > 2018-11-05 12:47:01.075573 7f1f2775ae00 -1
>> >> > bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block
>> >> > device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission
>> >> > denied
>> >>
>> >> Looks like the permissions on the block.db device are wrong. As far as I
>> >> know, ceph-volume is responsible for setting this at activation time.
>> >>
>> >> > I already ran the "ceph-volume lvm activate --all" command right after
>> >> > I prepared (using "lvm prepare") those OSDs. Do I need to run the
>> >> > "activate" command again?
>> >>
>> >> The activation is required on every boot to create the
>> >> /var/lib/ceph/osd/* directory, but that should be done automatically by
>> >> systemd units (since you didn't run it after the reboot and yet the
>> >> directories exist, it seems to have worked).
>> >>
>> >> Can you ls -l the OSD directory (/var/lib/ceph/osd/ceph-60/) and also
>> >> any devices symlinked to from there, to see the permissions?
>> >>
>> >> Then run the activate command again and list the permissions again to
>> >> see if they have changed; if they have, try to start the OSD again.
>> >>
>> >> I found one Ubuntu bug that suggests there may be a race condition:
>> >>
>> >> https://bugs.launchpad.net/bugs/1767087
>> >>
>> >> I get the feeling the ceph-osd activation may be happening before the
>> >> block.db device is ready, so when it gets created by LVM it's already
>> >> too late and doesn't have the right permissions. You could fix it with
>> >> a udev rule (like Ubuntu did), but if this is indeed your issue then it
>> >> sounds like something that should be fixed in Ceph. Perhaps all you
>> >> need is a systemd unit override to make sure ceph-volume@* services
>> >> only start after LVM is ready.
>> >>
>> >> A usable udev rule could look like this (e.g. put it in
>> >> /etc/udev/rules.d/90-lvm-permissions.rules):
>> >>
>> >> ACTION=="change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
>> >>   ENV{DM_LV_NAME}=="db*", ENV{DM_VG_NAME}=="ssd0", \
>> >>   OWNER="ceph", GROUP="ceph", MODE="660"
>> >>
>> >> Reboot after that and see if the OSDs come up without further action.
>> >>
>> >> --
>> >> Hector Martin (hec...@marcansoft.com)
>> >> Public Key: https://mrcn.st/pub
>> >
>> >
>> > --
>> > Mami Hayashida
>> > Research Computing Associate
>> >
>> > Research Computing Infrastructure
>> > University of Kentucky Information Technology Services
>> > 301 Rose Street | 102 James F. Hardymon Building
>> > Lexington, KY 40506-0495
>> > mami.hayash...@uky.edu
>> > (859)323-7521
>
>
> --
> Mami Hayashida
> Research Computing Associate
>
> Research Computing Infrastructure
> University of Kentucky Information Technology Services
> 301 Rose Street | 102 James F. Hardymon Building
> Lexington, KY 40506-0495
> mami.hayash...@uky.edu
> (859)323-7521

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
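For readers of the archive: the systemd unit override Hector mentions could
take the shape of a drop-in like the one below. This is only a sketch under
assumptions: the drop-in path follows the standard systemd override
convention, and the LVM unit names (lvm2-activation*.service) vary by
distribution and LVM configuration, so verify them on your own system before
relying on this.

```ini
# /etc/udev/../..  -- illustrative path:
# /etc/systemd/system/ceph-volume@.service.d/after-lvm.conf
# Hypothetical drop-in: delay ceph-volume activation until LVM activation
# has finished, so the block.db LV exists (with correct ownership) before
# the OSD tries to open it.
[Unit]
After=lvm2-activation.service lvm2-activation-early.service
```

After creating the drop-in, `systemctl daemon-reload` is needed for it to
take effect on the next boot.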