On Mon, Nov 5, 2018 at 4:04 PM Hayashida, Mami <mami.hayash...@uky.edu> wrote:
>
> WOW.  With you two guiding me through every step, the 10 OSDs in question are 
> now added back to the cluster as Bluestore disks!!!  Here are my responses to 
> the last email from Hector:
>
> 1. I first checked the permissions and they looked like this:
>
> root@osd1:/var/lib/ceph/osd/ceph-60# ls -l
> total 56
> -rw-r--r-- 1 ceph ceph         384 Nov  2 16:20 activate.monmap
> -rw-r--r-- 1 ceph ceph 10737418240 Nov  2 16:20 block
> lrwxrwxrwx 1 ceph ceph          14 Nov  2 16:20 block.db -> /dev/ssd0/db60
>
> root@osd1:~# ls -l /dev/ssd0/
> ...
> lrwxrwxrwx 1 root root 7 Nov  5 12:38 db60 -> ../dm-2
>
> root@osd1:~# ls -la /dev/
> ...
> brw-rw----  1 root disk    252,   2 Nov  5 12:38 dm-2

This looks like a bug. You mentioned you are running 12.2.9, and we
haven't seen ceph-volume fail to update the permissions on OSD devices
before. No one should need a udev rule to set the permissions for these
devices; that is a ceph-volume task.
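
(As a stopgap you could chown the device node by hand, e.g. "chown
ceph:ceph /dev/dm-2" for the dm-2 shown above. That should let osd.60
start, but it won't survive a reboot, which is exactly what activation
is supposed to take care of.)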

When the system boots and OSD activation happens, it always ensures
that the permissions are set correctly. Could you find the section of
the logs in /var/log/ceph/ceph-volume.log that shows the activation
process for ssd0/db60?
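
Something like this should pull out the relevant part (assuming the
default log location):

grep -B 2 -A 15 'db60' /var/log/ceph/ceph-volume.log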

Hopefully you still have those logs around; they would help us
determine why the permissions aren't being set correctly.

> ...
>
> 2. I then ran "ceph-volume lvm activate --all" again and saw the same error 
> for osd.67 that I described many emails ago.  None of the permissions changed.  
> I tried restarting ceph-osd@60, but got the same error as before:
>
> 2018-11-05 15:34:52.001782 7f5a15744e00  0 set uid:gid to 64045:64045 
> (ceph:ceph)
> 2018-11-05 15:34:52.001808 7f5a15744e00  0 ceph version 12.2.9 
> (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable), process 
> ceph-osd, pid 36506
> 2018-11-05 15:34:52.021717 7f5a15744e00  0 pidfile_write: ignore empty 
> --pid-file
> 2018-11-05 15:34:52.033478 7f5a15744e00  0 load: jerasure load: lrc load: isa
> 2018-11-05 15:34:52.033557 7f5a15744e00  1 bdev create path 
> /var/lib/ceph/osd/ceph-60/block type kernel
> 2018-11-05 15:34:52.033572 7f5a15744e00  1 bdev(0x5651bd1b8d80 
> /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
> 2018-11-05 15:34:52.033888 7f5a15744e00  1 bdev(0x5651bd1b8d80 
> /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) 
> block_size 4096 (4KiB) rotational
> 2018-11-05 15:34:52.033958 7f5a15744e00  1 
> bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 
> meta 0.4 kv 0.4 data 0.2
> 2018-11-05 15:34:52.033984 7f5a15744e00  1 bdev(0x5651bd1b8d80 
> /var/lib/ceph/osd/ceph-60/block) close
> 2018-11-05 15:34:52.318993 7f5a15744e00  1 
> bluestore(/var/lib/ceph/osd/ceph-60) _mount path /var/lib/ceph/osd/ceph-60
> 2018-11-05 15:34:52.319064 7f5a15744e00  1 bdev create path 
> /var/lib/ceph/osd/ceph-60/block type kernel
> 2018-11-05 15:34:52.319073 7f5a15744e00  1 bdev(0x5651bd1b8fc0 
> /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
> 2018-11-05 15:34:52.319356 7f5a15744e00  1 bdev(0x5651bd1b8fc0 
> /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) 
> block_size 4096 (4KiB) rotational
> 2018-11-05 15:34:52.319415 7f5a15744e00  1 
> bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 
> meta 0.4 kv 0.4 data 0.2
> 2018-11-05 15:34:52.319491 7f5a15744e00  1 bdev create path 
> /var/lib/ceph/osd/ceph-60/block.db type kernel
> 2018-11-05 15:34:52.319499 7f5a15744e00  1 bdev(0x5651bd1b9200 
> /var/lib/ceph/osd/ceph-60/block.db) open path 
> /var/lib/ceph/osd/ceph-60/block.db
> 2018-11-05 15:34:52.319514 7f5a15744e00 -1 bdev(0x5651bd1b9200 
> /var/lib/ceph/osd/ceph-60/block.db) open open got: (13) Permission denied
> 2018-11-05 15:34:52.319648 7f5a15744e00 -1 
> bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block 
> device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
> 2018-11-05 15:34:52.319666 7f5a15744e00  1 bdev(0x5651bd1b8fc0 
> /var/lib/ceph/osd/ceph-60/block) close
> 2018-11-05 15:34:52.598249 7f5a15744e00 -1 osd.60 0 OSD:init: unable to mount 
> object store
> 2018-11-05 15:34:52.598269 7f5a15744e00 -1  ** ERROR: osd init failed: (13) 
> Permission denied
>
> 3. Finally, I literally copied and pasted the udev rule Hector wrote out for 
> me, then rebooted the server.
>
> 4. I tried restarting ceph-osd@60 -- this time it came right up!!!  I was 
> able to start all the rest, including ceph-osd@67, which I had thought did 
> not get activated by lvm.
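>
> (In case it helps anyone following along, something like "systemctl restart 
> ceph-osd@{60..69}" should restart all ten units in one go.)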
>
> 5. I checked from the admin node that osd.60-69 are all in the cluster as 
> Bluestore OSDs, and they indeed are.
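>
> (One way to double-check is "ceph osd metadata 60 | grep osd_objectstore", 
> which should report "bluestore" for a converted OSD.)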
>
> ********************
> Thank you SO MUCH, both of you, for putting up with my novice questions all 
> the way.  I am planning to convert the rest of the cluster the same way by 
> reviewing this entire thread to trace what steps need to be taken.
>
> Mami
>
> On Mon, Nov 5, 2018 at 3:00 PM, Hector Martin <hec...@marcansoft.com> wrote:
>>
>>
>>
>> On 11/6/18 3:31 AM, Hayashida, Mami wrote:
>> > 2018-11-05 12:47:01.075573 7f1f2775ae00 -1 
>> > bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block 
>> > device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
>>
>> Looks like the permissions on the block.db device are wrong. As far as I
>> know, ceph-volume is responsible for setting these at activation time.
>>
>> > I already ran the "ceph-volume lvm activate --all "  command right after
>> > I prepared (using "lvm prepare") those OSDs.  Do I need to run the
>> > "activate" command again?
>>
>> The activation is required on every boot to create the
>> /var/lib/ceph/osd/* directories, but that should be done automatically
>> by the systemd units (since you didn't run it after the reboot and yet
>> the directories exist, it seems to have worked).
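>>
>> (Those are the ceph-volume@lvm-* instances; "systemctl list-units --all
>> 'ceph-volume@*'" should show whether they ran at boot.)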
>>
>> Can you ls -l the OSD directory (/var/lib/ceph/osd/ceph-60/) and also
>> any devices symlinked to from there, to see the permissions?
>>
>> Then run the activate command again and list the permissions again to
>> see if they have changed, and if they have, try to start the OSD again.
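>>
>> Something like this sequence should tell us (paths taken from your
>> earlier output):
>>
>> ls -l /var/lib/ceph/osd/ceph-60/
>> ls -lL /var/lib/ceph/osd/ceph-60/block.db
>> ceph-volume lvm activate --all
>> ls -lL /var/lib/ceph/osd/ceph-60/block.db
>> systemctl restart ceph-osd@60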
>>
>> I found one Ubuntu bug that suggests there may be a race condition:
>>
>> https://bugs.launchpad.net/bugs/1767087
>>
>> I get the feeling the ceph-osd activation may be happening before the
>> block.db device is ready, so by the time LVM creates the device node
>> it's already too late and the node doesn't get the right permissions.
>> You could fix it with a udev rule (like Ubuntu did), but if this is
>> indeed your issue then it sounds like something that should be fixed in
>> Ceph. Perhaps all you need is a systemd unit override to make sure the
>> ceph-volume@* services only start after LVM is ready.
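>>
>> (Untested, but such an override might look something like this, dropped
>> into /etc/systemd/system/ceph-volume@.service.d/override.conf -- the
>> exact LVM activation unit name varies by distro:
>>
>> [Unit]
>> After=lvm2-activation.service lvm2-activation-early.service
>>
>> followed by a "systemctl daemon-reload".)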
>>
>> A usable udev rule could look like this (e.g. put it in
>> /etc/udev/rules.d/90-lvm-permissions.rules):
>>
>> ACTION=="change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
>> ENV{DM_LV_NAME}=="db*", ENV{DM_VG_NAME}=="ssd0", \
>> OWNER="ceph", GROUP="ceph", MODE="660"
>>
>> Reboot after that and see if the OSDs come up without further action.
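>>
>> (If you'd rather not reboot, "udevadm control --reload-rules" followed
>> by "udevadm trigger" should apply the rule to the existing devices, but
>> a full reboot is the cleaner test here.)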
>>
>> --
>> Hector Martin (hec...@marcansoft.com)
>> Public Key: https://mrcn.st/pub
>
>
>
>
> --
> Mami Hayashida
> Research Computing Associate
>
> Research Computing Infrastructure
> University of Kentucky Information Technology Services
> 301 Rose Street | 102 James F. Hardymon Building
> Lexington, KY 40506-0495
> mami.hayash...@uky.edu
> (859)323-7521
