On Mon, Nov 5, 2018 at 4:21 PM Hayashida, Mami <mami.hayash...@uky.edu> wrote:
>
> Yes, I still have the volume log showing the activation process for ssd0/db60 
> (and 61-69 as well).   I will email it to you directly as an attachment.

In the logs, I see that ceph-volume does set the permissions correctly:

[2018-11-02 16:20:07,238][ceph_volume.process][INFO  ] Running command: chown -h ceph:ceph /dev/hdd60/data60
[2018-11-02 16:20:07,242][ceph_volume.process][INFO  ] Running command: chown -R ceph:ceph /dev/dm-10
[2018-11-02 16:20:07,246][ceph_volume.process][INFO  ] Running command: ln -s /dev/hdd60/data60 /var/lib/ceph/osd/ceph-60/block
[2018-11-02 16:20:07,249][ceph_volume.process][INFO  ] Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-60/activate.monmap
[2018-11-02 16:20:07,530][ceph_volume.process][INFO  ] stderr got monmap epoch 2
[2018-11-02 16:20:07,547][ceph_volume.process][INFO  ] Running command: ceph-authtool /var/lib/ceph/osd/ceph-60/keyring --create-keyring --name osd.60 --add-key AQBysdxbNgdBNhAA6NQ/UWDHqGAZfFuryCWfxQ==
[2018-11-02 16:20:07,579][ceph_volume.process][INFO  ] stdout creating /var/lib/ceph/osd/ceph-60/keyring
added entity osd.60 auth auth(auid = 18446744073709551615 key=AQBysdxbNgdBNhAA6NQ/UWDHqGAZfFuryCWfxQ== with 0 caps)
[2018-11-02 16:20:07,583][ceph_volume.process][INFO  ] Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-60/keyring
[2018-11-02 16:20:07,587][ceph_volume.process][INFO  ] Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-60/
[2018-11-02 16:20:07,591][ceph_volume.process][INFO  ] Running command: chown -h ceph:ceph /dev/ssd0/db60
[2018-11-02 16:20:07,594][ceph_volume.process][INFO  ] Running command: chown -R ceph:ceph /dev/dm-0

And the failures from osd.60 are *before* those successful chown calls
(15:39:00). I wonder if somehow a step was missed in the process and then
it all got corrected afterwards. I am certain that the UDEV rule should
*not* be required for this to work.

The change in the /dev/dm-* paths is expected, as those device nodes are
re-created every time the system boots.
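
If you want to confirm what a given LV maps to after a boot, something like
this works (names taken from your node; the output will differ across boots):

ls -l /dev/ssd0/db60          # shows which dm-* node the LV currently points to
ls -l /dev/mapper/ssd0-db60   # the same mapping via the device-mapper name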

>
>
> On Mon, Nov 5, 2018 at 4:14 PM, Alfredo Deza <ad...@redhat.com> wrote:
>>
>> On Mon, Nov 5, 2018 at 4:04 PM Hayashida, Mami <mami.hayash...@uky.edu> 
>> wrote:
>> >
>> > WOW.  With you two guiding me through every step, the 10 OSDs in question 
>> > are now added back to the cluster as Bluestore disks!!!  Here are my 
>> > responses to the last email from Hector:
>> >
>> > 1. I first checked the permissions, and they looked like this:
>> >
>> > root@osd1:/var/lib/ceph/osd/ceph-60# ls -l
>> > total 56
>> > -rw-r--r-- 1 ceph ceph         384 Nov  2 16:20 activate.monmap
>> > -rw-r--r-- 1 ceph ceph 10737418240 Nov  2 16:20 block
>> > lrwxrwxrwx 1 ceph ceph          14 Nov  2 16:20 block.db -> /dev/ssd0/db60
>> >
>> > root@osd1:~# ls -l /dev/ssd0/
>> > ...
>> > lrwxrwxrwx 1 root root 7 Nov  5 12:38 db60 -> ../dm-2
>> >
>> > root@osd1:~# ls -la /dev/
>> > ...
>> > brw-rw----  1 root disk    252,   2 Nov  5 12:38 dm-2
>>
>> This looks like a bug. You mentioned you are running 12.2.9, and we
>> haven't seen reports of ceph-volume failing to update the permissions
>> on OSD devices. No one should need a UDEV rule to set the permissions
>> for devices; that is ceph-volume's job.
>>
>> When a system starts and the OSD activation happens, it always ensures
>> that the permissions are set correctly. Could you find the section of
>> the logs in /var/log/ceph/ceph-volume.log that shows the activation
>> process for ssd0/db60?
>>
>> Hopefully you still have those logs around; they would help us determine
>> why the permissions aren't being set correctly.
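>>
>> Something along these lines should pull the relevant section out of the
>> log (paths as they appear earlier in this thread):
>>
>> grep -n 'ssd0/db60' /var/log/ceph/ceph-volume.log
>> grep -n 'chown' /var/log/ceph/ceph-volume.log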
>>
>> > ...
>> >
>> > 2. I then ran ceph-volume activate --all again and saw the same error for
>> > osd.67 that I described many emails ago. None of the permissions changed.
>> > I tried restarting ceph-osd@60, but got the same error as before:
>> >
>> > 2018-11-05 15:34:52.001782 7f5a15744e00  0 set uid:gid to 64045:64045 (ceph:ceph)
>> > 2018-11-05 15:34:52.001808 7f5a15744e00  0 ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable), process ceph-osd, pid 36506
>> > 2018-11-05 15:34:52.021717 7f5a15744e00  0 pidfile_write: ignore empty --pid-file
>> > 2018-11-05 15:34:52.033478 7f5a15744e00  0 load: jerasure load: lrc load: isa
>> > 2018-11-05 15:34:52.033557 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
>> > 2018-11-05 15:34:52.033572 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
>> > 2018-11-05 15:34:52.033888 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
>> > 2018-11-05 15:34:52.033958 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
>> > 2018-11-05 15:34:52.033984 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) close
>> > 2018-11-05 15:34:52.318993 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _mount path /var/lib/ceph/osd/ceph-60
>> > 2018-11-05 15:34:52.319064 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
>> > 2018-11-05 15:34:52.319073 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
>> > 2018-11-05 15:34:52.319356 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
>> > 2018-11-05 15:34:52.319415 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
>> > 2018-11-05 15:34:52.319491 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block.db type kernel
>> > 2018-11-05 15:34:52.319499 7f5a15744e00  1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open path /var/lib/ceph/osd/ceph-60/block.db
>> > 2018-11-05 15:34:52.319514 7f5a15744e00 -1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open open got: (13) Permission denied
>> > 2018-11-05 15:34:52.319648 7f5a15744e00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
>> > 2018-11-05 15:34:52.319666 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) close
>> > 2018-11-05 15:34:52.598249 7f5a15744e00 -1 osd.60 0 OSD:init: unable to mount object store
>> > 2018-11-05 15:34:52.598269 7f5a15744e00 -1  ** ERROR: osd init failed: (13) Permission denied
>> >
>> > 3. Finally, I literally copied and pasted the udev rule Hector wrote out 
>> > for me, then rebooted the server.
>> >
>> > 4. I tried restarting ceph-osd@60 -- this time it came right up!!!  I was
>> > able to start all the rest, including ceph-osd@67, which I had thought did
>> > not get activated by lvm.
>> >
>> > 5. I checked from the admin node and verified that osd.60-69 are indeed
>> > all in the cluster as Bluestore OSDs.
>> >
>> > ********************
>> > Thank you SO MUCH, both of you, for putting up with my novice questions 
>> > all the way.  I am planning to convert the rest of the cluster the same 
>> > way by reviewing this entire thread to trace what steps need to be taken.
>> >
>> > Mami
>> >
>> > On Mon, Nov 5, 2018 at 3:00 PM, Hector Martin <hec...@marcansoft.com> 
>> > wrote:
>> >>
>> >>
>> >>
>> >> On 11/6/18 3:31 AM, Hayashida, Mami wrote:
>> >> > 2018-11-05 12:47:01.075573 7f1f2775ae00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
>> >>
>> >> Looks like the permissions on the block.db device are wrong. As far as I
>> >> know, ceph-volume is responsible for setting these at activation time.
>> >>
>> >> > I already ran the "ceph-volume lvm activate --all "  command right after
>> >> > I prepared (using "lvm prepare") those OSDs.  Do I need to run the
>> >> > "activate" command again?
>> >>
>> >> The activation is required on every boot to create the
>> >> /var/lib/ceph/osd/* directories, but that should be done automatically by
>> >> the systemd units (since you didn't run it after the reboot and yet the
>> >> directories exist, it seems to have worked).
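>> >>
>> >> If you want to double-check that, something like this should show whether
>> >> the activation units ran at boot (the instance names include the OSD id
>> >> and fsid, so they will look different on your node):
>> >>
>> >> systemctl list-units 'ceph-volume@*' --all
>> >> journalctl -b -u 'ceph-volume@*'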
>> >>
>> >> Can you ls -l the OSD directory (/var/lib/ceph/osd/ceph-60/) and also
>> >> any devices symlinked to from there, to see the permissions?
>> >>
>> >> Then run the activate command again and list the permissions again to
>> >> see if they have changed, and if they have, try to start the OSD again.
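>> >>
>> >> Concretely, something like this (ls -lL follows the symlinks, so you see
>> >> the permissions of the underlying dm devices rather than of the links):
>> >>
>> >> ceph-volume lvm activate --all
>> >> ls -l /var/lib/ceph/osd/ceph-60/
>> >> ls -lL /var/lib/ceph/osd/ceph-60/block.db
>> >> systemctl start ceph-osd@60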
>> >>
>> >> I found one Ubuntu bug that suggests there may be a race condition:
>> >>
>> >> https://bugs.launchpad.net/bugs/1767087
>> >>
>> >> I get the feeling the ceph-osd activation may be happening before the
>> >> block.db device is ready, so by the time LVM creates it, it's already too
>> >> late and it doesn't have the right permissions. You could fix it with a
>> >> udev rule (like Ubuntu did), but if this is indeed your issue then it
>> >> sounds like something that should be fixed in Ceph. Perhaps all you need
>> >> is a systemd unit override to make sure the ceph-volume@* services only
>> >> start after LVM is ready.
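>> >>
>> >> A drop-in override along those lines might be enough (an untested sketch;
>> >> the LVM unit names vary between distributions, so adjust them to whatever
>> >> your system actually provides):
>> >>
>> >> # /etc/systemd/system/ceph-volume@.service.d/after-lvm.conf
>> >> [Unit]
>> >> After=lvm2-activation.service lvm2-activation-early.service
>> >>
>> >> followed by a systemctl daemon-reload.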
>> >>
>> >> A usable udev rule could look like this (e.g. put it in
>> >> /etc/udev/rules.d/90-lvm-permissions.rules):
>> >>
>> >> ACTION=="change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
>> >> ENV{DM_LV_NAME}=="db*", ENV{DM_VG_NAME}=="ssd0", \
>> >> OWNER="ceph", GROUP="ceph", MODE="660"
>> >>
>> >> Reboot after that and see if the OSDs come up without further action.
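>> >>
>> >> (To confirm the rule fired, something like "ls -l /dev/dm-*" should then
>> >> show the db LVs owned by ceph:ceph.)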
>> >>
>> >> --
>> >> Hector Martin (hec...@marcansoft.com)
>> >> Public Key: https://mrcn.st/pub
>> >
>> >
>> >
>> >
>> > --
>> > Mami Hayashida
>> > Research Computing Associate
>> >
>> > Research Computing Infrastructure
>> > University of Kentucky Information Technology Services
>> > 301 Rose Street | 102 James F. Hardymon Building
>> > Lexington, KY 40506-0495
>> > mami.hayash...@uky.edu
>> > (859)323-7521
>
>
>
>
> --
> Mami Hayashida
> Research Computing Associate
>
> Research Computing Infrastructure
> University of Kentucky Information Technology Services
> 301 Rose Street | 102 James F. Hardymon Building
> Lexington, KY 40506-0495
> mami.hayash...@uky.edu
> (859)323-7521
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
