Hello David, all, sorry for hijacking the thread, but I am seeing the same issue, although on 10.2.7/10.2.9...
Note that I am using disks taken from a SAN, so the GUIDs in my case are those of the MPATH devices.
As per other messages in this thread, I modified:
- /usr/lib/systemd/system/ceph-osd.target
adding to the [Unit] stanza:
Before=ceph.target
- /usr/lib/udev/rules.d/60-ceph-by-parttypeuuid.rules
appending the string:
, SYMLINK+="disk/by-partuuid/$env{ID_PART_ENTRY_UUID}"
to the end of this rule:
ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_TYPE}=="?*", ENV{ID_PART_ENTRY_UUID}=="?*", SYMLINK+="disk/by-parttypeuuid/$env{ID_PART_ENTRY_TYPE}.$env{ID_PART_ENTRY_UUID}"
(the resulting files are sketched just below)
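For clarity, after those two edits the files should end up reading roughly like this (a sketch of the end state, not verbatim copies of my files; the stock udev rule may differ slightly between Ceph point releases):

# /usr/lib/systemd/system/ceph-osd.target -- [Unit] stanza after the edit
[Unit]
...
Before=ceph.target

# /usr/lib/udev/rules.d/60-ceph-by-parttypeuuid.rules -- resulting rule, all on one line
ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_TYPE}=="?*", ENV{ID_PART_ENTRY_UUID}=="?*", SYMLINK+="disk/by-parttypeuuid/$env{ID_PART_ENTRY_TYPE}.$env{ID_PART_ENTRY_UUID}", SYMLINK+="disk/by-partuuid/$env{ID_PART_ENTRY_UUID}"

A quick sanity check that the extra SYMLINK is taking effect is "ls -l /dev/disk/by-partuuid/" after a "udevadm trigger".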
df shows the following (I picked one problematic partition and one which mounted OK):

.....
/dev/mapper/3600a0980005de7370000095a56c510cd1   3878873588  142004  3878731584   1%  /var/lib/ceph/osd/cephba1-27
/dev/mapper/3600a0980005ddf7500001e2558e2bac7p1  7779931116  202720  7779728396   1%  /var/lib/ceph/tmp/mnt.XL7WkY
Yet, for both, the GUIDs seem correct:

=== /dev/mapper/3600a0980005de7370000095a56c510cd
Partition GUID code: 4FBD7E29-8AE0-4982-BF9D-5A8D867AF560 (Unknown)
Partition unique GUID: B01E2E0D-9903-4F23-A5FD-FC1C1CB458C3
Partition size: 7761536991 sectors (3.6 TiB)
Partition name: 'ceph data'
Partition GUID code: 45B0969E-8AE0-4982-BF9D-5A8D867AF560 (Unknown)
Partition unique GUID: E1B3970A-FABF-4AC0-8B6A-F7526989FF36
Partition size: 40960000 sectors (19.5 GiB)
Partition name: 'ceph journal'

=== /dev/mapper/3600a0980005ddf7500001e2558e2bac7
Partition GUID code: 4FBD7E29-8AE0-4982-BF9D-5A8D867AF560 (Unknown)
Partition unique GUID: 93A91EBF-A531-4002-A49F-B24F27E962DD
Partition size: 15564036063 sectors (7.2 TiB)
Partition name: 'ceph data'
Partition GUID code: 45B0969E-8AE0-4982-BF9D-5A8D867AF560 (Unknown)
Partition unique GUID: 2AF9B162-3398-49BD-B6EF-5D284C4A930B
Partition size: 40960000 sectors (19.5 GiB)
Partition name: 'ceph journal'

I rather suspect some sort of race condition, possibly hitting some timeout within systemd (please read the end of this message). I am led to think this because the OSDs which are successfully mounted after each reboot are a "random" subset of the ~40 configured ones; also, after two or three mounts under /var/lib/ceph/tmp/mnt..., ceph-osd apparently gives up.
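(If anyone wants to probe the race hypothesis, one possible check after a reboot, using my device names as an example, would be:

# count how many OSD partitions actually got a by-partuuid symlink
ls /dev/disk/by-partuuid/ | wc -l
# replay the udev rules for one multipath partition to see which symlinks WOULD be created
udevadm test $(udevadm info -q path -n /dev/mapper/3600a0980005ddf7500001e2558e2bac7p1) 2>&1 | grep by-partuuid

If the symlink only shows up on the replay, udev presumably ran before the mpath partition existed.)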
The only workaround I found to get things going is re-running ceph-ansible, but it takes soooo long...
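For what it's worth, a quicker manual recovery than a full playbook run might be something along these lines (an untested sketch with the standard jewel ceph-disk tooling, reusing the device and OSD id from the output below; adjust to taste):

# clear systemd's start-limit bookkeeping for the failed instance
systemctl reset-failed ceph-osd@143.service
# let ceph-disk (re)mount the data partition and start the daemon
ceph-disk activate /dev/mapper/3600a0980005ddf7500001e2558e2bac7p1
# or retry every OSD partition known via the by-parttypeuuid symlinks
ceph-disk activate-all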
Have you any idea as to what is going on here? Has anybody seen (and solved) the same issue?
Thanks!
Fulvio
[root@r3srv07 ~]# cat /var/lib/ceph/tmp/mnt.XL7WkY/whoami
143
[root@r3srv07 ~]# umount /var/lib/ceph/tmp/mnt.XL7WkY
[root@r3srv07 ~]# systemctl status ceph-osd@143.service
● ceph-osd@143.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2017-07-21 11:02:23 CEST; 1h 35min ago
  Process: 40466 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
  Process: 40217 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 40466 (code=exited, status=1/FAILURE)

Jul 21 11:02:03 r3srv07.ba1.box.garr systemd[1]: ceph-osd@143.service: main process exited, code=exited, status=1/FAILURE
Jul 21 11:02:03 r3srv07.ba1.box.garr systemd[1]: Unit ceph-osd@143.service entered failed state.
Jul 21 11:02:03 r3srv07.ba1.box.garr systemd[1]: ceph-osd@143.service failed.
Jul 21 11:02:23 r3srv07.ba1.box.garr systemd[1]: ceph-osd@143.service holdoff time over, scheduling restart.
Jul 21 11:02:23 r3srv07.ba1.box.garr systemd[1]: start request repeated too quickly for ceph-osd@143.service
Jul 21 11:02:23 r3srv07.ba1.box.garr systemd[1]: Failed to start Ceph object storage daemon.
Jul 21 11:02:23 r3srv07.ba1.box.garr systemd[1]: Unit ceph-osd@143.service entered failed state.
Jul 21 11:02:23 r3srv07.ba1.box.garr systemd[1]: ceph-osd@143.service failed.

[root@r3srv07 ~]# systemctl restart ceph-osd@143.service
[root@r3srv07 ~]# systemctl status ceph-osd@143.service
● ceph-osd@143.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2017-07-21 12:38:11 CEST; 1s ago
  Process: 74658 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
  Process: 74644 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 74658 (code=exited, status=1/FAILURE)

Jul 21 12:38:11 r3srv07.ba1.box.garr systemd[1]: Unit ceph-osd@143.service entered failed state.
Jul 21 12:38:11 r3srv07.ba1.box.garr systemd[1]: ceph-osd@143.service failed.

[root@r3srv07 ~]# systemctl reset-failed ceph-osd@143.service
[root@r3srv07 ~]# systemctl restart ceph-osd@143.service
[root@r3srv07 ~]# systemctl status ceph-osd@143.service
● ceph-osd@143.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2017-07-21 12:38:34 CEST; 1s ago
  Process: 74787 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
  Process: 74779 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 74787 (code=exited, status=1/FAILURE)

Jul 21 12:38:34 r3srv07.ba1.box.garr systemd[1]: Unit ceph-osd@143.service entered failed state.
Jul 21 12:38:34 r3srv07.ba1.box.garr systemd[1]: ceph-osd@143.service failed.
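(Side note: the "start request repeated too quickly" message is systemd's start-rate limiting: once the unit fails StartLimitBurst times within StartLimitInterval, systemd freezes it until a reset-failed. If the root cause really is mpath devices appearing late, relaxing those limits in a drop-in might at least mask the problem; the path and values below are just guesses, not recommendations:

# /etc/systemd/system/ceph-osd@.service.d/startlimit.conf -- hypothetical drop-in
[Service]
RestartSec=20s
StartLimitInterval=30min
StartLimitBurst=10

followed by "systemctl daemon-reload".)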
