Note: I am not certain that the same applies to bluestore. I haven't set up
bluestore OSDs yet.
On Wed, Jun 14, 2017 at 11:48 AM David Turner <[email protected]> wrote:

> tl;dr: to get a ceph journal to work with the udev rules, run this
> command, substituting your device name (/dev/sdb) and partition number,
> used twice (=4):
>
> sgdisk /dev/sdb -t=4:45B0969E-9B03-4F30-B4C6-5EC00CEFF106 -c=4:'ceph journal'
>
> And this for your OSDs, replacing the device name (/dev/sdc) and
> partition number, used twice (=1):
>
> sgdisk /dev/sdc -t=1:4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D -c=1:'ceph data'
>
> At the bottom is the relevant sgdisk output for journal and OSD
> partitions that are configured so the udev rules manage the devices
> properly.
>
> The "Partition unique GUID" is different for every disk. Generally it is
> auto-assigned, but you can specify it if you'd like. A time when it is
> really nice to specify it is when you are swapping out a failing journal
> device. You can recreate the new partitions with the same GUIDs, and if
> your /var/lib/ceph/osd/ceph-##/journal symlink uses that GUID (i.e.
> /dev/disk/by-partuuid/053f386f-223c-43a2-9843-6462dfb3857a for my
> /dev/sdb4 journal partition), then you won't need to make any changes on
> the OSD when you swap it out (other than the commands to flush and
> create the journal).
>
> The "Partition name" is just nice to have so you can easily tell what a
> disk was used for if you throw it on a stack one day. It isn't
> necessary, but it's nice. 'ceph journal' and 'ceph data' are the
> defaults that the ceph-deploy script uses, so it's also nice to match
> those anyway.
>
> The real key to getting the udev rules to work is the "Partition GUID
> code". A ceph journal should always have
> 45B0969E-9B03-4F30-B4C6-5EC00CEFF106 as its "Partition GUID code", and a
> ceph OSD should always have 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D. That
> is hardcoded into the udev rules that ceph installs and has never
> changed that I'm aware of.
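The journal-swap idea described above can be sketched as follows. This is
only an illustration: the replacement disk (/dev/sde), the partition size
(+10G), and the OSD number are hypothetical; the unique GUID is the /dev/sdb4
example from the message above.

```shell
# Sketch: replace a failing journal disk while keeping the partition's
# unique GUID, so the OSD's by-partuuid journal symlink keeps resolving.
# /dev/sde, +10G, and osd.2 are hypothetical placeholders.

OLD_UUID=053f386f-223c-43a2-9843-6462dfb3857a  # read from: sgdisk /dev/sdb -i=4

# Create partition 4 on the new disk with the ceph-journal type code,
# the old unique GUID, and the conventional name:
sgdisk /dev/sde -n=4:0:+10G \
    -t=4:45B0969E-9B03-4F30-B4C6-5EC00CEFF106 \
    -u=4:"$OLD_UUID" \
    -c=4:'ceph journal'

# The existing symlink then needs no change:
#   /var/lib/ceph/osd/ceph-2/journal -> /dev/disk/by-partuuid/<OLD_UUID>
```

After this, only the journal flush/create steps mentioned above remain; the
OSD configuration itself is untouched.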
>
> # sgdisk /dev/sdb -i=4
> Partition GUID code: 45B0969E-9B03-4F30-B4C6-5EC00CEFF106 (Unknown)
> Partition unique GUID: 053F386F-223C-43A2-9843-6462DFB3857A
> Partition name: 'ceph journal'
> # sgdisk /dev/sdb -i=5
> Partition GUID code: 45B0969E-9B03-4F30-B4C6-5EC00CEFF106 (Unknown)
> Partition unique GUID: 6448D160-A2B7-4EF9-BC67-12F4D49D9FDD
> Attribute flags: 0000000000000000
> Partition name: 'ceph journal'
>
> root@ceph1:~# sgdisk /dev/sdc -i=1
> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
> Partition unique GUID: 036FCF9D-2865-4822-9F93-94C3B5750DDC
> Partition name: 'ceph data'
> root@ceph1:~# sgdisk /dev/sdd -i=1
> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
> Partition unique GUID: D74AB67E-C7F6-4974-889F-ABEBA7F2DC2F
> Partition name: 'ceph data'
>
> On Wed, Jun 14, 2017 at 4:02 AM nokia ceph <[email protected]> wrote:
>
>> Hello David,
>>
>> Thanks for the update.
>>
>> http://tracker.ceph.com/issues/13833#note-7 - As per this tracker, the
>> GUID may differ, which causes udev to be unable to chown the devices to
>> ceph.
>>
>> We are following the procedure below to create OSDs:
>>
>> # sgdisk -Z /dev/sdb
>> # ceph-disk prepare --bluestore --cluster ceph --cluster-uuid <fsid> /dev/vdb
>> # ceph-disk --verbose activate /dev/vdb1
>>
>> Here you can see all the devices have the same GUID:
>>
>> # for i in b c d ; do /usr/sbin/blkid -o udev -p /dev/vd$i\1 | grep ID_PART_ENTRY_TYPE; done
>> ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
>> ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
>> ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
>>
>> Currently we are facing an issue with OSD activation at boot, which
>> leaves the OSD journal device mounted like this:
>>
>> ~~~
>> /dev/sdh1 /var/lib/ceph/tmp/mnt.EayTmL
>> ~~~
>>
>> At the same time, the OSD logs show that osd.2 cannot find the mounted
>> journal device, hence it lands in a failed state.
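One small trap when comparing these values in a script: blkid reports the
type GUID in lowercase, while sgdisk prints it in uppercase; the value is the
same. A minimal runnable sketch (the `reported` line below is a pasted
example standing in for live blkid output, not a real probe) that normalizes
case before comparing:

```shell
# Compare a blkid ID_PART_ENTRY_TYPE value against the ceph OSD ("ceph
# data") type code, normalizing case first. "reported" stands in for
# real blkid -o udev output.
expected=4fbd7e29-9d25-41b8-afd0-062c0ceff05d   # ceph data type code
reported='ID_PART_ENTRY_TYPE=4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D'

# Strip the key, lowercase the value, then compare.
got=$(printf '%s' "${reported#*=}" | tr '[:upper:]' '[:lower:]')
if [ "$got" = "$expected" ]; then
    echo "type code matches"
else
    echo "type code mismatch: $got"
fi
```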
>>
>> ~~~
>> May 26 15:40:39 cn1 ceph-osd: 2017-05-26 15:40:39.978072 7f1dc3bc2940 -1 #033[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or directory#033[0m
>> May 26 15:40:39 cn1 systemd: [email protected]: main process exited, code=exited, status=1/FAILURE
>> May 26 15:40:39 cn1 systemd: Unit [email protected] entered failed state.
>> May 26 15:40:39 cn1 systemd: [email protected] failed.
>> ~~~
>>
>> To fix this problem, we are using the workaround below.
>>
>> # umount /var/lib/ceph/tmp/mnt.om4Lbq
>>
>> Mount the device with the respective OSD number:
>> # mount /dev/sdb1 /var/lib/ceph/osd/ceph-2
>>
>> Then start the OSD:
>> # systemctl start [email protected]
>>
>> We notice the services below fail at the same time:
>>
>> ===
>> systemctl --failed
>> UNIT                               LOAD      ACTIVE SUB    DESCRIPTION
>> ● var-lib-ceph-tmp-mnt.UiCYFu.mount not-found failed failed var-lib-ceph-tmp-mnt.UiCYFu.mount
>> ● [email protected]   loaded    failed failed Ceph disk activation: /dev/sdc1
>> ● [email protected]   loaded    failed failed Ceph disk activation: /dev/sdd1
>> ● [email protected]   loaded    failed failed Ceph disk activation: /dev/sdd2
>> ===
>>
>> Need your suggestions to proceed further.
>>
>> Thanks
>> Jayaram
>>
>> On Tue, Jun 13, 2017 at 7:30 PM, David Turner <[email protected]> wrote:
>>
>>> I came across this a few times. My problem was with journals I set up
>>> myself. I didn't give them the proper GUID partition type ID, so the
>>> udev rules didn't know how to make the partition look correct. What the
>>> udev rules were unable to do was chown the journal block device as
>>> ceph:ceph so that it could be opened by the Ceph user. You can test by
>>> chowning the journal block device and trying to start the OSD again.
>>>
>>> Alternatively, if you want to see more information, you can start the
>>> daemon manually instead of through systemd and see what its output
>>> looks like.
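The ownership test David describes above can be sketched like this; the
journal partition (/dev/sdb4) and OSD id (2) are hypothetical placeholders,
and the foreground invocation mirrors what the ceph-osd systemd unit runs:

```shell
# Sketch of the chown test: if the OSD starts after this, the udev rules
# were not chowning the journal device (wrong partition type GUID).
# /dev/sdb4 and osd.2 are hypothetical placeholders.
chown ceph:ceph /dev/sdb4      # journal block device
systemctl start ceph-osd@2     # does the OSD come up now?

# For more detail, run the daemon in the foreground instead of via systemd:
/usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
```

Note the chown does not persist across reboots; the durable fix is setting
the correct partition type GUID so udev does it every boot.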
>>>
>>> On Tue, Jun 13, 2017 at 6:32 AM nokia ceph <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Some OSDs are not getting activated after a reboot, which leaves those
>>>> particular OSDs in a failed state.
>>>>
>>>> Here you can see the mount points were not updated to the osd-num and
>>>> the devices were mounted at an incorrect mount point, so the OSDs
>>>> could not be mounted/activated.
>>>>
>>>> Env: RHEL 7.2 - EC 4+1, v11.2.0 bluestore.
>>>>
>>>> # grep mnt proc/mounts
>>>> /dev/sdh1 /var/lib/ceph/tmp/mnt.om4Lbq xfs rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota 0 0
>>>> /dev/sdh1 /var/lib/ceph/tmp/mnt.EayTmL xfs rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota 0 0
>>>>
>>>> From /var/log/messages:
>>>>
>>>> --
>>>> May 26 15:39:58 cn1 systemd: Starting Ceph disk activation: /dev/sdh2...
>>>> May 26 15:39:58 cn1 systemd: Starting Ceph disk activation: /dev/sdh1...
>>>>
>>>> May 26 15:39:58 cn1 systemd: *start request repeated too quickly for* [email protected] => suspecting this could be the root cause.
>>>> May 26 15:39:58 cn1 systemd: Failed to start Ceph disk activation: /dev/sdh2.
>>>> May 26 15:39:58 cn1 systemd: Unit [email protected] entered failed state.
>>>> May 26 15:39:58 cn1 systemd: [email protected] failed.
>>>> May 26 15:39:58 cn1 systemd: start request repeated too quickly for [email protected]
>>>> May 26 15:39:58 cn1 systemd: Failed to start Ceph disk activation: /dev/sdh1.
>>>> May 26 15:39:58 cn1 systemd: Unit [email protected] entered failed state.
>>>> May 26 15:39:58 cn1 systemd: [email protected] failed.
>>>> --
>>>>
>>>> But this issue occurs only intermittently after a reboot.
>>>>
>>>> Note: We haven't faced this problem in Jewel.
>>>>
>>>> Awaiting your comments.
>>>>
>>>> Thanks
>>>> Jayaram
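The "start request repeated too quickly" lines in the quoted report come
from systemd's start rate limiter, which leaves the unit in a failed state
after too many rapid start attempts. A sketch of one thing worth trying
before rebooting, assuming the unit names from the log excerpt:

```shell
# Clear systemd's failed state (which also resets the start rate-limit
# counter) for the activation unit, then retry. Unit names are taken
# from the quoted log; adjust to your own devices.
systemctl reset-failed '[email protected]'
systemctl start '[email protected]'

# Check the result:
systemctl status '[email protected]' --no-pager
```

If activation still races at boot, the underlying trigger ordering still
needs fixing; this only clears the rate-limited state so a retry is possible.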
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
