Hello David,

Thanks for the update.
http://tracker.ceph.com/issues/13833#note-7 - As per this tracker, the GUID may differ, which can leave udev unable to chown the device to ceph. We are following the below procedure to create OSDs:

~~~
#sgdisk -Z /dev/sdb
#ceph-disk prepare --bluestore --cluster ceph --cluster-uuid <fsid> /dev/vdb
#ceph-disk --verbose activate /dev/vdb1
~~~

Here you can see all the devices have the same GUID:

~~~
#for i in b c d ; do /usr/sbin/blkid -o udev -p /dev/vd$i\1 | grep ID_PART_ENTRY_TYPE; done
ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
~~~

Currently we are facing an issue with OSD activation at boot, which leaves the OSD journal device mounted like this:

~~~
/dev/sdh1 /var/lib/ceph/tmp/mnt.EayTmL
~~~

At the same time, in the OSD logs, osd.2 is unable to find the mounted journal device and lands in a failed state:

~~~
May 26 15:40:39 cn1 ceph-osd: 2017-05-26 15:40:39.978072 7f1dc3bc2940 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or directory
May 26 15:40:39 cn1 systemd: [email protected]: main process exited, code=exited, status=1/FAILURE
May 26 15:40:39 cn1 systemd: Unit [email protected] entered failed state.
May 26 15:40:39 cn1 systemd: [email protected] failed.
~~~

To fix this problem, we are following the below workaround. Unmount the stray tmp mountpoint:

~~~
#umount /var/lib/ceph/tmp/mnt.om4Lbq
~~~

Mount the device at the directory for the respective OSD number:

~~~
#mount /dev/sdb1 /var/lib/ceph/osd/ceph-2
~~~

Then start the OSD:

~~~
#systemctl start [email protected]
~~~

We notice the below services fail at the same time.
===
systemctl --failed
UNIT                                 LOAD      ACTIVE SUB    DESCRIPTION
● var-lib-ceph-tmp-mnt.UiCYFu.mount  not-found failed failed var-lib-ceph-tmp-mnt.UiCYFu.mount
● [email protected]  loaded    failed failed Ceph disk activation: /dev/sdc1
● [email protected]  loaded    failed failed Ceph disk activation: /dev/sdd1
● [email protected]  loaded    failed failed Ceph disk activation: /dev/sdd2
===

Need your suggestion to proceed further.

Thanks
Jayaram

On Tue, Jun 13, 2017 at 7:30 PM, David Turner <[email protected]> wrote:

> I came across this a few times. My problem was with journals I set up by
> myself. I didn't give them the proper GUID partition type ID, so the udev
> rules didn't know how to make sure the partition looked correct. What the
> udev rules were unable to do was chown the journal block device as
> ceph:ceph so that it could be opened by the Ceph user. You can test by
> chowning the journal block device and trying to start the OSD again.
>
> Alternatively, if you want to see more information, you can start the
> daemon manually as opposed to starting it through systemd and see what its
> output looks like.
>
> On Tue, Jun 13, 2017 at 6:32 AM nokia ceph <[email protected]>
> wrote:
>
>> Hello,
>>
>> Some OSDs are not getting activated after a reboot, which causes
>> those particular OSDs to land in a failed state.
>>
>> Here you can see the mount points were not updated to the osd-num and
>> the device was mounted at an incorrect mount point, so osd.<num> could
>> not mount/activate the OSDs.
>>
>> Env: RHEL 7.2 - EC 4+1, v11.2.0 bluestore.
>>
>> #grep mnt proc/mounts
>> /dev/sdh1 /var/lib/ceph/tmp/mnt.om4Lbq xfs rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota 0 0
>> /dev/sdh1 /var/lib/ceph/tmp/mnt.EayTmL xfs rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota 0 0
>>
>> From /var/log/messages:
>>
>> --
>> May 26 15:39:58 cn1 systemd: Starting Ceph disk activation: /dev/sdh2...
>> May 26 15:39:58 cn1 systemd: Starting Ceph disk activation: /dev/sdh1...
>> May 26 15:39:58 cn1 systemd: *start request repeated too quickly for*
>> [email protected] => suspecting this could be the root cause.
>> May 26 15:39:58 cn1 systemd: Failed to start Ceph disk activation: /dev/sdh2.
>> May 26 15:39:58 cn1 systemd: Unit [email protected] entered failed state.
>> May 26 15:39:58 cn1 systemd: [email protected] failed.
>> May 26 15:39:58 cn1 systemd: start request repeated too quickly for [email protected]
>> May 26 15:39:58 cn1 systemd: Failed to start Ceph disk activation: /dev/sdh1.
>> May 26 15:39:58 cn1 systemd: Unit [email protected] entered failed state.
>> May 26 15:39:58 cn1 systemd: [email protected] failed.
>> --
>>
>> But this issue occurs only intermittently after a reboot.
>>
>> Note: we haven't faced this problem on Jewel.
>>
>> Awaiting comments.
>>
>> Thanks
>> Jayaram
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
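[Editor's note] Following David's point about partition type GUIDs: the ceph udev rules decide whether to chown a partition to ceph:ceph based on its GPT type code, which is what `blkid -o udev -p` reports as ID_PART_ENTRY_TYPE. A small sketch for checking that code against the standard ceph-disk type GUIDs (the data GUID below matches the output earlier in this thread; `check_type_guid` is a hypothetical helper name, not part of any Ceph tooling):

```shell
# Standard ceph-disk GPT partition type GUIDs
OSD_DATA_GUID="4fbd7e29-9d25-41b8-afd0-062c0ceff05d"    # OSD data partition
OSD_JOURNAL_GUID="45b0969e-9b03-4f30-b4c6-b4b80ceff106" # journal partition

# check_type_guid: reads `blkid -o udev -p <dev>` output on stdin and reports
# what kind of ceph partition the type code denotes.
check_type_guid() {
    guid=$(grep '^ID_PART_ENTRY_TYPE=' | cut -d= -f2)
    case "$guid" in
        "$OSD_DATA_GUID")    echo "ceph osd data" ;;
        "$OSD_JOURNAL_GUID") echo "ceph journal" ;;
        *)                   echo "not a ceph type code: $guid" ;;
    esac
}

# Example with the blkid output pasted earlier in this thread:
echo "ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d" | check_type_guid
# On a live node:  blkid -o udev -p /dev/sdh2 | check_type_guid
```

If a hand-made journal partition carries the wrong type code, it can be retagged with `sgdisk --typecode=<partnum>:45b0969e-9b03-4f30-b4c6-b4b80ceff106 <dev>` so the udev rules pick it up at the next boot.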
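[Editor's note] The manual umount/mount/systemctl workaround described in the top post can be sketched as a helper that reads the OSD id from the `whoami` file inside the stray tmp mount (ceph-disk writes this file into the data partition) and prints the commands to run. `remount_osd` is a hypothetical name; it only prints commands so it can be dry-run safely before executing anything:

```shell
# remount_osd: $1 = stray tmp mountpoint (e.g. /var/lib/ceph/tmp/mnt.om4Lbq),
# $2 = mounts table to consult (defaults to /proc/mounts).
# Looks up which device is mounted there, reads the OSD id from its 'whoami'
# file, and prints the umount/mount/start commands for the operator to review.
remount_osd() {
    tmpmnt=$1
    mounts=${2:-/proc/mounts}
    dev=$(awk -v m="$tmpmnt" '$2 == m {print $1}' "$mounts")
    id=$(cat "$tmpmnt/whoami")
    printf 'umount %s\n' "$tmpmnt"
    printf 'mount %s /var/lib/ceph/osd/ceph-%s\n' "$dev" "$id"
    printf 'systemctl start ceph-osd@%s\n' "$id"
}

# Usage on a live node (prints the commands, does not run them):
#   remount_osd /var/lib/ceph/tmp/mnt.om4Lbq
```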
