Hello David,

Thanks for the update.
http://tracker.ceph.com/issues/13833#note-7 - As per this tracker, the GUID may differ, which can leave udev unable to chown the device to ceph. We are following the below procedure to create OSDs:

~~~
#sgdisk -Z /dev/sdb
#ceph-disk prepare --bluestore --cluster ceph --cluster-uuid <fsid> /dev/vdb
#ceph-disk --verbose activate /dev/vdb1
~~~

Here you can see all the devices have the same GUID:

~~~
#for i in b c d ; do /usr/sbin/blkid -o udev -p /dev/vd$i\1 | grep ID_PART_ENTRY_TYPE; done
ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
~~~

Currently we are facing an issue with OSD activation at boot, which leaves the OSD journal device mounted like this:

~~~
/dev/sdh1 /var/lib/ceph/tmp/mnt.EayTmL
~~~

At the same time, in the OSD logs, osd.2 is unable to find the mounted journal device and lands in a failed state:

~~~
May 26 15:40:39 cn1 ceph-osd: 2017-05-26 15:40:39.978072 7f1dc3bc2940 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or directory
May 26 15:40:39 cn1 systemd: [email protected]: main process exited, code=exited, status=1/FAILURE
May 26 15:40:39 cn1 systemd: Unit [email protected] entered failed state.
May 26 15:40:39 cn1 systemd: [email protected] failed.
~~~

To fix this problem, we are following the below workaround. Unmount the stray tmp mountpoint:

~~~
#umount /var/lib/ceph/tmp/mnt.om4Lbq
~~~

Mount the device at the directory for the respective OSD number:

~~~
#mount /dev/sdb1 /var/lib/ceph/osd/ceph-2
~~~

Then start the OSD:

~~~
#systemctl start [email protected]
~~~

We notice the below services fail at the same time.
===
systemctl --failed
UNIT                                 LOAD      ACTIVE SUB    DESCRIPTION
● var-lib-ceph-tmp-mnt.UiCYFu.mount  not-found failed failed var-lib-ceph-tmp-mnt.UiCYFu.mount
● [email protected]  loaded    failed failed Ceph disk activation: /dev/sdc1
● [email protected]  loaded    failed failed Ceph disk activation: /dev/sdd1
● [email protected]  loaded    failed failed Ceph disk activation: /dev/sdd2
===

Need your suggestion to proceed further.

Thanks
Jayaram

On Tue, Jun 13, 2017 at 7:30 PM, David Turner <[email protected]> wrote:

> I came across this a few times. My problem was with journals I set up by
> myself. I didn't give them the proper GUID partition type ID, so the udev
> rules didn't know how to make sure the partition looked correct. What the
> udev rules were unable to do was chown the journal block device as
> ceph:ceph so that it could be opened by the Ceph user. You can test by
> chowning the journal block device and trying to start the OSD again.
>
> Alternatively, if you want to see more information, you can start the
> daemon manually as opposed to starting it through systemd and see what its
> output looks like.
>
> On Tue, Jun 13, 2017 at 6:32 AM nokia ceph <[email protected]>
> wrote:
>
>> Hello,
>>
>> Some OSDs are not getting activated after a reboot, which causes
>> those particular OSDs to land in a failed state.
>>
>> Here you can see the mount points were not updated to the osd-num and
>> the device was mounted at an incorrect mount point, so osd.<num> could
>> not mount/activate the OSDs.
>>
>> Env: RHEL 7.2 - EC 4+1, v11.2.0 bluestore.
>>
>> #grep mnt proc/mounts
>> /dev/sdh1 /var/lib/ceph/tmp/mnt.om4Lbq xfs rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota 0 0
>> /dev/sdh1 /var/lib/ceph/tmp/mnt.EayTmL xfs rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota 0 0
>>
>> From /var/log/messages:
>>
>> --
>> May 26 15:39:58 cn1 systemd: Starting Ceph disk activation: /dev/sdh2...
>> May 26 15:39:58 cn1 systemd: Starting Ceph disk activation: /dev/sdh1...
>> May 26 15:39:58 cn1 systemd: *start request repeated too quickly for*
>> [email protected] => suspecting this could be the root cause.
>> May 26 15:39:58 cn1 systemd: Failed to start Ceph disk activation: /dev/sdh2.
>> May 26 15:39:58 cn1 systemd: Unit [email protected] entered failed state.
>> May 26 15:39:58 cn1 systemd: [email protected] failed.
>> May 26 15:39:58 cn1 systemd: start request repeated too quickly for [email protected]
>> May 26 15:39:58 cn1 systemd: Failed to start Ceph disk activation: /dev/sdh1.
>> May 26 15:39:58 cn1 systemd: Unit [email protected] entered failed state.
>> May 26 15:39:58 cn1 systemd: [email protected] failed.
>> --
>>
>> But this issue occurs only intermittently after a reboot.
>>
>> Note: we haven't faced this problem on Jewel.
>>
>> Awaiting comments.
>>
>> Thanks
>> Jayaram
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
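[Editor's note] Following David's point about partition type GUIDs: the ceph udev rules decide whether to chown a partition to ceph:ceph based on its GPT type code, which is what `blkid -o udev -p` reports as ID_PART_ENTRY_TYPE. A small sketch for checking that code against the standard ceph-disk type GUIDs (the data GUID below matches the output earlier in this thread; `check_type_guid` is a hypothetical helper name, not part of any Ceph tooling):

```shell
# Standard ceph-disk GPT partition type GUIDs
OSD_DATA_GUID="4fbd7e29-9d25-41b8-afd0-062c0ceff05d"    # OSD data partition
OSD_JOURNAL_GUID="45b0969e-9b03-4f30-b4c6-b4b80ceff106" # journal partition

# check_type_guid: reads `blkid -o udev -p <dev>` output on stdin and reports
# what kind of ceph partition the type code denotes.
check_type_guid() {
    guid=$(grep '^ID_PART_ENTRY_TYPE=' | cut -d= -f2)
    case "$guid" in
        "$OSD_DATA_GUID")    echo "ceph osd data" ;;
        "$OSD_JOURNAL_GUID") echo "ceph journal" ;;
        *)                   echo "not a ceph type code: $guid" ;;
    esac
}

# Example with the blkid output pasted earlier in this thread:
echo "ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d" | check_type_guid
# On a live node:  blkid -o udev -p /dev/sdh2 | check_type_guid
```

If a hand-made journal partition carries the wrong type code, it can be retagged with `sgdisk --typecode=<partnum>:45b0969e-9b03-4f30-b4c6-b4b80ceff106 <dev>` so the udev rules pick it up at the next boot.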
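[Editor's note] The manual umount/mount/systemctl workaround described in the top post can be sketched as a helper that reads the OSD id from the `whoami` file inside the stray tmp mount (ceph-disk writes this file into the data partition) and prints the commands to run. `remount_osd` is a hypothetical name; it only prints commands so it can be dry-run safely before executing anything:

```shell
# remount_osd: $1 = stray tmp mountpoint (e.g. /var/lib/ceph/tmp/mnt.om4Lbq),
# $2 = mounts table to consult (defaults to /proc/mounts).
# Looks up which device is mounted there, reads the OSD id from its 'whoami'
# file, and prints the umount/mount/start commands for the operator to review.
remount_osd() {
    tmpmnt=$1
    mounts=${2:-/proc/mounts}
    dev=$(awk -v m="$tmpmnt" '$2 == m {print $1}' "$mounts")
    id=$(cat "$tmpmnt/whoami")
    printf 'umount %s\n' "$tmpmnt"
    printf 'mount %s /var/lib/ceph/osd/ceph-%s\n' "$dev" "$id"
    printf 'systemctl start ceph-osd@%s\n' "$id"
}

# Usage on a live node (prints the commands, does not run them):
#   remount_osd /var/lib/ceph/tmp/mnt.om4Lbq
```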
