Note: I am not certain that the same applies to bluestore. I haven't set up
bluestore OSDs yet.
On Wed, Jun 14, 2017 at 11:48 AM David Turner <[email protected]> wrote:

> tl;dr: to get a ceph journal to work with the udev rules, run this
> command, substituting your device name (/dev/sdb) and partition number,
> used twice (=4):
>
> sgdisk /dev/sdb -t=4:45B0969E-9B03-4F30-B4C6-5EC00CEFF106 -c=4:'ceph journal'
>
> And this for your OSDs, replacing the device name (/dev/sdc) and
> partition number, used twice (=1):
>
> sgdisk /dev/sdc -t=1:4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D -c=1:'ceph data'
>
> At the bottom is the relevant sgdisk output for journal and OSD
> partitions that are configured so the udev rules manage the devices
> properly.
>
> The "Partition unique GUID" is different for every disk. Generally it is
> auto-assigned, but you can specify it if you'd like. A time when it is
> really nice to specify it is when you are swapping out a failing journal
> device. You can recreate the new partitions with the same GUIDs, and if
> your /var/lib/ceph/osd/ceph-##/journal symlink uses that GUID (i.e.
> /dev/disk/by-partuuid/053f386f-223c-43a2-9843-6462dfb3857a for my
> /dev/sdb4 journal partition), then you won't need to make any changes on
> the OSD when you swap it out (other than the commands to flush and
> create the journal).
>
> The "Partition name" is just nice to have so you can easily tell what a
> disk was used for if you throw it on a stack one day. It isn't
> necessary, but it's nice. 'ceph journal' and 'ceph data' are the
> defaults that the ceph-deploy script uses, so it's also nice to match
> those anyway.
>
> The real key to getting the udev rules to work is the "Partition GUID
> code". A ceph journal should always have
> 45B0969E-9B03-4F30-B4C6-5EC00CEFF106 as its "Partition GUID code", and a
> ceph OSD should always have 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D. That
> is hardcoded into the udev rules that ceph installs and has never
> changed that I'm aware of.
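The journal-swap idea described above can be sketched as follows. This is
only an illustration: the replacement disk (/dev/sde), the partition size
(+10G), and the OSD number are hypothetical; the unique GUID is the /dev/sdb4
example from the message above.

```shell
# Sketch: replace a failing journal disk while keeping the partition's
# unique GUID, so the OSD's by-partuuid journal symlink keeps resolving.
# /dev/sde, +10G, and osd.2 are hypothetical placeholders.

OLD_UUID=053f386f-223c-43a2-9843-6462dfb3857a  # read from: sgdisk /dev/sdb -i=4

# Create partition 4 on the new disk with the ceph-journal type code,
# the old unique GUID, and the conventional name:
sgdisk /dev/sde -n=4:0:+10G \
    -t=4:45B0969E-9B03-4F30-B4C6-5EC00CEFF106 \
    -u=4:"$OLD_UUID" \
    -c=4:'ceph journal'

# The existing symlink then needs no change:
#   /var/lib/ceph/osd/ceph-2/journal -> /dev/disk/by-partuuid/<OLD_UUID>
```

After this, only the journal flush/create steps mentioned above remain; the
OSD configuration itself is untouched.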
>
> # sgdisk /dev/sdb -i=4
> Partition GUID code: 45B0969E-9B03-4F30-B4C6-5EC00CEFF106 (Unknown)
> Partition unique GUID: 053F386F-223C-43A2-9843-6462DFB3857A
> Partition name: 'ceph journal'
> # sgdisk /dev/sdb -i=5
> Partition GUID code: 45B0969E-9B03-4F30-B4C6-5EC00CEFF106 (Unknown)
> Partition unique GUID: 6448D160-A2B7-4EF9-BC67-12F4D49D9FDD
> Attribute flags: 0000000000000000
> Partition name: 'ceph journal'
>
> root@ceph1:~# sgdisk /dev/sdc -i=1
> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
> Partition unique GUID: 036FCF9D-2865-4822-9F93-94C3B5750DDC
> Partition name: 'ceph data'
> root@ceph1:~# sgdisk /dev/sdd -i=1
> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
> Partition unique GUID: D74AB67E-C7F6-4974-889F-ABEBA7F2DC2F
> Partition name: 'ceph data'
>
> On Wed, Jun 14, 2017 at 4:02 AM nokia ceph <[email protected]> wrote:
>
>> Hello David,
>>
>> Thanks for the update.
>>
>> http://tracker.ceph.com/issues/13833#note-7 - As per this tracker, the
>> GUID may differ, which causes udev to be unable to chown the devices to
>> ceph.
>>
>> We are following the procedure below to create OSDs:
>>
>> # sgdisk -Z /dev/sdb
>> # ceph-disk prepare --bluestore --cluster ceph --cluster-uuid <fsid> /dev/vdb
>> # ceph-disk --verbose activate /dev/vdb1
>>
>> Here you can see all the devices have the same GUID:
>>
>> # for i in b c d ; do /usr/sbin/blkid -o udev -p /dev/vd$i\1 | grep ID_PART_ENTRY_TYPE; done
>> ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
>> ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
>> ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
>>
>> Currently we are facing an issue with OSD activation at boot, which
>> leaves the OSD journal device mounted like this:
>>
>> ~~~
>> /dev/sdh1 /var/lib/ceph/tmp/mnt.EayTmL
>> ~~~
>>
>> At the same time, the OSD logs show that osd.2 cannot find the mounted
>> journal device, hence it lands in a failed state.
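One small trap when comparing these values in a script: blkid reports the
type GUID in lowercase, while sgdisk prints it in uppercase; the value is the
same. A minimal runnable sketch (the `reported` line below is a pasted
example standing in for live blkid output, not a real probe) that normalizes
case before comparing:

```shell
# Compare a blkid ID_PART_ENTRY_TYPE value against the ceph OSD ("ceph
# data") type code, normalizing case first. "reported" stands in for
# real blkid -o udev output.
expected=4fbd7e29-9d25-41b8-afd0-062c0ceff05d   # ceph data type code
reported='ID_PART_ENTRY_TYPE=4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D'

# Strip the key, lowercase the value, then compare.
got=$(printf '%s' "${reported#*=}" | tr '[:upper:]' '[:lower:]')
if [ "$got" = "$expected" ]; then
    echo "type code matches"
else
    echo "type code mismatch: $got"
fi
```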
>>
>> ~~~
>> May 26 15:40:39 cn1 ceph-osd: 2017-05-26 15:40:39.978072 7f1dc3bc2940 -1 #033[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or directory#033[0m
>> May 26 15:40:39 cn1 systemd: [email protected]: main process exited, code=exited, status=1/FAILURE
>> May 26 15:40:39 cn1 systemd: Unit [email protected] entered failed state.
>> May 26 15:40:39 cn1 systemd: [email protected] failed.
>> ~~~
>>
>> To fix this problem, we are using the workaround below.
>>
>> # umount /var/lib/ceph/tmp/mnt.om4Lbq
>>
>> Mount the device with the respective OSD number:
>> # mount /dev/sdb1 /var/lib/ceph/osd/ceph-2
>>
>> Then start the OSD:
>> # systemctl start [email protected]
>>
>> We notice the services below fail at the same time:
>>
>> ===
>> systemctl --failed
>> UNIT                               LOAD      ACTIVE SUB    DESCRIPTION
>> ● var-lib-ceph-tmp-mnt.UiCYFu.mount not-found failed failed var-lib-ceph-tmp-mnt.UiCYFu.mount
>> ● [email protected]   loaded    failed failed Ceph disk activation: /dev/sdc1
>> ● [email protected]   loaded    failed failed Ceph disk activation: /dev/sdd1
>> ● [email protected]   loaded    failed failed Ceph disk activation: /dev/sdd2
>> ===
>>
>> Need your suggestions to proceed further.
>>
>> Thanks
>> Jayaram
>>
>> On Tue, Jun 13, 2017 at 7:30 PM, David Turner <[email protected]> wrote:
>>
>>> I came across this a few times. My problem was with journals I set up
>>> myself. I didn't give them the proper GUID partition type ID, so the
>>> udev rules didn't know how to make the partition look correct. What the
>>> udev rules were unable to do was chown the journal block device as
>>> ceph:ceph so that it could be opened by the Ceph user. You can test by
>>> chowning the journal block device and trying to start the OSD again.
>>>
>>> Alternatively, if you want to see more information, you can start the
>>> daemon manually instead of through systemd and see what its output
>>> looks like.
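The ownership test David describes above can be sketched like this; the
journal partition (/dev/sdb4) and OSD id (2) are hypothetical placeholders,
and the foreground invocation mirrors what the ceph-osd systemd unit runs:

```shell
# Sketch of the chown test: if the OSD starts after this, the udev rules
# were not chowning the journal device (wrong partition type GUID).
# /dev/sdb4 and osd.2 are hypothetical placeholders.
chown ceph:ceph /dev/sdb4      # journal block device
systemctl start ceph-osd@2     # does the OSD come up now?

# For more detail, run the daemon in the foreground instead of via systemd:
/usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
```

Note the chown does not persist across reboots; the durable fix is setting
the correct partition type GUID so udev does it every boot.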
>>>
>>> On Tue, Jun 13, 2017 at 6:32 AM nokia ceph <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Some OSDs are not getting activated after a reboot, which leaves those
>>>> particular OSDs in a failed state.
>>>>
>>>> Here you can see the mount points were not updated to the osd-num and
>>>> the devices were mounted at an incorrect mount point, so the OSDs
>>>> could not be mounted/activated.
>>>>
>>>> Env: RHEL 7.2 - EC 4+1, v11.2.0 bluestore.
>>>>
>>>> # grep mnt proc/mounts
>>>> /dev/sdh1 /var/lib/ceph/tmp/mnt.om4Lbq xfs rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota 0 0
>>>> /dev/sdh1 /var/lib/ceph/tmp/mnt.EayTmL xfs rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota 0 0
>>>>
>>>> From /var/log/messages:
>>>>
>>>> --
>>>> May 26 15:39:58 cn1 systemd: Starting Ceph disk activation: /dev/sdh2...
>>>> May 26 15:39:58 cn1 systemd: Starting Ceph disk activation: /dev/sdh1...
>>>>
>>>> May 26 15:39:58 cn1 systemd: *start request repeated too quickly for* [email protected] => suspecting this could be the root cause.
>>>> May 26 15:39:58 cn1 systemd: Failed to start Ceph disk activation: /dev/sdh2.
>>>> May 26 15:39:58 cn1 systemd: Unit [email protected] entered failed state.
>>>> May 26 15:39:58 cn1 systemd: [email protected] failed.
>>>> May 26 15:39:58 cn1 systemd: start request repeated too quickly for [email protected]
>>>> May 26 15:39:58 cn1 systemd: Failed to start Ceph disk activation: /dev/sdh1.
>>>> May 26 15:39:58 cn1 systemd: Unit [email protected] entered failed state.
>>>> May 26 15:39:58 cn1 systemd: [email protected] failed.
>>>> --
>>>>
>>>> But this issue occurs only intermittently after a reboot.
>>>>
>>>> Note: We haven't faced this problem in Jewel.
>>>>
>>>> Awaiting your comments.
>>>>
>>>> Thanks
>>>> Jayaram
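The "start request repeated too quickly" lines in the quoted report come
from systemd's start rate limiter, which leaves the unit in a failed state
after too many rapid start attempts. A sketch of one thing worth trying
before rebooting, assuming the unit names from the log excerpt:

```shell
# Clear systemd's failed state (which also resets the start rate-limit
# counter) for the activation unit, then retry. Unit names are taken
# from the quoted log; adjust to your own devices.
systemctl reset-failed '[email protected]'
systemctl start '[email protected]'

# Check the result:
systemctl status '[email protected]' --no-pager
```

If activation still races at boot, the underlying trigger ordering still
needs fixing; this only clears the rate-limited state so a retry is possible.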
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
