I think I found where the wrong fsid lives: in the osdmap stored on the OSD itself. But I see no way to change that fsid. I tried ceph-objectstore-tool --op set-osdmap with an osdmap fetched from the monitor (ceph osd getmap), but no luck... the OSD still reports the old fsid (I could not find a way to set the current epoch on the osdmap).
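For reference, here is roughly the sequence I tried (just a sketch of the idea, not a literal transcript: the temp file names, the systemctl step and the verification at the end are my reconstruction, and it assumes the OSD daemon is stopped while ceph-objectstore-tool runs):

# systemctl stop ceph-osd@0
# ceph osd getmap -o /tmp/osdmap.from.mon
# osdmaptool /tmp/osdmap.from.mon --print | head -3
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op set-osdmap --file /tmp/osdmap.from.mon
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op get-osdmap --file /tmp/osdmap.after
# osdmaptool /tmp/osdmap.after --print | head -3

The map from the monitor is at epoch 33 while the OSD's local map is at epoch 24, and I found no way to make set-osdmap line up the epochs, so the re-read map still shows the old fsid (see the osdmaptool output below).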
Can someone give me a hint? My goal is to duplicate a Ceph cluster (with its data) to run some tests... and I would like to avoid ending up with the same fsid. Thanks!

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op get-osdmap --file /tmp/osdmapfromosd3
# osdmaptool /tmp/osdmapfromosd3 --print
osdmaptool: osdmap file '/tmp/osdmapfromosd3'
epoch 24
fsid bb55e196-eedd-478d-99b6-1aad00b95f2a
created 2019-06-17 15:27:44.102409
modified 2019-06-17 15:53:37.279770
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 9
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release mimic
pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
max_osd 3
osd.0 up in weight 1 up_from 23 up_thru 23 down_at 20 last_clean_interval [5,19) 10.8.12.170:6800/3613 10.8.12.170:6801/3613 10.8.12.170:6802/3613 10.8.12.170:6803/3613 exists,up 01dbf73f-3866-47be-b623-b9c539dcd955
osd.1 up in weight 1 up_from 9 up_thru 23 down_at 0 last_clean_interval [0,0) 10.8.29.71:6800/4364 10.8.29.71:6801/4364 10.8.29.71:6802/4364 10.8.29.71:6803/4364 exists,up ef7c0a4f-5118-4d44-a82b-c9a2cf3c0813
osd.2 up in weight 1 up_from 13 up_thru 23 down_at 0 last_clean_interval [0,0) 10.8.32.182:6800/4361 10.8.32.182:6801/4361 10.8.32.182:6802/4361 10.8.32.182:6803/4361 exists,up 905d17fc-6f37-4404-bd5d-4adc231c49b3

On Tue, Jun 18, 2019 at 12:38, Vincent Pharabot <[email protected]> wrote:

> Thanks Eugen for answering.
>
> Yes, it came from another cluster. I am trying to move all OSDs from one
> cluster to another (one to one), so I would like to avoid wiping the disks.
> It is indeed a ceph-volume OSD; I checked the LVM tags and they are correct:
>
> # lvs --noheadings --readonly --separator=";" -o lv_tags
> ceph.block_device=/dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955,ceph.block_uuid=uL57Kk-9kcO-DdOY-Glwm-cg9P-atmx-3m033v,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=173b6382-504b-421f-aa4d-52526fa80dfa,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=01dbf73f-3866-47be-b623-b9c539dcd955,ceph.osd_id=0,ceph.type=block,ceph.vdo=0
>
> The OSD bluestore labels are also correct:
>
> # ceph-bluestore-tool show-label --dev /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955
> {
>     "/dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955": {
>         "osd_uuid": "01dbf73f-3866-47be-b623-b9c539dcd955",
>         "size": 1073737629696,
>         "btime": "2019-06-17 15:28:53.126482",
>         "description": "main",
>         "bluefs": "1",
>         "ceph_fsid": "173b6382-504b-421f-aa4d-52526fa80dfa",
>         "kv_backend": "rocksdb",
>         "magic": "ceph osd volume v026",
>         "mkfs_done": "yes",
>         "osd_key": "AQBXwwddy5OEAxAAS4AidvOF0kl+kxIBvFhT1A==",
>         "ready": "ready",
>         "whoami": "0"
>     }
> }
>
> Is there any way to change the wrong fsid on the OSD without zapping the disk?
>
> Thank you
>
> On Tue, Jun 18, 2019 at 12:19, Eugen Block <[email protected]> wrote:
>
>> Hi,
>>
>> this OSD must have been part of a previous cluster, I assume.
>> I would remove it from crush if it's still there (check just to make
>> sure), wipe the disk, remove any traces like logical volumes (if it
>> was a ceph-volume lvm OSD) and, if possible, reboot the node.
>>
>> Regards,
>> Eugen
>>
>>
>> Quoting Vincent Pharabot <[email protected]>:
>>
>> > Hello
>> >
>> > I have an OSD which is stuck in the booting state.
>> > I found out that the OSD daemon's cluster_fsid is not the same as the
>> > actual cluster fsid, which should explain why it does not join the
>> > cluster.
>> >
>> > # ceph daemon osd.0 status
>> > {
>> >     "cluster_fsid": "bb55e196-eedd-478d-99b6-1aad00b95f2a",
>> >     "osd_fsid": "01dbf73f-3866-47be-b623-b9c539dcd955",
>> >     "whoami": 0,
>> >     "state": "booting",
>> >     "oldest_map": 1,
>> >     "newest_map": 24,
>> >     "num_pgs": 200
>> > }
>> >
>> > # ceph fsid
>> > 173b6382-504b-421f-aa4d-52526fa80dfa
>> >
>> > I checked the fsid file on the OSD and it is correct:
>> > # cat /var/lib/ceph/osd/ceph-0/ceph_fsid
>> > 173b6382-504b-421f-aa4d-52526fa80dfa
>> >
>> > The OSDMap shows the correct fsid as well:
>> >
>> > # ceph osd dump
>> > epoch 33
>> > fsid 173b6382-504b-421f-aa4d-52526fa80dfa
>> > created 2019-06-17 16:42:52.632757
>> > modified 2019-06-18 09:28:10.376573
>> > flags noout,sortbitwise,recovery_deletes,purged_snapdirs
>> > crush_version 13
>> > full_ratio 0.95
>> > backfillfull_ratio 0.9
>> > nearfull_ratio 0.85
>> > require_min_compat_client jewel
>> > min_compat_client jewel
>> > require_osd_release mimic
>> > pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
>> > pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
>> > max_osd 3
>> > osd.0 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new 01dbf73f-3866-47be-b623-b9c539dcd955
>> > osd.1 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new ef7c0a4f-5118-4d44-a82b-c9a2cf3c0813
>> > osd.2 down in weight 1 up_from 13 up_thru 23 down_at 26 last_clean_interval [0,0) 10.8.61.24:6800/4442 10.8.61.24:6801/4442 10.8.61.24:6802/4442 10.8.61.24:6803/4442 exists e40ef3ba-8f19-4b41-be9d-f95f679df0eb
>> >
>> > So where does the daemon get the wrong cluster id from?
>> > I might be missing something obvious again...
>> >
>> > Is someone able to help?
>> >
>> > Thank you!
>> > Vincent
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
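P.S. For anyone hitting the same thing, this is the quick check I use to see which fsid each layer of the OSD reports (a sketch only; it assumes the default osd.0 paths and the LV shown above; note the ceph daemon call needs the OSD process running, while the ceph-objectstore-tool call needs it stopped, so I run them separately):

# ceph fsid
# cat /var/lib/ceph/osd/ceph-0/ceph_fsid
# ceph daemon osd.0 status | grep cluster_fsid
# lvs --noheadings -o lv_tags | tr ',' '\n' | grep cluster_fsid
# ceph-bluestore-tool show-label --dev /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955 | grep ceph_fsid
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op get-osdmap --file /tmp/osdmap.osd0 && osdmaptool /tmp/osdmap.osd0 --print | grep fsid

On this node only the daemon status and the osdmap stored inside the OSD still report the old fsid bb55e196-eedd-478d-99b6-1aad00b95f2a; everything else already shows the new one, which is why I suspect that embedded osdmap is what keeps osd.0 stuck in the booting state.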
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
