Thanks, Eugen, for answering.

Yes, it came from another cluster. I am trying to move all OSDs from one cluster
to another (1 to 1), so I would like to avoid wiping the disks.
It is indeed a ceph-volume OSD; I checked the LVM tags and they are correct:

# lvs --noheadings --readonly --separator=";" -o lv_tags
ceph.block_device=/dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955,
ceph.block_uuid=uL57Kk-9kcO-DdOY-Glwm-cg9P-atmx-3m033v,ceph.cephx_lockbox_secret=,
ceph.cluster_fsid=173b6382-504b-421f-aa4d-52526fa80dfa,ceph.cluster_name=ceph,
ceph.crush_device_class=None,ceph.encrypted=0,
ceph.osd_fsid=01dbf73f-3866-47be-b623-b9c539dcd955,ceph.osd_id=0,ceph.type=block,ceph.vdo=0
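
For a quick comparison against the cluster fsid, the same tags can also be split one per line and filtered down to the fields that matter here (same values as the full output above):

# lvs --noheadings -o lv_tags | tr ',' '\n' | grep -E 'cluster_fsid|osd_fsid|osd_id'
ceph.cluster_fsid=173b6382-504b-421f-aa4d-52526fa80dfa
ceph.osd_fsid=01dbf73f-3866-47be-b623-b9c539dcd955
ceph.osd_id=0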

The OSD's bluestore labels are also correct:

# ceph-bluestore-tool show-label --dev \
    /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955
{
    "/dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955": {
        "osd_uuid": "01dbf73f-3866-47be-b623-b9c539dcd955",
        "size": 1073737629696,
        "btime": "2019-06-17 15:28:53.126482",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "173b6382-504b-421f-aa4d-52526fa80dfa",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQBXwwddy5OEAxAAS4AidvOF0kl+kxIBvFhT1A==",
        "ready": "ready",
        "whoami": "0"
    }
}
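
For contrast, the only place the wrong fsid shows up is in what the running daemon reports versus what the monitors report (same values as in my first mail, quoted below):

# ceph daemon osd.0 status | grep cluster_fsid
"cluster_fsid": "bb55e196-eedd-478d-99b6-1aad00b95f2a",
# ceph fsid
173b6382-504b-421f-aa4d-52526fa80dfa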


Is there any way to change the wrong fsid on the OSD without zapping the disk?
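
The only in-place edit I'm aware of is ceph-bluestore-tool's set-label-key subcommand, which rewrites a single bluestore label field, something like this (untested sketch, using the same block LV path as above):

# ceph-bluestore-tool set-label-key \
    --dev /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955 \
    -k ceph_fsid -v 173b6382-504b-421f-aa4d-52526fa80dfa

But since the labels above already show the correct fsid, the stale value the daemon reports presumably lives somewhere else (e.g. in the OSD's stored superblock), so I'm not sure a label edit alone would fix it.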

Thank you




On Tue, 18 Jun 2019 at 12:19, Eugen Block <[email protected]> wrote:

> Hi,
>
> this OSD must have been part of a previous cluster, I assume.
> I would remove it from crush if it's still there (check just to make
> sure), wipe the disk, remove any traces like logical volumes (if it
> was a ceph-volume lvm OSD) and if possible, reboot the node.
>
> Regards,
> Eugen
>
>
> Quoting Vincent Pharabot <[email protected]>:
>
> > Hello
> >
> > I have an OSD which is stuck in the booting state.
> > I found out that the OSD daemon's cluster_fsid is not the same as the
> actual
> > cluster fsid, which should explain why it does not join the cluster.
> >
> > # ceph daemon osd.0 status
> > {
> > "cluster_fsid": "bb55e196-eedd-478d-99b6-1aad00b95f2a",
> > "osd_fsid": "01dbf73f-3866-47be-b623-b9c539dcd955",
> > "whoami": 0,
> > "state": "booting",
> > "oldest_map": 1,
> > "newest_map": 24,
> > "num_pgs": 200
> > }
> >
> > # ceph fsid
> > 173b6382-504b-421f-aa4d-52526fa80dfa
> >
> > I checked the cluster fsid file and it's correct:
> > # cat /var/lib/ceph/osd/ceph-0/ceph_fsid
> > 173b6382-504b-421f-aa4d-52526fa80dfa
> >
> > The OSDMap shows the correct fsid as well:
> >
> > # ceph osd dump
> > epoch 33
> > fsid 173b6382-504b-421f-aa4d-52526fa80dfa
> > created 2019-06-17 16:42:52.632757
> > modified 2019-06-18 09:28:10.376573
> > flags noout,sortbitwise,recovery_deletes,purged_snapdirs
> > crush_version 13
> > full_ratio 0.95
> > backfillfull_ratio 0.9
> > nearfull_ratio 0.85
> > require_min_compat_client jewel
> > min_compat_client jewel
> > require_osd_release mimic
> > pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
> > pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
> > max_osd 3
> > osd.0 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new 01dbf73f-3866-47be-b623-b9c539dcd955
> > osd.1 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new ef7c0a4f-5118-4d44-a82b-c9a2cf3c0813
> > osd.2 down in weight 1 up_from 13 up_thru 23 down_at 26 last_clean_interval [0,0) 10.8.61.24:6800/4442 10.8.61.24:6801/4442 10.8.61.24:6802/4442 10.8.61.24:6803/4442 exists e40ef3ba-8f19-4b41-be9d-f95f679df0eb
> >
> > So where does the daemon get the wrong cluster fsid from?
> > I might be missing something obvious again...
> >
> > Someone able to help ?
> >
> > Thank you !
> > Vincent
>
>
>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
