I think I found where the wrong fsid lives: in the osdmap stored on the OSD itself. But I see no way to change that fsid. I tried ceph-objectstore-tool --op set-osdmap with an osdmap fetched from the monitor (ceph osd getmap), but no luck... the OSD still reports the old fsid (I could not find a way to set the current epoch on the osdmap).
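For reference, here is roughly the sequence I tried (just a sketch of the idea, not a literal transcript: the temp file names, the systemctl step and the verification at the end are my reconstruction, and it assumes the OSD daemon is stopped while ceph-objectstore-tool runs):

# systemctl stop ceph-osd@0
# ceph osd getmap -o /tmp/osdmap.from.mon
# osdmaptool /tmp/osdmap.from.mon --print | head -3
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op set-osdmap --file /tmp/osdmap.from.mon
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op get-osdmap --file /tmp/osdmap.after
# osdmaptool /tmp/osdmap.after --print | head -3

The map from the monitor is at epoch 33 while the OSD's local map is at epoch 24, and I found no way to make set-osdmap line up the epochs, so the re-read map still shows the old fsid (see the osdmaptool output below).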
Can someone give me a hint? My goal is to duplicate a Ceph cluster (with its data) to run some tests... and I would like to avoid ending up with the same fsid. Thanks!

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op get-osdmap --file /tmp/osdmapfromosd3
# osdmaptool /tmp/osdmapfromosd3 --print
osdmaptool: osdmap file '/tmp/osdmapfromosd3'
epoch 24
fsid bb55e196-eedd-478d-99b6-1aad00b95f2a
created 2019-06-17 15:27:44.102409
modified 2019-06-17 15:53:37.279770
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 9
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release mimic
pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
max_osd 3
osd.0 up in weight 1 up_from 23 up_thru 23 down_at 20 last_clean_interval [5,19) 10.8.12.170:6800/3613 10.8.12.170:6801/3613 10.8.12.170:6802/3613 10.8.12.170:6803/3613 exists,up 01dbf73f-3866-47be-b623-b9c539dcd955
osd.1 up in weight 1 up_from 9 up_thru 23 down_at 0 last_clean_interval [0,0) 10.8.29.71:6800/4364 10.8.29.71:6801/4364 10.8.29.71:6802/4364 10.8.29.71:6803/4364 exists,up ef7c0a4f-5118-4d44-a82b-c9a2cf3c0813
osd.2 up in weight 1 up_from 13 up_thru 23 down_at 0 last_clean_interval [0,0) 10.8.32.182:6800/4361 10.8.32.182:6801/4361 10.8.32.182:6802/4361 10.8.32.182:6803/4361 exists,up 905d17fc-6f37-4404-bd5d-4adc231c49b3

On Tue, Jun 18, 2019 at 12:38, Vincent Pharabot <[email protected]> wrote:

> Thanks Eugen for answering.
>
> Yes, it came from another cluster. I am trying to move all OSDs from one
> cluster to another (one to one), so I would like to avoid wiping the disks.
> It is indeed a ceph-volume OSD; I checked the LVM tags and they are correct:
>
> # lvs --noheadings --readonly --separator=";" -o lv_tags
> ceph.block_device=/dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955,ceph.block_uuid=uL57Kk-9kcO-DdOY-Glwm-cg9P-atmx-3m033v,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=173b6382-504b-421f-aa4d-52526fa80dfa,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=01dbf73f-3866-47be-b623-b9c539dcd955,ceph.osd_id=0,ceph.type=block,ceph.vdo=0
>
> The OSD bluestore labels are also correct:
>
> # ceph-bluestore-tool show-label --dev /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955
> {
>     "/dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955": {
>         "osd_uuid": "01dbf73f-3866-47be-b623-b9c539dcd955",
>         "size": 1073737629696,
>         "btime": "2019-06-17 15:28:53.126482",
>         "description": "main",
>         "bluefs": "1",
>         "ceph_fsid": "173b6382-504b-421f-aa4d-52526fa80dfa",
>         "kv_backend": "rocksdb",
>         "magic": "ceph osd volume v026",
>         "mkfs_done": "yes",
>         "osd_key": "AQBXwwddy5OEAxAAS4AidvOF0kl+kxIBvFhT1A==",
>         "ready": "ready",
>         "whoami": "0"
>     }
> }
>
> Is there any way to change the wrong fsid on the OSD without zapping the disk?
>
> Thank you
>
> On Tue, Jun 18, 2019 at 12:19, Eugen Block <[email protected]> wrote:
>
>> Hi,
>>
>> this OSD must have been part of a previous cluster, I assume.
>> I would remove it from crush if it's still there (check just to make
>> sure), wipe the disk, remove any traces like logical volumes (if it
>> was a ceph-volume lvm OSD) and, if possible, reboot the node.
>>
>> Regards,
>> Eugen
>>
>>
>> Quoting Vincent Pharabot <[email protected]>:
>>
>> > Hello
>> >
>> > I have an OSD which is stuck in the booting state.
>> > I found out that the OSD daemon's cluster_fsid is not the same as the
>> > actual cluster fsid, which should explain why it does not join the
>> > cluster.
>> >
>> > # ceph daemon osd.0 status
>> > {
>> >     "cluster_fsid": "bb55e196-eedd-478d-99b6-1aad00b95f2a",
>> >     "osd_fsid": "01dbf73f-3866-47be-b623-b9c539dcd955",
>> >     "whoami": 0,
>> >     "state": "booting",
>> >     "oldest_map": 1,
>> >     "newest_map": 24,
>> >     "num_pgs": 200
>> > }
>> >
>> > # ceph fsid
>> > 173b6382-504b-421f-aa4d-52526fa80dfa
>> >
>> > I checked the fsid file on the OSD and it is correct:
>> > # cat /var/lib/ceph/osd/ceph-0/ceph_fsid
>> > 173b6382-504b-421f-aa4d-52526fa80dfa
>> >
>> > The OSDMap shows the correct fsid as well:
>> >
>> > # ceph osd dump
>> > epoch 33
>> > fsid 173b6382-504b-421f-aa4d-52526fa80dfa
>> > created 2019-06-17 16:42:52.632757
>> > modified 2019-06-18 09:28:10.376573
>> > flags noout,sortbitwise,recovery_deletes,purged_snapdirs
>> > crush_version 13
>> > full_ratio 0.95
>> > backfillfull_ratio 0.9
>> > nearfull_ratio 0.85
>> > require_min_compat_client jewel
>> > min_compat_client jewel
>> > require_osd_release mimic
>> > pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
>> > pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
>> > max_osd 3
>> > osd.0 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new 01dbf73f-3866-47be-b623-b9c539dcd955
>> > osd.1 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new ef7c0a4f-5118-4d44-a82b-c9a2cf3c0813
>> > osd.2 down in weight 1 up_from 13 up_thru 23 down_at 26 last_clean_interval [0,0) 10.8.61.24:6800/4442 10.8.61.24:6801/4442 10.8.61.24:6802/4442 10.8.61.24:6803/4442 exists e40ef3ba-8f19-4b41-be9d-f95f679df0eb
>> >
>> > So where does the daemon get the wrong cluster id from?
>> > I might be missing something obvious again...
>> >
>> > Is someone able to help?
>> >
>> > Thank you!
>> > Vincent
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
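P.S. For anyone hitting the same thing, this is the quick check I use to see which fsid each layer of the OSD reports (a sketch only; it assumes the default osd.0 paths and the LV shown above; note the ceph daemon call needs the OSD process running, while the ceph-objectstore-tool call needs it stopped, so I run them separately):

# ceph fsid
# cat /var/lib/ceph/osd/ceph-0/ceph_fsid
# ceph daemon osd.0 status | grep cluster_fsid
# lvs --noheadings -o lv_tags | tr ',' '\n' | grep cluster_fsid
# ceph-bluestore-tool show-label --dev /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955 | grep ceph_fsid
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op get-osdmap --file /tmp/osdmap.osd0 && osdmaptool /tmp/osdmap.osd0 --print | grep fsid

On this node only the daemon status and the osdmap stored inside the OSD still report the old fsid bb55e196-eedd-478d-99b6-1aad00b95f2a; everything else already shows the new one, which is why I suspect that embedded osdmap is what keeps osd.0 stuck in the booting state.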
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
