Hello.
We're rebuilding our OSD nodes.
One cluster worked without any issues, but this one is being stubborn.
I attempted to add one node back to the cluster and I'm seeing the
error below in our logs:
cephadm ['--image',
'registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160', 'pull']
2024-03-27 19:30:53,901 7f49792ed740 DEBUG /bin/podman: 4.6.1
2024-03-27 19:30:53,905 7f49792ed740 INFO Pulling container image
registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,045 7f49792ed740 DEBUG /bin/podman: Trying to pull
registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,266 7f49792ed740 DEBUG /bin/podman: Error:
initializing source
docker://registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160: reading
manifest 16.2.10-160 in registry.redhat.io/rhceph/rhceph-5-rhel8:
manifest unknown
2024-03-27 19:30:54,270 7f49792ed740 INFO Non-zero exit code 125 from
/bin/podman pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160
2024-03-27 19:30:54,270 7f49792ed740 INFO /bin/podman: stderr Trying
to pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,270 7f49792ed740 INFO /bin/podman: stderr Error:
initializing source
docker://registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160: reading
manifest 16.2.10-160 in registry.redhat.io/rhceph/rhceph-5-rhel8:
manifest unknown
2024-03-27 19:30:54,270 7f49792ed740 ERROR ERROR: Failed command:
/bin/podman pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160
$ ceph versions
{
    "mon": {
        "ceph version 16.2.10-208.el8cp (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 1,
        "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 2
    },
    "mgr": {
        "ceph version 16.2.10-208.el8cp (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 1,
        "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160
    },
    "mds": {},
    "rgw": {
        "ceph version 16.2.10-208.el8cp (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 3
    },
    "overall": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160,
        "ceph version 16.2.10-208.el8cp (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 5,
        "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 4
    }
}
I don't understand why it's trying to pull 16.2.10-160, which doesn't
exist. These are the images available locally:
registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8   5      93b3137e7a65   11 months ago   696 MB
registry.redhat.io/rhceph/rhceph-5-rhel8             5-416  838cea16e15c   11 months ago   1.02 GB
registry.redhat.io/openshift4/ose-prometheus         v4.6   ec2d358ca73c   17 months ago   397 MB
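For what it's worth, this is roughly how I've been checking which tags the
registry actually serves, and the override I'm tempted to try. The
16.2.10-248 tag below is only my guess based on the mon/mgr versions
above, not something I've confirmed is the right target:

```shell
# List the tags the registry actually offers for this repo
# (requires a prior "podman login registry.redhat.io"):
skopeo list-tags docker://registry.redhat.io/rhceph/rhceph-5-rhel8

# Point cephadm at a tag that exists; 16.2.10-248 is an assumption
# based on the mon/mgr versions shown in "ceph versions" above:
ceph config set global container_image \
    registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-248
```

I haven't run the config override yet since I'm not sure which tag the
cluster should be on.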
This happens when using cephadm-ansible as well as when reapplying the
service spec with:
$ ceph orch ls --export --service_name xxx > xxx.yml
$ sudo ceph orch apply -i xxx.yml
I tried
$ ceph orch daemon add osd host:/dev/sda
which surprisingly created a volume on host:/dev/sda and an OSD I can
see in
$ ceph osd tree
but it did not get added to the host, I suspect because of the same
Podman error, and now I'm unable to remove it.
$ ceph orch osd rm
does not work, even with the --force flag.
After 10+ minutes I stopped the removal with
$ ceph orch osd rm stop
I'm considering running
$ ceph osd purge osd# --force
but I'm worried it may only make things worse.
ceph -s shows that OSD, but it is neither up nor in.
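For reference, the manual cleanup I have in mind is roughly the
following. The OSD ID here is a placeholder (I've left the real number
out), and I'd appreciate confirmation before running the purge:

```shell
# Placeholder; substitute the stray OSD's actual ID number.
OSD_ID=999

# Mark the OSD out first so nothing tries to map data to it:
ceph osd out "$OSD_ID"

# Remove it from the CRUSH map, delete its auth key, and drop it
# from the OSD map in one step:
ceph osd purge "$OSD_ID" --yes-i-really-mean-it

# On the host itself, wipe the LVM volume that "daemon add osd"
# created so the disk can be reused:
ceph-volume lvm zap /dev/sda --destroy
```

Does that look like the right order, or is there a safer way to undo a
half-created OSD?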
Thanks, and looking forward to any advice!
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]