Hi,
I assume you see the duplicate OSD in 'ceph orch ps | grep -w osd.1'
as well? Are they both supposed to run on the same host?
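For example, something like this should show where each of the two
entries is supposedly running:

   # the HOST and STATUS columns of the two osd.1 rows are the
   # interesting part here
   ceph orch ps | grep -w osd.1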
You might have an orphaned daemon there. Check 'cephadm ls
--no-detail' on the host (probably noc3); maybe there's a "legacy"
osd.1 entry? If that is the case, remove it with 'cephadm rm-daemon
--name osd.1 --fsid {FSID}', but be careful not to remove the intact
OSD! You can paste the output here first before you purge anything.
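Something along these lines (untested sketch; replace {FSID} with the
output of 'ceph fsid', and double-check first which osd.1 entry is the
stray one):

   # on noc3: list every daemon cephadm knows about on this host and
   # look for a second osd.1 entry (a "style": "legacy" entry would be
   # a pre-cephadm leftover)
   cephadm ls --no-detail

   # only once you are sure which entry is the stray one:
   cephadm rm-daemon --name osd.1 --fsid {FSID}

The IsADirectoryError at the end of your log also hints that
/var/lib/ceph/{FSID}/osd.1/config (or config.new) on noc3 ended up as
a directory instead of a regular file, which would make the reconfig
fail; an 'ls -ld' on those two paths might be worth a look before you
retry the upgrade.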
Regards,
Eugen
Quoting Harry G Coin <hgc...@gmail.com>:
Need a clue about what appears to be a phantom duplicate osd
automagically created/discovered via the upgrade process -- which
blocks the upgrade.
The upgrade process on a known-good 19.2.2 to 19.2.3 proceeded
normally through the mgrs and mons. It upgraded most of the osds,
then stopped with the complaint "Error: UPGRADE_REDEPLOY_DAEMON:
Upgrading daemon osd.1 on host noc3 failed." The roster in the
"Daemon Versions" table on the dashboard looks normal except:
There are two entries for 'osd.1'. One of them has the correct
version number, 19.2.2; the other is blank.
The upgrade appears 'stuck': an attempt to 'resume' resulted in the
same error. Cluster operations are normal, with all osds up and
in. The cluster is IPv6. Oddly, ceph -s reports:
root@noc1:~# ceph -s
  cluster:
    id:     406xxxxxxx0f8
    health: HEALTH_WARN
            Public/cluster network defined, but can not be found on any host
            Upgrading daemon osd.1 on host noc3 failed.

  services:
    mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 39m)
    mgr: noc2.yhyuxd(active, since 4h), standbys: noc3.sybsfb, noc4.tvhgac, noc1.jtteqg
    mds: 1/1 daemons up, 3 standby
    osd: 27 osds: 27 up (since 3h), 27 in (since 10d)

  data:
    volumes: 1/1 healthy
    pools:   16 pools, 1809 pgs
    objects: 14.77M objects, 20 TiB
    usage:   52 TiB used, 58 TiB / 111 TiB avail
    pgs:     1808 active+clean
             1    active+clean+scrubbing

  io:
    client: 835 KiB/s rd, 1.0 MiB/s wr, 24 op/s rd, 105 op/s wr

  progress:
    Upgrade to 19.2.3 (4h)
      [============................] (remaining: 4h)
Related log entry:
29/7/25 02:40 PM [ERR] cephadm exited with an error code: 1, stderr:
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126d-01cb-40af-824a-881c130140f8-osd-1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd-1
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
Reconfig daemon osd.1 ...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5581, in <module>
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5569, in main
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3051, in command_deploy_from
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3086, in _common_deploy
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3106, in _deploy_daemon_container
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 1077, in deploy_daemon
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 765, in create_daemon_dirs
  File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/cephadmlib/file_utils.py", line 52, in write_new
IsADirectoryError: [Errno 21] Is a directory: '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config.new' -> '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config'

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1145, in _check_daemons
    self.mgr._daemon_action(daemon_spec, action=action)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2545, in _daemon_action
    return self.wait_async(
  File "/usr/share/ceph/mgr/cephadm/module.py", line 815, in wait_async
    return self.event_loop.get_result(coro, timeout)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 136, in get_result
    return future.result(timeout)
  File "/lib64/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/lib64/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1381, in _create_daemon
    out, err, code = await self._run_cephadm(
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1724, in _run_cephadm
    raise OrchestratorError(
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd-1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd-1
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
Reconfig daemon osd.1 ...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5581, in <module>
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5569, in main
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3051, in command_deploy_from
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3086, in _common_deploy
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3106, in _deploy_daemon_container
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 1077, in deploy_daemon
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 765, in create_daemon_dirs
  File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/cephadmlib/file_utils.py", line 52, in write_new
IsADirectoryError: [Errno 21] Is a directory: '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config.new' -> '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config'
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io