Hi,

I assume you see the duplicate OSD in 'ceph orch ps | grep -w osd.1' as well? Are both entries supposed to run on the same host? You might have an orphaned daemon there; check 'cephadm ls --no-detail' on the host (probably noc3), maybe there's a "legacy" osd.1 listed. If that is the case, remove it with 'cephadm rm-daemon --name osd.1 --fsid {FSID}', but be careful not to remove the intact OSD! Feel free to paste that output here first, before you purge anything.
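
For reference, a rough sketch of that sequence ({FSID} stands for your cluster fsid; double-check which osd.1 entry is the orphan before removing anything):

# from a node with the admin keyring: confirm the duplicate entry
ceph orch ps | grep -w osd.1

# on the affected host (probably noc3): list what cephadm has deployed there
# and look for a stray/"legacy" osd.1 entry
cephadm ls --no-detail

# only if an orphaned/legacy osd.1 shows up: remove that daemon entry on the host,
# making sure it is the broken one and not the osd.1 that is actually up
cephadm rm-daemon --name osd.1 --fsid {FSID}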

Regards,
Eugen

Quoting Harry G Coin <hgc...@gmail.com>:

I need a clue about what appears to be a phantom duplicate OSD automagically created/discovered by the upgrade process, which is blocking the upgrade.

The upgrade of a known-good 19.2.2 cluster to 19.2.3 proceeded normally through the mgrs and mons. It upgraded most of the OSDs, then stopped with the complaint "Error: UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.1 on host noc3 failed." The roster in the "Daemon Versions" table on the dashboard looks normal except:

There are two entries for 'osd.1'. One of them shows the correct version number, 19.2.2; the other is blank.

The upgrade appears 'stuck'; an attempt to 'resume' resulted in the same error. Cluster operations are normal, with all OSDs up and in. The cluster is IPv6. Oddly, ceph -s reports:


root@noc1:~# ceph -s
  cluster:
    id:     406xxxxxxx0f8
    health: HEALTH_WARN
            Public/cluster network defined, but can not be found on any host
            Upgrading daemon osd.1 on host noc3 failed.

  services:
    mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 39m)
    mgr: noc2.yhyuxd(active, since 4h), standbys: noc3.sybsfb, noc4.tvhgac, noc1.jtteqg
    mds: 1/1 daemons up, 3 standby
    osd: 27 osds: 27 up (since 3h), 27 in (since 10d)

  data:
    volumes: 1/1 healthy
    pools:   16 pools, 1809 pgs
    objects: 14.77M objects, 20 TiB
    usage:   52 TiB used, 58 TiB / 111 TiB avail
    pgs:     1808 active+clean
             1    active+clean+scrubbing

  io:
    client:   835 KiB/s rd, 1.0 MiB/s wr, 24 op/s rd, 105 op/s wr

  progress:
    Upgrade to 19.2.3 (4h)
      [============................] (remaining: 4h)

Related log entry:

29/7/25 02:40 PM [ERR] cephadm exited with an error code: 1, stderr:
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126d-01cb-40af-824a-881c130140f8-osd-1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd-1
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
Reconfig daemon osd.1 ...
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5581, in
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5569, in main
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3051, in command_deploy_from
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3086, in _common_deploy
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3106, in _deploy_daemon_container
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 1077, in deploy_daemon
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 765, in create_daemon_dirs
  File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/cephadmlib/file_utils.py", line 52, in write_new
IsADirectoryError: [Errno 21] Is a directory: '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config.new' -> '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config'

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1145, in _check_daemons
    self.mgr._daemon_action(daemon_spec, action=action)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2545, in _daemon_action
    return self.wait_async(
  File "/usr/share/ceph/mgr/cephadm/module.py", line 815, in wait_async
    return self.event_loop.get_result(coro, timeout)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 136, in get_result
    return future.result(timeout)
  File "/lib64/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/lib64/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1381, in _create_daemon
    out, err, code = await self._run_cephadm(
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1724, in _run_cephadm
    raise OrchestratorError(
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd-1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd-1
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
Reconfig daemon osd.1 ...
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5581, in
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5569, in main
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3051, in command_deploy_from
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3086, in _common_deploy
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3106, in _deploy_daemon_container
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 1077, in deploy_daemon
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 765, in create_daemon_dirs
  File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/cephadmlib/file_utils.py", line 52, in write_new
IsADirectoryError: [Errno 21] Is a directory: '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config.new' -> '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config'

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
