Please don't drop the list from your replies; you'd benefit from more people reading the thread.

The cephadm ls output isn't really helpful; you need to figure out why docker doesn't start. Syslog, journald, or dmesg should give some clue. To me it sounds like more has been going on besides the network outage: maybe some leftovers from previous deployments or tests, or something else that "confuses" docker? Or maybe you have both podman and docker installed, and since cephadm prefers podman, the containers fail to start?
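A rough sketch of those checks as a script, in case it helps. The function names are made up for illustration, and the /var/log/syslog hint assumes a Debian/Ubuntu-style host:

```shell
#!/bin/sh
# Sketch: collect hints about why the Docker daemon fails to start.
# Function names are hypothetical helpers, not part of cephadm or docker.

# Print which container engines are on PATH. If both show up,
# cephadm may be picking podman while the units were deployed with docker.
detect_engines() {
    found=""
    for engine in podman docker; do
        if command -v "$engine" >/dev/null 2>&1; then
            found="$found $engine"
        fi
    done
    echo "engines:${found:- none}"
}

# Show the last log lines of the Docker unit from the current boot.
docker_unit_log() {
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -u docker.service -b --no-pager -n 50 2>/dev/null || true
    else
        # Assumption: a Debian/Ubuntu-style syslog location.
        echo "journalctl not found; check /var/log/syslog instead"
    fi
}

detect_engines
docker_unit_log
# Kernel messages may need root; look for segfaults or OOM kills.
dmesg 2>/dev/null | tail -n 30 || true
```

Since the unit reportedly fails with 'core-dump', `coredumpctl list` might also show the crashed process, provided systemd-coredump is installed on that node.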

Quoting Jacek Rużyczka <[email protected]>:

I can't run the Ceph container on the master node (blade3n1) anymore. It
no longer starts, and I get no error message. Here is what cephadm ls says:

mixtile@blade3n1:~$ sudo cephadm ls
[
   {
       "style": "cephadm:v1",
       "name": "mon.blade3n1",
       "fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
       "systemd_unit": "[email protected]",
       "enabled": true,
       "state": "error",
       "service_name": "mon",
       "memory_request": null,
       "memory_limit": null,
       "ports": [],
       "container_id": null,
       "container_image_name": "quay.io/ceph/ceph:v19",
       "container_image_id": null,
       "container_image_digests": null,
       "version": null,
       "started": null,
       "created": "2026-04-16T14:35:47.634066Z",
       "deployed": "2026-04-16T14:35:45.414037Z",
       "configured": "2026-04-20T17:16:32.722329Z"
   },
   {
       "style": "cephadm:v1",
       "name": "node-exporter.blade3n1",
       "fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
       "systemd_unit": "ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b@node-exporter.blade3n1",
       "enabled": true,
       "state": "error",
       "service_name": "node-exporter",
       "ports": [
           9100
       ],
       "ip": null,
       "deployed_by": [
           "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee"
       ],
       "rank": null,
       "rank_generation": null,
       "extra_container_args": null,
       "extra_entrypoint_args": null,
       "memory_request": null,
       "memory_limit": null,
       "container_id": null,
       "container_image_name": "quay.io/prometheus/node-exporter:v1.7.0",
       "container_image_id": null,
       "container_image_digests": null,
       "version": null,
       "started": null,
       "created": "2026-04-21T12:55:39.731035Z",
       "deployed": "2026-04-21T12:55:38.217675Z",
       "configured": "2026-04-21T12:55:39.734369Z"
   },
   {
       "style": "cephadm:v1",
       "name": "ceph-exporter.blade3n1",
       "fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
       "systemd_unit": "ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b@ceph-exporter.blade3n1",
       "enabled": true,
       "state": "error",
       "service_name": "ceph-exporter",
       "ports": [],
       "ip": null,
       "deployed_by": [
           "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee"
       ],
       "rank": null,
       "rank_generation": null,
       "extra_container_args": null,
       "extra_entrypoint_args": null,
       "memory_request": null,
       "memory_limit": null,
       "container_id": null,
       "container_image_name": "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee",
       "container_image_id": null,
       "container_image_digests": null,
       "version": null,
       "started": null,
       "created": "2026-04-16T14:37:32.218782Z",
       "deployed": "2026-04-16T14:37:30.612094Z",
       "configured": "2026-04-20T17:16:36.139048Z"
   },
   {
       "style": "cephadm:v1",
       "name": "mgr.blade3n1.rrlwwv",
       "fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
       "systemd_unit": "[email protected]",
       "enabled": true,
       "state": "error",
       "service_name": "mgr",
       "memory_request": null,
       "memory_limit": null,
       "ports": [
           9283,
           8765,
           8443
       ],
       "container_id": null,
       "container_image_name": "quay.io/ceph/ceph:v19",
       "container_image_id": null,
       "container_image_digests": null,
       "version": null,
       "started": null,
       "created": "2026-04-16T14:35:54.054151Z",
       "deployed": "2026-04-16T14:35:52.430796Z",
       "configured": "2026-04-20T17:16:37.612403Z"
   },
   {
       "style": "cephadm:v1",
       "name": "crash.blade3n1",
       "fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
       "systemd_unit": "[email protected]",
       "enabled": true,
       "state": "error",
       "service_name": "crash",
       "ports": [],
       "ip": null,
       "deployed_by": [
           "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee"
       ],
       "rank": null,
       "rank_generation": null,
       "extra_container_args": null,
       "extra_entrypoint_args": null,
       "memory_request": null,
       "memory_limit": null,
       "container_id": null,
       "container_image_name": "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee",
       "container_image_id": null,
       "container_image_digests": null,
       "version": null,
       "started": null,
       "created": "2026-04-16T14:37:36.855510Z",
       "deployed": "2026-04-16T14:37:35.268822Z",
       "configured": "2026-04-20T17:16:39.025758Z"
   },
   {
       "style": "cephadm:v1",
       "name": "osd.3",
       "fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
       "systemd_unit": "[email protected]",
       "enabled": true,
       "state": "error",
       "service_name": "osd",
       "ports": [],
       "ip": null,
       "deployed_by": [
           "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee"
       ],
       "rank": null,
       "rank_generation": null,
       "extra_container_args": null,
       "extra_entrypoint_args": null,
       "memory_request": null,
       "memory_limit": null,
       "container_id": null,
       "container_image_name": "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee",
       "container_image_id": null,
       "container_image_digests": null,
       "version": null,
       "started": null,
       "created": "2026-04-23T15:05:00.686688Z",
       "deployed": "2026-04-23T15:04:59.176667Z",
       "configured": "2026-04-23T15:05:00.693355Z"
   },
   {
       "style": "cephadm:v1",
       "name": "mds.data.blade3n1.eczeqc",
       "fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
       "systemd_unit": "[email protected]",
       "enabled": true,
       "state": "error",
       "service_name": "mds.data",
       "ports": [],
       "ip": null,
       "deployed_by": [
           "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee"
       ],
       "rank": null,
       "rank_generation": null,
       "extra_container_args": null,
       "extra_entrypoint_args": null,
       "memory_request": null,
       "memory_limit": null,
       "container_id": null,
       "container_image_name": "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee",
       "container_image_id": null,
       "container_image_digests": null,
       "version": null,
       "started": null,
       "created": "2026-04-16T15:54:13.264224Z",
       "deployed": "2026-04-16T15:54:10.870858Z",
       "configured": "2026-04-20T17:16:40.499113Z"
   }
]
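
For a listing this long, only the name and state of each daemon really matter. A minimal sketch that boils the output down to those two fields, assuming the pretty-printed layout shown here with one key per line (`summarize_states` is a made-up helper name; a real JSON tool such as jq would be more robust):

```shell
#!/bin/sh
# Sketch: reduce pretty-printed `cephadm ls` JSON to "name<TAB>state" pairs.
# Assumes one key per line; this is not a general JSON parser.
summarize_states() {
    awk -F'"' '
        /"name":/  { name = $4 }            # remember the daemon name
        /"state":/ { print name "\t" $4 }   # emit it once the state is seen
    '
}

# Usage (assumes cephadm is installed on the host):
#   sudo cephadm ls | summarize_states
```

The patterns include the opening quote, so keys like "service_name" and "container_image_name" are not mistaken for "name".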


On Wed, 27 May 2026 at 15:07, Jacek Rużyczka <[email protected]> wrote:

Hi Eugen,

You might need to run 'systemctl reset-failed...' to let systemd start the
containers.


I've already done that. No use. Even worse: On node #1, Docker no longer
starts. When trying to restart the daemon, I get errors like this:

docker.service: Failed with result 'core-dump'.

But before you do that, do you have MON logs with an explanation why they
refuse to start?


Unfortunately no, not even in the syslog. In the meantime, I was able to
start another MON via Cephadm (because the Docker instance had even deleted
the image), but now I've got the problem with the one node, where Docker
refuses to start.

Regarding Ceph images, your cluster uses af0c5903e901 for the Ceph
services, what does 'docker images | grep af0c5903e901' show?


On the affected node, nothing, because the Docker daemon won't even start.

I have the impression that this is a "regular" cephadm cluster


True

BTW, when running the test script supplied by the Docker guys
https://docs.docker.com/engine/daemon/troubleshoot/, I get some warnings:

- Network Drivers:
 - "bridge":
   - sysctl net.ipv4.ip_forward: disabled
   - sysctl net.ipv6.conf.all.forwarding: disabled
   - sysctl net.ipv6.conf.default.forwarding: disabled

On nodes #2 through #4, net.ipv4.ip_forward is enabled.
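
The forwarding setting can be compared across nodes with a small check like the following. This is only a sketch: `ip_forward_state` is a made-up helper name, and it reads the standard Linux procfs entry rather than anything Docker-specific.

```shell
#!/bin/sh
# Sketch: report the IPv4 forwarding setting so nodes can be compared.
ip_forward_state() {
    f=/proc/sys/net/ipv4/ip_forward
    if [ -r "$f" ]; then
        case "$(cat "$f")" in
            1) echo enabled ;;
            0) echo disabled ;;
            *) echo unknown ;;
        esac
    else
        echo unknown
    fi
}

echo "net.ipv4.ip_forward: $(ip_forward_state)"
# To enable it until the next reboot (needs root):
#   sysctl -w net.ipv4.ip_forward=1
```

As far as I know, the Docker daemon normally turns IPv4 forwarding on itself when it starts, so the disabled value on node #1 may be a symptom of the daemon not starting rather than the cause.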

Regards
Jacek Rużyczka


