I'd say you probably don't need both services. They're configured to listen
on the same port (80, from the output) and are placed on the same hosts
(c01-c06), so a port conflict is the likely reason the rgw daemons are going
into the error state. Cephadm will try to deploy a daemon on each of those
hosts for both rgw services, but since both try to bind the same port,
whichever one is placed second will fail for that reason.
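If rgw.obj0 does turn out to be the redundant one, something along these lines should let you verify and then remove it (a sketch only; the service name comes from your `ceph orch ls` output, and you should double-check the exported spec before deleting anything):

```shell
# Export the rgw service specs to confirm what each service manages
ceph orch ls rgw --export

# See which rgw daemons are actually running, and on which hosts
ceph orch ps --daemon-type rgw

# If rgw.obj0 really is the duplicate, remove the whole service
# (this also removes its daemons from all hosts)
ceph orch rm rgw.obj0
```

Removing the service (rather than individual daemons) keeps cephadm from redeploying them on the next reconciliation pass.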

 - Adam King

On Fri, Feb 18, 2022 at 1:38 PM Ron Gage <[email protected]> wrote:

> All:
>
> I think I found the problem - hence...
>
> [root@c01 ceph]# ceph orch ls
> NAME                       PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
> alertmanager               ?:9093,9094      1/1  2m ago     9d   count:1
> crash                                       6/6  2m ago     9d   *
> grafana                    ?:3000           1/1  2m ago     9d   count:1
> mgr                                         2/2  2m ago     9d   count:2
> mon                                         5/5  2m ago     9d   count:5
> node-exporter              ?:9100           6/6  2m ago     9d   *
> osd                                           2  2m ago     -    <unmanaged>
> osd.all-available-devices                    16  2m ago     2d   *
> prometheus                 ?:9095           1/1  2m ago     9d   count:1
> rgw.obj0                   ?:80             1/6  2m ago     9d   c01;c02;c03;c04;c05;c06;count:6
> rgw.obj01                  ?:80             5/6  2m ago     5d   c01;c02;c03;c04;c05;c06
>
>
> To my untrained eye, it looks like rgw.obj0 is extra and unneeded.  Does
> anyone know how to confirm this and, if so, remove it?
>
> Thanks!
>
> Ron Gage
> Westland, MI
>
> -----Original Message-----
> From: Eugen Block <[email protected]>
> Sent: Thursday, February 17, 2022 2:32 AM
> To: [email protected]
> Subject: [ceph-users] Re: Problem with Ceph daemons
>
> Can you retry after resetting the systemd unit? The message "Start request
> repeated too quickly." should be cleared first, then start it
> again:
>
> systemctl reset-failed ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> systemctl start ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
>
> Then check the logs again. If there's still nothing in the rgw log then
> you'll need to check the (active) mgr daemon logs for anything suspicious
> and also the syslog on that rgw host. Is the rest of the cluster healthy?
> Are rgw daemons colocated with other services?
>
>
> Zitat von Ron Gage <[email protected]>:
>
> > Adam:
> >
> >
> >
> > Not really….
> >
> >
> >
> > -- Unit
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> > has begun starting up.
> >
> > Feb 16 15:01:03 c01 podman[426007]:
> >
> > Feb 16 15:01:04 c01 bash[426007]:
> > 915d1e19fa0f213902c666371c8e825480e103f85172f3b15d1d5bf2427a87c9
> >
> > Feb 16 15:01:04 c01 conmon[426038]: debug
> > 2022-02-16T20:01:04.303+0000 7f4f72ff6440  0 deferred set uid:gid to
> > 167:167 (ceph:ceph)
> >
> > Feb 16 15:01:04 c01 conmon[426038]: debug
> > 2022-02-16T20:01:04.303+0000 7f4f72ff6440  0 ceph version 16.2.7
> > (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (st>
> >
> > Feb 16 15:01:04 c01 conmon[426038]: debug
> > 2022-02-16T20:01:04.303+0000 7f4f72ff6440  0 framework: beast
> >
> > Feb 16 15:01:04 c01 conmon[426038]: debug
> > 2022-02-16T20:01:04.303+0000 7f4f72ff6440  0 framework conf key:
> > port, val: 80
> >
> > Feb 16 15:01:04 c01 conmon[426038]: debug
> > 2022-02-16T20:01:04.303+0000 7f4f72ff6440  1 radosgw_Main not setting
> > numa affinity
> >
> > Feb 16 15:01:04 c01 systemd[1]: Started Ceph rgw.obj0.c01.gpqshk for
> > 35194656-893e-11ec-85c8-005056870dae.
> >
> > -- Subject: Unit
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> > has finished start-up
> >
> > -- Defined-By: systemd
> >
> > -- Support: https://access.redhat.com/support
> >
> > --
> >
> > -- Unit
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> > has finished starting up.
> >
> > --
> >
> > -- The start-up result is done.
> >
> > Feb 16 15:01:04 c01 systemd[1]:
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service:
> > Main process exited, code=exited, status=98/n/a
> >
> > Feb 16 15:01:05 c01 systemd[1]:
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service:
> > Failed with result 'exit-code'.
> >
> > -- Subject: Unit failed
> >
> > -- Defined-By: systemd
> >
> > -- Support: https://access.redhat.com/support
> >
> > --
> >
> > -- The unit
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> > has entered the 'failed' state with result 'exit-code'.
> >
> > Feb 16 15:01:15 c01 systemd[1]:
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service:
> > Service RestartSec=10s expired, scheduling restart.
> >
> > Feb 16 15:01:15 c01 systemd[1]:
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service:
> > Scheduled restart job, restart counter is at 5.
> >
> > -- Subject: Automatic restarting of a unit has been scheduled
> >
> > -- Defined-By: systemd
> >
> > -- Support: https://access.redhat.com/support
> >
> > --
> >
> > -- Automatic restarting of the unit
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> > has been scheduled, as the result for
> >
> > -- the configured Restart= setting for the unit.
> >
> > Feb 16 15:01:15 c01 systemd[1]: Stopped Ceph rgw.obj0.c01.gpqshk for
> > 35194656-893e-11ec-85c8-005056870dae.
> >
> > -- Subject: Unit
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> > has finished shutting down
> >
> > -- Defined-By: systemd
> >
> > -- Support: https://access.redhat.com/support
> >
> > --
> >
> > -- Unit
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> > has finished shutting down.
> >
> > Feb 16 15:01:15 c01 systemd[1]:
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service:
> > Start request repeated too quickly.
> >
> > Feb 16 15:01:15 c01 systemd[1]:
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service:
> > Failed with result 'exit-code'.
> >
> > -- Subject: Unit failed
> >
> > -- Defined-By: systemd
> >
> > -- Support: https://access.redhat.com/support
> >
> > --
> >
> > -- The unit
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> > has entered the 'failed' state with result 'exit-code'.
> >
> > Feb 16 15:01:15 c01 systemd[1]: Failed to start Ceph
> > rgw.obj0.c01.gpqshk for 35194656-893e-11ec-85c8-005056870dae.
> >
> > -- Subject: Unit
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> > has failed
> >
> > -- Defined-By: systemd
> >
> > -- Support: https://access.redhat.com/support
> >
> > --
> >
> > -- Unit
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
> > has failed.
> >
> > --
> >
> > -- The result is failed.
> >
> >
> >
> > Ron Gage
> >
> > Westland, MI
> >
> >
> >
> > From: Adam King <[email protected]>
> > Sent: Wednesday, February 16, 2022 4:18 PM
> > To: Ron Gage <[email protected]>
> > Cc: ceph-users <[email protected]>
> > Subject: Re: [ceph-users] Problem with Ceph daemons
> >
> >
> >
> > Is there anything useful in the rgw daemon's logs? (e.g. journalctl -xeu
> > ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service)
> >
> >
> >
> >  - Adam King
> >
> >
> >
> > On Wed, Feb 16, 2022 at 3:58 PM Ron Gage <[email protected]> wrote:
> >
> > Hi everyone!
> >
> >
> >
> > Looks like I am having some problems with some of my ceph RGW daemons
> > - they won't stay running.
> >
> >
> >
> > From 'cephadm ls':
> >
> >
> >
> > {
> >
> >         "style": "cephadm:v1",
> >
> >         "name": "rgw.obj0.c01.gpqshk",
> >
> >         "fsid": "35194656-893e-11ec-85c8-005056870dae",
> >
> >         "systemd_unit":
> > "[email protected]
> > <mailto:[email protected]
> > <mailto:[email protected]>
> > > ",
> >
> >         "enabled": true,
> >
> >         "state": "error",
> >
> >         "service_name": "rgw.obj0",
> >
> >         "ports": [
> >
> >             80
> >
> >         ],
> >
> >         "ip": null,
> >
> >         "deployed_by": [
> >
> >
> > "quay.io/ceph/ceph@sha256:c3a89afac4f9c83c716af57e08863f7010318538c7e2
> > cd9114
> > <http://quay.io/ceph/ceph@sha256:c3a89afac4f9c83c716af57e08863f7010318
> > 538c7e2cd911458800097f7d97d>
> > 58800097f7d97d
> > <mailto:quay.io <mailto:quay.io>
> > /ceph/ceph@sha256:c3a89afac4f9c83c716af57e08863f7010318538c7e
> > 2cd911458800097f7d97d> ",
> >
> >
> > "quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76f
> > ff41a7
> > <http://quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f3
> > 1eaa76fff41a77fa32d0b903061>
> > 7fa32d0b903061
> > <mailto:quay.io <mailto:quay.io>
> > /ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76
> > fff41a77fa32d0b903061> "
> >
> >         ],
> >
> >         "rank": null,
> >
> >         "rank_generation": null,
> >
> >         "memory_request": null,
> >
> >         "memory_limit": null,
> >
> >         "container_id": null,
> >
> >         "container_image_name":
> > "quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76f
> > ff41a7
> > <http://quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f3
> > 1eaa76fff41a77fa32d0b903061>
> > 7fa32d0b903061
> > <mailto:quay.io <mailto:quay.io>
> > /ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76
> > fff41a77fa32d0b903061> ",
> >
> >         "container_image_id": null,
> >
> >         "container_image_digests": null,
> >
> >         "version": null,
> >
> >         "started": null,
> >
> >         "created": "2022-02-09T01:00:53.411541Z",
> >
> >         "deployed": "2022-02-09T01:00:52.338515Z",
> >
> >         "configured": "2022-02-09T01:00:53.411541Z"
> >
> >     },
> >
> >
> >
> > That whole "state: error" bit is concerning to me - and it
> > contributing to the cluster status of warning (showing 6 cephadm daemons
> down).
> >
> >
> >
> > Can I get a hint or two on how to fix this?
> >
> >
> > Thanks!
> >
> >
> >
> > Ron Gage
> >
> > Westland, MI
> >
> >
> >
> >
> >
> > _______________________________________________
> > ceph-users mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]
> >
>
>
>
>
>
