Hi,
I am trying to deploy Ceph Quincy using ceph-ansible on Rocky Linux 9. I am
having some problems and I don't know where to look for the cause.
PS: I did the same deployment on Rocky Linux 8 with ceph-ansible for Pacific,
on the same hardware, and it worked perfectly.
I have 3 controller nodes (mon, mgr, MDS and RGW)
and 27 OSD nodes, each with 4 NVMe disks (one OSD per disk).
I am using a 10 Gb/s network with jumbo frames.
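Since jumbo frames are in use, one thing worth ruling out is a path that does
not carry 9000-byte frames end to end (a single switch port or NIC with the
default MTU will silently drop them). A minimal sketch of a do-not-fragment
ping check; the peer address here is just an example taken from the log below,
and the payload arithmetic is standard (9000-byte MTU minus 20-byte IP header
minus 8-byte ICMP header):

```shell
# Sketch: verify jumbo frames actually pass between Ceph nodes.
# 9000-byte MTU leaves 8972 bytes of ICMP payload (9000 - 20 IP - 8 ICMP).
PAYLOAD=$((9000 - 20 - 8))
PEER=${PEER:-20.1.0.27}   # example peer; substitute a real mon/OSD address

# Printed rather than executed so the sketch is self-contained;
# -M do sets the don't-fragment flag, so an oversized frame fails loudly.
echo "ping -M do -c 3 -s $PAYLOAD $PEER"
```

If this ping fails between any two nodes while a normal ping succeeds, the
MTU is inconsistent somewhere on the path.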
The deployment starts with no issues: the 3 monitors are created correctly,
then the 3 managers, and after that the OSDs are prepared and formatted. Up
to this point everything works fine. But when the "*wait for all osd to be
up*" task is launched, which starts all the OSD containers on all OSD nodes,
things go south: the monitors fall out of quorum, "ceph -s" takes a long
time to respond, not all OSDs get activated, and the deployment eventually
fails.
cluster 2023-03-06T12:00:26.431947+0100 mon.controllera (mon.0) 3864 : cluster [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum controllera,controllerc
cluster 2023-03-06T12:00:26.431953+0100 mon.controllera (mon.0) 3865 : cluster [WRN] mon.controllerb (rank 1) addr [v2:20.1.0.27:3300/0,v1:20.1.0.27:6789/0] is down (out of quorum)
The monitor container on 2 of my 3 controller nodes stays at 100% CPU
utilization:
CONTAINER ID   NAME                   CPU %    MEM USAGE / LIMIT     MEM %   NET I/O   BLOCK I/O        PIDS
068e4e55f299   ceph-mon-controllera   99.91%   58.12MiB / 376.1GiB   0.02%   0B / 0B   122MB / 85.3MB   28   <-----------------
87730f89420d   ceph-mgr-controllera   0.32%    408.2MiB / 376.1GiB   0.11%   0B / 0B   181MB / 0B       35
Could this be a resource problem, i.e. the monitor containers do not have
enough resources (CPU, RAM, etc.) to handle all the OSDs being started at
once? If so, how can I confirm it?
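To see whether the busy mon is actually working (processing a flood of OSD
boot messages and elections) rather than starved, the monitor's admin socket
can be queried from inside the container. A sketch of commands to try, using
the container and daemon names from the output above (these are standard
admin-socket commands; the snippet prints them rather than running them so
it is self-contained):

```shell
# Sketch: inspect what a pegged monitor is doing.
MON=controllera   # daemon/container name from this cluster
cat <<EOF
docker exec ceph-mon-$MON ceph daemon mon.$MON ops          # in-flight mon operations
docker exec ceph-mon-$MON ceph daemon mon.$MON perf dump    # mon perf counters
docker logs --tail 200 ceph-mon-$MON | grep -i election     # election churn
EOF
```

A long "ops" backlog or constant election messages would point at the mons
being overwhelmed by the mass OSD start-up rather than at a hardware limit.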
Thanks in advance.
Regards.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]