Hello, I've tried to upgrade our ceph cluster to the Pacific release (version
16.2.0, planning to then move up one version at a time), but on our cluster the
upgrade is failing.
I originally installed the cluster (quite a while ago...) via cephadm on v15 (I
believe it was v15.2.8 underneath at the time).
I remember hitting an issue with the ceph mgr back then, which led me to use
ceph-base:latest-octopus as a workaround (the next point release wasn't out yet,
and the bug was crashing the cluster by filling the logs).
the cluster state is OK:

  cluster:
    id:     adc48d6a-61bf-11eb-9212-2f70acf7224f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum server36,server38,server37 (age 3h)
    mgr: server36.xujjng(active, since 2h), standbys: server37.fyglah
    osd: 52 osds: 52 up (since 4h), 52 in (since 4h)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    pools:   11 pools, 489 pgs
    objects: 1.28M objects, 4.8 TiB
    usage:   15 TiB used, 65 TiB / 81 TiB avail
    pgs:     489 active+clean

  io:
    client: 3.2 MiB/s rd, 24 MiB/s wr, 2.73k op/s rd, 1.81k op/s wr
So I ran "ceph orch upgrade start --ceph-version 16.2.0". The first time, it
deployed a new ceph mgr running 16.2.0 and then got stuck there.
After waiting several hours, I stopped and restarted the upgrade, and nothing
happened.
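For reference, the stop/restart sequence I used was essentially the following (the version number is the one I was targeting at the time):

```shell
# Abort the stuck upgrade
ceph orch upgrade stop

# Kick it off again with the same target version
ceph orch upgrade start --ceph-version 16.2.0

# Poll progress (this is where I later got the empty status shown below)
ceph orch upgrade status
```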
I then manually upgraded the whole cluster except the two rgw daemons and
grafana/prometheus/alertmanager/node-exporter.
I retried and saw nothing happening in any of the logs (with debug enabled):
cephadm's log, the active mgr's log, ceph -W cephadm --watch-debug, etc.
I've since also tried with 16.2.1, as 16.2.0 didn't seem to work, but the
result is the same.
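To be explicit about what "in debug" means above, these are roughly the commands I used to turn up and follow cephadm's logging (the mgr daemon name is the active mgr from my cluster):

```shell
# Raise cephadm's cluster-log level so debug messages are emitted
ceph config set mgr mgr/cephadm/log_to_cluster_level debug

# Follow the cephadm channel live
ceph -W cephadm --watch-debug

# Dump recent cephadm log entries after the fact
ceph log last 100 debug cephadm

# On the host itself: the mgr container's logs
cephadm logs --name mgr.server36.xujjng
```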
here's what I see from ceph -W cephadm --watch-debug:
2021-07-09T14:13:14.642077+0000 mgr.server36.xujjng [INF] Upgrade: Started with
target docker.io/ceph/ceph:v16.2.1
and then nothing else.
in the mgr container's docker logs, I see (roughly) the same line, followed
only by unrelated debug output:
::ffff:127.0.0.1 - - [09/Jul/2021:15:08:33] "GET /metrics HTTP/1.1" 200 1423923
"" "Prometheus/2.18.1"
debug 2021-07-09T15:08:35.829+0000 7fa870818700 0 log_channel(cluster) log
[DBG] : pgmap v3602: 489 pgs: 489 active+clean; 4.8 TiB data, 15 TiB used, 65
TiB / 81 TiB avail; 2.8 MiB/s rd, 37 MiB/s wr, 3.99k op/s
debug 2021-07-09T15:08:37.829+0000 7fa870818700 0 log_channel(cluster) log
[DBG] : pgmap v3603: 489 pgs: 489 active+clean; 4.8 TiB data, 15 TiB used, 65
TiB / 81 TiB avail; 2.0 MiB/s rd, 30 MiB/s wr, 2.75k op/s
I don't see anything in the cephadm logs.
The upgrade status isn't very informative either:
{
    "target_image": "docker.io/ceph/ceph:v16.2.1",
    "in_progress": true,
    "services_complete": [],
    "progress": "",
    "message": ""
}
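In case it helps whoever replies, these commands show which versions and images the daemons are actually running (as far as I know these are the standard ones):

```shell
# Summary of running versions per daemon type
ceph versions

# Per-daemon view, including the container image each daemon uses
ceph orch ps
```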
Do you know where I could find logs or other information that would show why
the upgrade doesn't start?
Thanks!
Sylvain
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]