Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

Ronny Aasen Wed, 11 Apr 2018 10:25:53 -0700

Ceph upgrades are usually not a problem:

Ceph has to be upgraded in the right order. Normally, when each service is on its own machine, this is not difficult. But when you have mon, mgr, osd, mds, and clients on the same host, you have to do it a bit carefully.

I tend to have a terminal open with "watch ceph -s" running, and I never touch another service until the health is OK again.

First, apt upgrade the packages on all the hosts. This only updates the software on disk, not the running services. Then restart the services in the right order, and only on one host at a time.


mons: first restart the mon service on all hosts that run a mon.

All 3 mons are active at the same time, so there is no "shifting around", but make sure the quorum is OK again before you do the next mon.

mgr: then restart mgr on all hosts that run mgr. There is only one active mgr at a time, so here there will be a bit of shifting around. But it is only for statistics/management, so it may affect your ceph -s output, but not the cluster operation.

osd: restart the OSD processes one OSD at a time, and make sure you see HEALTH_OK before doing the next OSD process. Do this for all hosts that have OSDs.

mds: restart the MDSs one at a time. You will notice the standby MDS taking over for the MDS that was restarted. Do both.

clients: restart the clients. That means remounting filesystems, migrating or restarting VMs, or restarting whatever process uses the old Ceph libraries.
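
The order above can be sketched in a few lines of Python. This is only an illustration of the procedure, not a tested upgrade tool: restart_plan, wait_for_health_ok, rolling_restart and the restart/get_health callbacks are hypothetical names, and the entries under each role may be host names or, for OSDs, individual OSD ids.

```python
import time
from typing import Callable, Dict, Iterator, List, Tuple

# Restart order from the description above: mon, then mgr, then osd, then mds.
# Clients come last and are handled separately (remount, VM migration, ...).
RESTART_ORDER = ["mon", "mgr", "osd", "mds"]


def restart_plan(targets_by_role: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (role, target) pairs one at a time, in the safe order."""
    for role in RESTART_ORDER:
        for target in targets_by_role.get(role, []):
            yield role, target


def wait_for_health_ok(get_health: Callable[[], str],
                       timeout_s: float = 600.0,
                       poll_s: float = 5.0) -> bool:
    """Poll until the health string starts with HEALTH_OK, or time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_health().startswith("HEALTH_OK"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def rolling_restart(targets_by_role: Dict[str, List[str]],
                    restart: Callable[[str, str], None],
                    get_health: Callable[[], str]) -> List[Tuple[str, str]]:
    """Restart one service at a time; stop if health does not recover."""
    done = []
    for role, target in restart_plan(targets_by_role):
        restart(role, target)  # hypothetical callback, e.g. ssh + systemctl
        if not wait_for_health_ok(get_health):
            raise RuntimeError(f"cluster not healthy after {role} on {target}")
        done.append((role, target))
    return done
```

In practice get_health could wrap the output of "ceph health", and restart could run the matching systemctl restart on the right host. Both are left as callbacks here so the ordering logic stays separate from how you reach your machines.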



about pools:

Since you only have 2 OSDs, you obviously cannot be running the recommended 3-replica pools. This makes me worry that you may be running size:2 min_size:1 pools and are running a daily risk of data loss due to corruption and inconsistencies, especially when you restart OSDs.

If your pools are size:2 min_size:2, then your cluster will stop serving I/O whenever any OSD is restarted, until the OSD is up and healthy again. But you have less chance of data loss than with 2/1 pools.

If you added an OSD on a third host, you could run size:3 min_size:2, the recommended config when you want both redundancy and high availability.
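
To make the size/min_size trade-off concrete, here is a small sketch. It uses a deliberately simplified model (a placement group serves I/O only while the number of surviving replicas is at least min_size); pool_state is a hypothetical helper, not a Ceph API.

```python
def pool_state(size: int, min_size: int, replicas_down: int) -> str:
    """Classify a PG of a replicated pool when some replicas are unavailable.

    Simplified model: I/O is served while surviving replicas >= min_size.
    """
    if not 1 <= min_size <= size:
        raise ValueError("need 1 <= min_size <= size")
    alive = size - replicas_down
    if alive < min_size:
        return "blocked"                # I/O stops until a replica returns
    if alive == 1:
        return "active, no redundancy"  # writes land on a single copy
    if alive < size:
        return "active, degraded"
    return "active, clean"


# One OSD restarting, for the three configurations discussed above.
for size, min_size in [(2, 1), (2, 2), (3, 2)]:
    print(f"size:{size} min_size:{min_size} ->", pool_state(size, min_size, 1))
```

The current values can be read with "ceph osd pool get <pool> size" and "ceph osd pool get <pool> min_size", and changed with "ceph osd pool set <pool> size 3" and "ceph osd pool set <pool> min_size 2" once a third OSD host exists.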



kind regards
Ronny Aasen







On 11.04.2018 17:42, Ranjan Ghosh wrote:

Ah, never mind, we've solved it. It was a firewall issue. The only thing that's weird is that it became an issue immediately after an update. Perhaps it has something to do with monitor nodes shifting around or the like. Well, thanks again for your quick support, though. It's much appreciated.


BR

Ranjan
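
Since the cause turned out to be a firewall, a quick TCP reachability check between the nodes can help rule this out early. The sketch below is an assumption about how one might test it, not what was done in this thread; port_open is a hypothetical helper. By default Ceph monitors listen on port 6789, and the other daemons (osd, mds, mgr) bind to ports in the range 6800-7300.

```python
import socket

CEPH_MON_PORT = 6789                  # default monitor port
CEPH_DAEMON_PORTS = range(6800, 7301) # default range for osd/mds/mgr


def port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, filtered (timeout), or unreachable all count as not open.
        return False
```

For example, port_open("tukan2", CEPH_MON_PORT) run from each of the other nodes. A port only reports open if a daemon is actually listening on it, so compare against the ports the daemons are really bound to rather than scanning the whole range blindly.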


On 11.04.2018 at 17:07, Ranjan Ghosh wrote:

Thank you for your answer. Do you have any specifics on which thread you're talking about? I would be very interested to read about a success story, because I fear that if I update the other node, the whole cluster will come down.



On 11.04.2018 at 10:47, Marc Roos wrote:

I think you have to update all osd's, mon's etc. I can remember running
into a similar issue. You should be able to find more about this in
the mailing list archive.

-----Original Message-----
From: Ranjan Ghosh [mailto:[email protected]]
Sent: Wednesday, 11 April 2018 16:02
To: ceph-users
Subject: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 =>
12.2.2

Hi all,

We have a two-node cluster (with a third "monitoring-only" node). Over
the last months, everything ran *perfectly* smooth. Today, I did an
Ubuntu "apt-get upgrade" on one of the two servers. Among others, the
ceph packages were upgraded from 12.2.1 to 12.2.2. A minor release
update, one might think. But, to my surprise, after restarting the
services, Ceph is now in degraded state :-( (see below). Only the first
node - which is still on 12.2.1 - seems to be running. I did a bit of
research and found this:

https://ceph.com/community/new-luminous-pg-overdose-protection/

I did set "mon_max_pg_per_osd = 300" to no avail. Don't know if this is
the problem at all.

Looking at the status it seems we have 264 pgs, right? When I enter
"ceph osd df" (which I found on another website claiming it should print
the number of PGs per OSD), it just hangs (need to abort with Ctrl+C).
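
As a rough cross-check while "ceph osd df" hangs, the per-OSD figure can be estimated by hand: each PG is stored once per replica, so the average is the sum of pg_num times replica size over all pools, divided by the number of OSDs. The sketch below assumes all 264 PGs sit in pools of size 2, which the thread does not state; pg_replicas_per_osd is a hypothetical helper.

```python
from typing import Iterable, Tuple


def pg_replicas_per_osd(pools: Iterable[Tuple[int, int]], num_osds: int) -> float:
    """Average number of PG replicas per OSD.

    pools is an iterable of (pg_num, replica_size) pairs, one per pool.
    """
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    return sum(pg_num * size for pg_num, size in pools) / num_osds


# 264 PGs in total on 2 OSDs; replica size 2 is an assumption.
print(pg_replicas_per_osd([(264, 2)], 2))
```

Under that assumption the estimate is 264 PG replicas per OSD, which would be below the mon_max_pg_per_osd = 300 limit set above.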

Hope somebody can help me. The cluster now works with the single node,
but it is definitely quite worrying because we don't have redundancy.

Thanks in advance,

Ranjan


root@tukan2 /var/www/projects # ceph -s
    cluster:
      id:     19895e72-4a0c-4d5d-ae23-7f631ec8c8e4
      health: HEALTH_WARN
              insufficient standby MDS daemons available
              Reduced data availability: 264 pgs inactive
              Degraded data redundancy: 264 pgs unclean

    services:
      mon: 3 daemons, quorum tukan1,tukan2,tukan0
      mgr: tukan0(active), standbys: tukan2
      mds: cephfs-1/1/1 up  {0=tukan2=up:active}
      osd: 2 osds: 2 up, 2 in

    data:
      pools:   3 pools, 264 pgs
      objects: 0 objects, 0 bytes
      usage:   0 kB used, 0 kB / 0 kB avail
      pgs:     100.000% pgs unknown

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

