Hi, while upgrading my cluster from 10.2.3 to 10.2.6 I ran into a major failure, and I think it might be a bug.
My OS is Ubuntu (Xenial) and the Ceph packages are also from the distro. My cluster has 3 monitors and 96 OSDs. First I stopped one mon, upgraded the OS packages and rebooted; it came back up as expected with no failures. I did the same with another mon, which was also fine, but when I stopped my last mon I got HEALTH_ERR, tons of blocked requests, and several minutes (with almost zero client I/O) until the recovery process started... Two days later (with an inconvenient performance degradation) the cluster became HEALTH_OK again, and only then did I upgrade all my OSDs from 10.2.3 to 10.2.6 (this time, fortunately, without any surprises). My question is: why did this happen?

In my logs (from the monitor boot process) I can only see things like:

    2017-03-27 11:21:13.955155 7f7b24df3700 0 mon.mon-node1@-1(probing).osd e166803 crush map has features 288514051259236352, adjusting msgr requires
    2017-03-27 11:21:14.020915 7f7b16a10700 0 -- 10.2.15.20:6789/0 >> 10.2.15.22:6789/0 pipe(0x55eeea485400 sd=12 :49238 s=2 pgs=3041041 cs=1 l=0 c=0x55eee9206c00).reader missed message? skipped from seq 0 to 821720064
    2017-03-27 11:21:14.021322 7f7b1690f700 0 -- 10.2.15.20:6789/0 >> 10.2.15.21:6789/0 pipe(0x55eeea484000 sd=11 :44714 s=2 pgs=6749444 cs=1 l=0 c=0x55eee9206a80).reader missed message? skipped from seq 0 to 1708671746

And also (from all my OSDs) a lot of:

    2017-03-27 11:21:46.991533 osd.62 10.2.15.37:6812/4072 21935 : cluster [WRN] failed to encode map e167847 with expected crc

When things started to go wrong (when I stopped mon-node1, the last one, to upgrade it) I can see:

    2017-03-27 11:05:07.143529 mon.1 10.2.15.21:6789/0 653 : cluster [INF] HEALTH_ERR; 54 pgs are stuck inactive for more than 300 seconds; 2153 pgs backfill_wait; 21 pgs backfilling; 53 pgs degraded; 2166 pgs peering; 3 pgs recovering; 50 pgs recovery_wait; 54 pgs stuck inactive; 118 pgs stuck unclean; 1549 requests are blocked > 32 sec; recovery 28926/57075284 objects degraded (0.051%); recovery 24971455/57075284 objects misplaced (43.752%); all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set; 1 mons down, quorum 1,2 mon-node2,mon-node3

And when mon-node1 came back (already upgraded):

    2017-03-27 11:21:58.987092 7f7b18c16700 0 log_channel(cluster) log [INF] : mon.mon-node1 calling new monitor election
    2017-03-27 11:21:58.987186 7f7b18c16700 1 mon.mon-node1@0(electing).elector(162) init, last seen epoch 162
    2017-03-27 11:21:59.064957 7f7b18c16700 0 log_channel(cluster) log [INF] : mon.mon-node1 calling new monitor election
    2017-03-27 11:21:59.065029 7f7b18c16700 1 mon.mon-node1@0(electing).elector(165) init, last seen epoch 165
    2017-03-27 11:21:59.096933 7f7b18c16700 0 log_channel(cluster) log [INF] : mon.mon-node1@0 won leader election with quorum 0,1,2
    2017-03-27 11:21:59.114194 7f7b18c16700 0 log_channel(cluster) log [INF] : HEALTH_ERR; 2167 pgs are stuck inactive for more than 300 seconds; 2121 pgs backfill_wait; 25 pgs backfilling; 25 pgs degraded; 2147 pgs peering; 25 pgs recovery_wait; 25 pgs stuck degraded; 2167 pgs stuck inactive; 4338 pgs stuck unclean; 5082 requests are blocked > 32 sec; recovery 11846/55732755 objects degraded (0.021%); recovery 24595033/55732755 objects misplaced (44.130%); all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set

The crc errors disappeared once all monitors were upgraded and the require_jewel_osds flag was set. It seems the entire cluster was rebuilt; fortunately I didn't lose any data. So is this a bug, expected behavior, or did I do something wrong?
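For reference, the per-mon procedure I followed was roughly this (from memory, and assuming the stock systemd unit names the Xenial packages install, so take it as a sketch rather than an exact transcript):

    # on each monitor node, one at a time (mon-node1 shown as an example)
    systemctl stop ceph-mon@mon-node1
    apt-get update && apt-get dist-upgrade    # pulls ceph 10.2.6 from the distro repos
    reboot
    # after the reboot, check that the mon rejoined quorum before touching the next one
    ceph -s
    ceph mon stat

And once all OSDs were on 10.2.6, the flag the health message was complaining about was set with:

    ceph osd set require_jewel_osds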
I've updated Ceph several times and never had problems.

--
Herbert
