Hi, today we had an issue with our 6 node Ceph cluster.
We had to shutdown one node (Ceph-03), to replace a disk (because, we did now
know the slot). We set the noout flag and did a graceful shutdown. All was O.K.
After the disk was replaced, the node comes up and our VMs had a big I/O
latency.
We never saw this in the past, with the same procedure ...
* From our logs on Ceph-01:
2018-10-18 15:53:45.455743 mon.qh-a07-ceph-osd-03 mon.2 10.3.0.3:6789/0 1 :
cluster [INF] mon.qh-a07-ceph- osd-03 calling monitor election
...
2018-10-18 15:53:45.503818 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663050
: cluster [INF] mon.qh-a07-ceph-osd-01 is new leader, mons
qh-a07-ceph-osd-01,qh-a07-ceph-osd-02,qh-a07-ceph-osd-03,qh-a07-ceph-osd-04,qh-a07-ceph-osd-05,qh
* First OSD comes up:
2018-10-18 15:53:55.207742 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663063
: cluster [WRN] Health check update: 10 osds down (OSD_DOWN)
2018-10-18 15:53:55.207768 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663064
: cluster [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (11 osds) down)
2018-10-18 15:53:55.240079 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663065
: cluster [INF] osd.43 10.3.0.3:6812/7554 boot
* All OSDs where up:
2018-10-18 15:54:25.331692 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663134
: cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2018-10-18 15:54:25.360151 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663135
: cluster [INF] osd.12 10.3.0.3:6820/8537 boot
* This OSDs here are a mix of HDD and SDD and different nodes
2018-10-18 15:54:27.073266 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663138
: cluster [WRN] Health check update: Degraded data redundancy: 84012/4293867
objects degraded (1.957%), 1316 pgs degraded, 487 pgs undersized (PG_DEGRADED)
2018-10-18 15:54:32.073644 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663142
: cluster [WRN] Health check update: Degraded data redundancy: 4611/4293867
objects degraded (0.107%), 1219 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:36.841189 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663144
: cluster [WRN] Health check failed: 1 slow requests are blocked > 32 sec.
Implicated osds 16 (REQUEST_SLOW)
2018-10-18 15:54:37.074098 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663145
: cluster [WRN] Health check update: Degraded data redundancy: 4541/4293867
objects degraded (0.106%), 1216 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:42.074510 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663149
: cluster [WRN] Health check update: Degraded data redundancy: 4364/4293867
objects degraded (0.102%), 1176 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:42.074561 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663150
: cluster [WRN] Health check update: 5 slow requests are blocked > 32 sec.
Implicated osds 15,25,30,34 (REQUEST_SLOW)
2018-10-18 15:54:47.074886 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663152
: cluster [WRN] Health check update: Degraded data redundancy: 4193/4293867
objects degraded (0.098%), 1140 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:47.074934 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663153
: cluster [WRN] Health check update: 5 slow requests are blocked > 32 sec.
Implicated osds 9,15,23,30 (REQUEST_SLOW)
2018-10-18 15:54:52.075274 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663156
: cluster [WRN] Health check update: Degraded data redundancy: 4087/4293867
objects degraded (0.095%), 1120 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:52.075313 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663157
: cluster [WRN] Health check update: 14 slow requests are blocked > 32 sec.
Implicated osds 2,13,14,15,16,23 (REQUEST_SLOW)
2018-10-18 15:54:57.075635 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663158
: cluster [WRN] Health check update: Degraded data redundancy: 3932/4293867
objects degraded (0.092%), 1074 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:57.075683 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663159
: cluster [WRN] Health check update: 6 slow requests are blocked > 32 sec.
Implicated osds 14,15 (REQUEST_SLOW)
2018-10-18 15:55:02.076071 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663163
: cluster [WRN] Health check update: Degraded data redundancy: 3805/4293867
objects degraded (0.089%), 1036 pgs degraded (PG_DEGRADED)
2018-10-18 15:55:02.076138 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663164
: cluster [WRN] Health check update: 4 slow requests are blocked > 32 sec.
Implicated osds 1,15,19 (REQUEST_SLOW)
2018-10-18 15:55:07.076562 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663168
: cluster [WRN] Health check update: Degraded data redundancy: 3656/4293867
objects degraded (0.085%), 988 pgs degraded (PG_DEGRADED)
2018-10-18 15:55:07.076633 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663169
: cluster [WRN] Health check update: 4 slow requests are blocked > 32 sec.
Implicated osds 15,22 (REQUEST_SLOW)
2018-10-18 15:55:12.077091 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663174
: cluster [WRN] Health check update: Degraded data redundancy: 3509/4293867
objects degraded (0.082%), 934 pgs degraded (PG_DEGRADED)
2018-10-18 15:55:12.077130 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663175
: cluster [WRN] Health check update: 1 slow requests are blocked > 32 sec.
Implicated osds 15 (REQUEST_SLOW)
2018-10-18 15:55:13.317758 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663178
: cluster [INF] Health check cleared: REQUEST_SLOW (was: 1 slow requests are
blocked > 32 sec. Implicated osds 15)
....
2018-10-18 15:55:17.077685 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663180
: cluster [WRN] Health check update: Degraded data redundancy: 3351/4293867
objects degraded (0.078%), 886 pgs degraded, 1 pg undersized (PG_DEGRADED)
2018-10-18 15:55:22.078077 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663184
: cluster [WRN] Health check update: Degraded data redundancy: 3202/4293867
objects degraded (0.075%), 831 pgs degraded, 3 pgs undersized (PG_DEGRADED)
* And then, all went back to work:
2018-10-18 16:00:22.876849 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663322
: cluster [WRN] Health check failed: Reduced data availability: 4 pgs inactive,
6 pgs peering (PG_AVAILABILITY)
2018-10-18 16:00:24.898816 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663326
: cluster [WRN] Health check failed: 102307/4289049 objects misplaced (2.385%)
(OBJECT_MISPLACED)
2018-10-18 16:00:28.995821 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663327
: cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data
availability: 3 pgs peering)
2018-10-18 16:00:31.026616 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663330
: cluster [WRN] Health check update: 156337/4294191 objects misplaced (3.641%)
(OBJECT_MISPLACED)
A few mins later, we added the replaced disks again (on ceph-osd-06 and on
ceph-osd-03), without issues. To the I/O stuck started as the node where in the
cluster again ( 15:53:45 ...) we have no clue ... and never happens in the past.
We have per OSD node:
* 2 x Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
* 64GB DDR4 ECC
* HDD: WD Red 1TB 2.5 with Intel NVME SSDPEDMD400G4 as journal device
* SDD: Samsung 950 EVO 512GB
* 2 x Symbios Logic SAS3224 PCI-Express Fusion-MPT SAS-3 (one for SSD, one for
HDD)
* Mellanox 100GB Ceph network (CX-4)
* 6 Mon (on Ceph nodes)
Ceph.conf
[global]
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 20
osd recovery max active = 8
no special crush rule, just the HDD / SDD one.
* ceph osd pool get ssd-pool pg_num
pg_num: 2048
* root@qh-a07-ceph-osd-01:[~]: ceph osd pool get sata-pool pg_num
pg_num: 1024
* root@qh-a07-ceph-osd-01:[~]: ceph osd pool get db-pool pg_num
pg_num: 1024
* ceph -s
data:
pools: 3 pools, 4096 pgs
objects: 1.43M objects, 5.39TiB
usage: 15.7TiB used, 33.4TiB / 49.1TiB avail
pgs: 4096 active+clean
Any suggestions to avoid this stuck next time ?
cu denny
signature.asc
Description: Message signed with OpenPGP
_______________________________________________ ceph-users mailing list [email protected] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
