Hi, Vasiliy.

Yes, it is a problem with the crushmap. Look at the weights:

    -3 14.56000     host slpeah001
    -2 14.56000     host slpeah002
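For reference, a quick way to pull the map the cluster is actually using and put it next to the runtime view is to dump and decompile it (a minimal sketch; the /tmp paths are only placeholders):

    # dump the in-use crushmap and decompile it to text (needs crushtool installed)
    ceph osd getcrushmap -o /tmp/crushmap.bin
    crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

    # compare host weights and device entries against the runtime view
    ceph osd tree
    grep -E 'host|weight|^device' /tmp/crushmap.txt

A crushtool test of the rule itself is sketched after the quoted thread below.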
Best regards,
Irek Nurgayazovich Fasikhov
Mob.: +79229045757

2015-11-26 13:16 GMT+03:00 CIT RT - Kuramshin Kamil Fidailevich <[email protected]>:

> It seems that you have played around with the crushmap and done something
> wrong. Compare the output of 'ceph osd tree' with the crushmap: some 'osd'
> devices are renamed to 'device' there; I think that is your problem.
>
> Sent from a mobile device.
>
>
> -----Original Message-----
> From: Vasiliy Angapov <[email protected]>
> To: ceph-users <[email protected]>
> Sent: Thu, 26 Nov 2015 7:53
> Subject: [ceph-users] Undersized pgs problem
>
> Hi, colleagues!
>
> I have a small 4-node Ceph cluster (0.94.2); all pools have size 3,
> min_size 1.
> Last night one host failed and the cluster was unable to rebalance,
> saying there are a lot of undersized pgs.
>
> root@slpeah002:[~]:# ceph -s
>     cluster 78eef61a-3e9c-447c-a3ec-ce84c617d728
>      health HEALTH_WARN
>             1486 pgs degraded
>             1486 pgs stuck degraded
>             2257 pgs stuck unclean
>             1486 pgs stuck undersized
>             1486 pgs undersized
>             recovery 80429/555185 objects degraded (14.487%)
>             recovery 40079/555185 objects misplaced (7.219%)
>             4/20 in osds are down
>             1 mons down, quorum 1,2 slpeah002,slpeah007
>      monmap e7: 3 mons at
> {slpeah001=192.168.254.11:6780/0,slpeah002=192.168.254.12:6780/0,slpeah007=172.31.252.46:6789/0}
>             election epoch 710, quorum 1,2 slpeah002,slpeah007
>      osdmap e14062: 20 osds: 16 up, 20 in; 771 remapped pgs
>       pgmap v7021316: 4160 pgs, 5 pools, 1045 GB data, 180 kobjects
>             3366 GB used, 93471 GB / 96838 GB avail
>             80429/555185 objects degraded (14.487%)
>             40079/555185 objects misplaced (7.219%)
>                 1903 active+clean
>                 1486 active+undersized+degraded
>                  771 active+remapped
>   client io 0 B/s rd, 246 kB/s wr, 67 op/s
>
> root@slpeah002:[~]:# ceph osd tree
> ID  WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
>  -1 94.63998 root default
>  -9 32.75999     host slpeah007
>  72  5.45999         osd.72           up 1.00000          1.00000
>  73  5.45999         osd.73           up 1.00000          1.00000
>  74  5.45999         osd.74           up 1.00000          1.00000
>  75  5.45999         osd.75           up 1.00000          1.00000
>  76  5.45999         osd.76           up 1.00000          1.00000
>  77  5.45999         osd.77           up 1.00000          1.00000
> -10 32.75999     host slpeah008
>  78  5.45999         osd.78           up 1.00000          1.00000
>  79  5.45999         osd.79           up 1.00000          1.00000
>  80  5.45999         osd.80           up 1.00000          1.00000
>  81  5.45999         osd.81           up 1.00000          1.00000
>  82  5.45999         osd.82           up 1.00000          1.00000
>  83  5.45999         osd.83           up 1.00000          1.00000
>  -3 14.56000     host slpeah001
>   1  3.64000         osd.1          down 1.00000          1.00000
>  33  3.64000         osd.33         down 1.00000          1.00000
>  34  3.64000         osd.34         down 1.00000          1.00000
>  35  3.64000         osd.35         down 1.00000          1.00000
>  -2 14.56000     host slpeah002
>   0  3.64000         osd.0            up 1.00000          1.00000
>  36  3.64000         osd.36           up 1.00000          1.00000
>  37  3.64000         osd.37           up 1.00000          1.00000
>  38  3.64000         osd.38           up 1.00000          1.00000
>
> Crushmap:
>
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 device2
> device 3 device3
> device 4 device4
> device 5 device5
> device 6 device6
> device 7 device7
> device 8 device8
> device 9 device9
> device 10 device10
> device 11 device11
> device 12 device12
> device 13 device13
> device 14 device14
> device 15 device15
> device 16 device16
> device 17 device17
> device 18 device18
> device 19 device19
> device 20 device20
> device 21 device21
> device 22 device22
> device 23 device23
> device 24 device24
> device 25 device25
> device 26 device26
> device 27 device27
> device 28 device28
> device 29 device29
> device 30 device30
> device 31 device31
> device 32 device32
> device 33 osd.33
> device 34 osd.34
> device 35 osd.35
> device 36 osd.36
> device 37 osd.37
> device 38 osd.38
> device 39 device39
> device 40 device40
> device 41 device41
> device 42 device42
> device 43 device43
> device 44 device44
> device 45 device45
> device 46 device46
> device 47 device47
> device 48 device48
> device 49 device49
> device 50 device50
> device 51 device51
> device 52 device52
> device 53 device53
> device 54 device54
> device 55 device55
> device 56 device56
> device 57 device57
> device 58 device58
> device 59 device59
> device 60 device60
> device 61 device61
> device 62 device62
> device 63 device63
> device 64 device64
> device 65 device65
> device 66 device66
> device 67 device67
> device 68 device68
> device 69 device69
> device 70 device70
> device 71 device71
> device 72 osd.72
> device 73 osd.73
> device 74 osd.74
> device 75 osd.75
> device 76 osd.76
> device 77 osd.77
> device 78 osd.78
> device 79 osd.79
> device 80 osd.80
> device 81 osd.81
> device 82 osd.82
> device 83 osd.83
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host slpeah007 {
>         id -9           # do not change unnecessarily
>         # weight 32.760
>         alg straw
>         hash 0  # rjenkins1
>         item osd.72 weight 5.460
>         item osd.73 weight 5.460
>         item osd.74 weight 5.460
>         item osd.75 weight 5.460
>         item osd.76 weight 5.460
>         item osd.77 weight 5.460
> }
> host slpeah008 {
>         id -10          # do not change unnecessarily
>         # weight 32.760
>         alg straw
>         hash 0  # rjenkins1
>         item osd.78 weight 5.460
>         item osd.79 weight 5.460
>         item osd.80 weight 5.460
>         item osd.81 weight 5.460
>         item osd.82 weight 5.460
>         item osd.83 weight 5.460
> }
> host slpeah001 {
>         id -3           # do not change unnecessarily
>         # weight 14.560
>         alg straw
>         hash 0  # rjenkins1
>         item osd.1 weight 3.640
>         item osd.33 weight 3.640
>         item osd.34 weight 3.640
>         item osd.35 weight 3.640
> }
> host slpeah002 {
>         id -2           # do not change unnecessarily
>         # weight 14.560
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 3.640
>         item osd.36 weight 3.640
>         item osd.37 weight 3.640
>         item osd.38 weight 3.640
> }
> root default {
>         id -1           # do not change unnecessarily
>         # weight 94.640
>         alg straw
>         hash 0  # rjenkins1
>         item slpeah007 weight 32.760
>         item slpeah008 weight 32.760
>         item slpeah001 weight 14.560
>         item slpeah002 weight 14.560
> }
>
> # rules
> rule default {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> # end crush map
>
>
> This is odd, because the pools have size 3 and I have 3 hosts alive, so
> why is it saying that undersized pgs are present? It makes me feel like
> CRUSH is not working properly.
> There is not much data in the cluster currently, only about 3 TB, and as
> you can see from the osd tree each host has a minimum of 14 TB of disk
> space on its OSDs.
> So I'm a bit stuck now...
> How can I find the source of the trouble?
>
> Thanks in advance!
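Following up on the question above: one way to check whether the rule in that map can still place three replicas across distinct hosts is to run the decompiled map through crushtool in test mode (again only a sketch; the file names continue the example above, and rule 0 / --num-rep 3 come from the ruleset and pool size quoted in the thread):

    # recompile the text map and simulate placements for rule 0 with 3 replicas
    crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.test
    crushtool -i /tmp/crushmap.test --test --rule 0 --num-rep 3 --show-bad-mappings

    # each line printed by --show-bad-mappings is an input for which CRUSH could
    # not find 3 OSDs on distinct hosts, which is the condition that shows up as
    # undersized pgs

Any bad mappings reported here would back up the crushmap suspicion above.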
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
