Hi, Vasiliy.

Yes, it is a problem with the crushmap. Look at the weights:

    -3 14.56000     host slpeah001
    -2 14.56000     host slpeah002
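For reference, a quick way to pull the map the cluster is actually using and put it next to the runtime view is to dump and decompile it (a minimal sketch; the /tmp paths are only placeholders):

    # dump the in-use crushmap and decompile it to text (needs crushtool installed)
    ceph osd getcrushmap -o /tmp/crushmap.bin
    crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

    # compare host weights and device entries against the runtime view
    ceph osd tree
    grep -E 'host|weight|^device' /tmp/crushmap.txt

A crushtool test of the rule itself is sketched after the quoted thread below.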
Best regards,
Irek Nurgayazovich Fasikhov
Mob.: +79229045757

2015-11-26 13:16 GMT+03:00 CIT RT - Kuramshin Kamil Fidailevich <[email protected]>:

> It seems that you have played around with the crushmap and done something
> wrong. Compare the output of 'ceph osd tree' with the crushmap: some 'osd'
> devices are renamed to 'device' there; I think that is your problem.
>
> Sent from a mobile device.
>
>
> -----Original Message-----
> From: Vasiliy Angapov <[email protected]>
> To: ceph-users <[email protected]>
> Sent: Thu, 26 Nov 2015 7:53
> Subject: [ceph-users] Undersized pgs problem
>
> Hi, colleagues!
>
> I have a small 4-node Ceph cluster (0.94.2); all pools have size 3,
> min_size 1.
> Last night one host failed and the cluster was unable to rebalance,
> saying there are a lot of undersized pgs.
>
> root@slpeah002:[~]:# ceph -s
>     cluster 78eef61a-3e9c-447c-a3ec-ce84c617d728
>      health HEALTH_WARN
>             1486 pgs degraded
>             1486 pgs stuck degraded
>             2257 pgs stuck unclean
>             1486 pgs stuck undersized
>             1486 pgs undersized
>             recovery 80429/555185 objects degraded (14.487%)
>             recovery 40079/555185 objects misplaced (7.219%)
>             4/20 in osds are down
>             1 mons down, quorum 1,2 slpeah002,slpeah007
>      monmap e7: 3 mons at
> {slpeah001=192.168.254.11:6780/0,slpeah002=192.168.254.12:6780/0,slpeah007=172.31.252.46:6789/0}
>             election epoch 710, quorum 1,2 slpeah002,slpeah007
>      osdmap e14062: 20 osds: 16 up, 20 in; 771 remapped pgs
>       pgmap v7021316: 4160 pgs, 5 pools, 1045 GB data, 180 kobjects
>             3366 GB used, 93471 GB / 96838 GB avail
>             80429/555185 objects degraded (14.487%)
>             40079/555185 objects misplaced (7.219%)
>                 1903 active+clean
>                 1486 active+undersized+degraded
>                  771 active+remapped
>   client io 0 B/s rd, 246 kB/s wr, 67 op/s
>
> root@slpeah002:[~]:# ceph osd tree
> ID  WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
>  -1 94.63998 root default
>  -9 32.75999     host slpeah007
>  72  5.45999         osd.72           up 1.00000          1.00000
>  73  5.45999         osd.73           up 1.00000          1.00000
>  74  5.45999         osd.74           up 1.00000          1.00000
>  75  5.45999         osd.75           up 1.00000          1.00000
>  76  5.45999         osd.76           up 1.00000          1.00000
>  77  5.45999         osd.77           up 1.00000          1.00000
> -10 32.75999     host slpeah008
>  78  5.45999         osd.78           up 1.00000          1.00000
>  79  5.45999         osd.79           up 1.00000          1.00000
>  80  5.45999         osd.80           up 1.00000          1.00000
>  81  5.45999         osd.81           up 1.00000          1.00000
>  82  5.45999         osd.82           up 1.00000          1.00000
>  83  5.45999         osd.83           up 1.00000          1.00000
>  -3 14.56000     host slpeah001
>   1  3.64000         osd.1          down 1.00000          1.00000
>  33  3.64000         osd.33         down 1.00000          1.00000
>  34  3.64000         osd.34         down 1.00000          1.00000
>  35  3.64000         osd.35         down 1.00000          1.00000
>  -2 14.56000     host slpeah002
>   0  3.64000         osd.0            up 1.00000          1.00000
>  36  3.64000         osd.36           up 1.00000          1.00000
>  37  3.64000         osd.37           up 1.00000          1.00000
>  38  3.64000         osd.38           up 1.00000          1.00000
>
> Crushmap:
>
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 device2
> device 3 device3
> device 4 device4
> device 5 device5
> device 6 device6
> device 7 device7
> device 8 device8
> device 9 device9
> device 10 device10
> device 11 device11
> device 12 device12
> device 13 device13
> device 14 device14
> device 15 device15
> device 16 device16
> device 17 device17
> device 18 device18
> device 19 device19
> device 20 device20
> device 21 device21
> device 22 device22
> device 23 device23
> device 24 device24
> device 25 device25
> device 26 device26
> device 27 device27
> device 28 device28
> device 29 device29
> device 30 device30
> device 31 device31
> device 32 device32
> device 33 osd.33
> device 34 osd.34
> device 35 osd.35
> device 36 osd.36
> device 37 osd.37
> device 38 osd.38
> device 39 device39
> device 40 device40
> device 41 device41
> device 42 device42
> device 43 device43
> device 44 device44
> device 45 device45
> device 46 device46
> device 47 device47
> device 48 device48
> device 49 device49
> device 50 device50
> device 51 device51
> device 52 device52
> device 53 device53
> device 54 device54
> device 55 device55
> device 56 device56
> device 57 device57
> device 58 device58
> device 59 device59
> device 60 device60
> device 61 device61
> device 62 device62
> device 63 device63
> device 64 device64
> device 65 device65
> device 66 device66
> device 67 device67
> device 68 device68
> device 69 device69
> device 70 device70
> device 71 device71
> device 72 osd.72
> device 73 osd.73
> device 74 osd.74
> device 75 osd.75
> device 76 osd.76
> device 77 osd.77
> device 78 osd.78
> device 79 osd.79
> device 80 osd.80
> device 81 osd.81
> device 82 osd.82
> device 83 osd.83
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host slpeah007 {
>         id -9           # do not change unnecessarily
>         # weight 32.760
>         alg straw
>         hash 0  # rjenkins1
>         item osd.72 weight 5.460
>         item osd.73 weight 5.460
>         item osd.74 weight 5.460
>         item osd.75 weight 5.460
>         item osd.76 weight 5.460
>         item osd.77 weight 5.460
> }
> host slpeah008 {
>         id -10          # do not change unnecessarily
>         # weight 32.760
>         alg straw
>         hash 0  # rjenkins1
>         item osd.78 weight 5.460
>         item osd.79 weight 5.460
>         item osd.80 weight 5.460
>         item osd.81 weight 5.460
>         item osd.82 weight 5.460
>         item osd.83 weight 5.460
> }
> host slpeah001 {
>         id -3           # do not change unnecessarily
>         # weight 14.560
>         alg straw
>         hash 0  # rjenkins1
>         item osd.1 weight 3.640
>         item osd.33 weight 3.640
>         item osd.34 weight 3.640
>         item osd.35 weight 3.640
> }
> host slpeah002 {
>         id -2           # do not change unnecessarily
>         # weight 14.560
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 3.640
>         item osd.36 weight 3.640
>         item osd.37 weight 3.640
>         item osd.38 weight 3.640
> }
> root default {
>         id -1           # do not change unnecessarily
>         # weight 94.640
>         alg straw
>         hash 0  # rjenkins1
>         item slpeah007 weight 32.760
>         item slpeah008 weight 32.760
>         item slpeah001 weight 14.560
>         item slpeah002 weight 14.560
> }
>
> # rules
> rule default {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> # end crush map
>
>
> This is odd, because the pools have size 3 and I have 3 hosts alive, so
> why is it saying that undersized pgs are present? It makes me feel like
> CRUSH is not working properly.
> There is not much data in the cluster currently, only about 3 TB, and as
> you can see from the osd tree each host has a minimum of 14 TB of disk
> space on its OSDs.
> So I'm a bit stuck now...
> How can I find the source of the trouble?
>
> Thanks in advance!
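Following up on the question above: one way to check whether the rule in that map can still place three replicas across distinct hosts is to run the decompiled map through crushtool in test mode (again only a sketch; the file names continue the example above, and rule 0 / --num-rep 3 come from the ruleset and pool size quoted in the thread):

    # recompile the text map and simulate placements for rule 0 with 3 replicas
    crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.test
    crushtool -i /tmp/crushmap.test --test --rule 0 --num-rep 3 --show-bad-mappings

    # each line printed by --show-bad-mappings is an input for which CRUSH could
    # not find 3 OSDs on distinct hosts, which is the condition that shows up as
    # undersized pgs

Any bad mappings reported here would back up the crushmap suspicion above.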
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
