Hi,

On 01/09/2013 01:53 AM, Chen, Xiaoxi wrote:
> Hi,
>       Setting rep size to 3 only makes the data triple-replicated; that
> means when you "fail" all OSDs in 2 out of 3 DCs, the data is still
> accessible.
>       But the monitors are another story: a monitor cluster of 2N+1 nodes
> requires at least N+1 nodes alive, and indeed this is why your Ceph
> cluster failed.
>       It looks to me like this constraint makes it hard to design a
> deployment that is robust against a DC outage, but I'm hoping for input
> from the community on how to make the monitor cluster reliable.
> 

From what I understand he didn't kill the second mon, still leaving 2
out of 3 mons running.
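
With three monitors a quorum only needs two of them, so two surviving
mons should be enough. As a quick sanity check (the commands below are
just an illustration; run them from any node that has the admin keyring)
you can ask the monitors directly whether they still form a quorum:

# ceph mon stat
# ceph quorum_status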

Could you check if your PGs are actually mapped to OSDs spread out over
the 3 DCs?

"ceph pg dump" should tell you to which OSDs the PGs are mapped.

I've never tried this before, but you don't have equal weights for the
datacenters; I don't know how that affects the situation.
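
If you want to experiment with equal datacenter weights, the usual round
trip is to pull the crushmap, edit it and inject it again; crushtool can
also simulate a rule before you load the new map. The filenames below
are arbitrary, and rule 2 with 3 replicas is simply taken from your rbd
pool:

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
  (edit the datacenter item weights under "root default" in crushmap.txt)
# crushtool -c crushmap.txt -o crushmap.new
# crushtool -i crushmap.new --test --rule 2 --num-rep 3 --show-statistics
# ceph osd setcrushmap -i crushmap.new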

Wido

> Xiaoxi
> 
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Moore, Shawn M
> Sent: January 9, 2013 4:21
> To: [email protected]
> Subject: Crushmap Design Question
> 
> I have been testing ceph for a little over a month now.  Our design goal is 
> to have 3 datacenters in different buildings all tied together over 10GbE.  
> Currently there are 10 servers each serving 1 osd in 2 of the datacenters.  
> In the third is one large server with 16 SAS disks serving 8 osds.  
> Eventually we will add one more identical large server into the third 
> datacenter.  I have told ceph to keep 3 copies and tried to do the crushmap 
> in such a way that, as long as a majority of mons can stay up, we could run 
> off of one datacenter's worth of osds.  But in my testing, it doesn't work 
> out quite this way...
> 
> Everything is currently ceph version 0.56.1 
> (e4a541624df62ef353e754391cbbb707f54b16f7)
> 
> I will put the (hopefully) relevant files at the end of this email.
> 
> When all 28 osds are up, I get:
> 2013-01-08 13:56:07.435914 mon.0 [INF] pgmap v2712076: 7104 pgs: 7104 
> active+clean; 60264 MB data, 137 GB used, 13570 GB / 14146 GB avail
> 
> When I fail a datacenter (including 1 of the 3 mons) I eventually get:
> 2013-01-08 13:58:54.020477 mon.0 [INF] pgmap v2712139: 7104 pgs: 7104 
> active+degraded; 60264 MB data, 137 GB used, 13570 GB / 14146 GB avail; 
> 16362/49086 degraded (33.333%)
> 
> At this point everything is still ok.  But when I fail the 2nd datacenter 
> (still leaving 2 out of 3 mons running) I get:
> 2013-01-08 14:01:25.600056 mon.0 [INF] pgmap v2712189: 7104 pgs: 7104 
> incomplete; 60264 MB data, 137 GB used, 13570 GB / 14146 GB avail
> 
> Most VMs quit working; "rbd ls" still works, but "rados -p rbd ls" returns 
> nothing and just hangs.  After a while (you can see from the timestamps) I 
> end up at the following state, and it stays this way:
> 2013-01-08 14:40:54.030370 mon.0 [INF] pgmap v2713794: 7104 pgs: 213 active, 
> 117 active+remapped, 3660 incomplete, 3108 active+degraded+remapped, 6 
> remapped+incomplete; 60264 MB data, 65701 MB used, 4604 GB / 4768 GB avail; 
> 7696/49086 degraded (15.679%)
> 
> I'm hoping I've done something wrong, so please advise.  Below are my 
> configs.  If you need something more to help, just ask.
> 
> Normal output with all datacenters up.
> # ceph osd tree
> # id  weight  type name       up/down reweight
> -1    80      root default
> -3    36              datacenter hok
> -2    1                       host blade151
> 0     1                               osd.0   up      1       
> -4    1                       host blade152
> 1     1                               osd.1   up      1       
> -15   1                       host blade153
> 2     1                               osd.2   up      1       
> -17   1                       host blade154
> 3     1                               osd.3   up      1       
> -18   1                       host blade155
> 4     1                               osd.4   up      1       
> -19   1                       host blade159
> 5     1                               osd.5   up      1       
> -20   1                       host blade160
> 6     1                               osd.6   up      1       
> -21   1                       host blade161
> 7     1                               osd.7   up      1       
> -22   1                       host blade162
> 8     1                               osd.8   up      1       
> -23   1                       host blade163
> 9     1                               osd.9   up      1       
> -24   36              datacenter csc
> -5    1                       host admbc0-01
> 10    1                               osd.10  up      1       
> -6    1                       host admbc0-02
> 11    1                               osd.11  up      1       
> -7    1                       host admbc0-03
> 12    1                               osd.12  up      1       
> -8    1                       host admbc0-04
> 13    1                               osd.13  up      1       
> -9    1                       host admbc0-05
> 14    1                               osd.14  up      1       
> -10   1                       host admbc0-06
> 15    1                               osd.15  up      1       
> -11   1                       host admbc0-09
> 16    1                               osd.16  up      1       
> -12   1                       host admbc0-10
> 17    1                               osd.17  up      1       
> -13   1                       host admbc0-11
> 18    1                               osd.18  up      1       
> -14   1                       host admbc0-12
> 19    1                               osd.19  up      1       
> -25   8               datacenter adm
> -16   8                       host admdisk0
> 20    1                               osd.20  up      1       
> 21    1                               osd.21  up      1       
> 22    1                               osd.22  up      1       
> 23    1                               osd.23  up      1       
> 24    1                               osd.24  up      1       
> 25    1                               osd.25  up      1       
> 26    1                               osd.26  up      1       
> 27    1                               osd.27  up      1
> 
> 
> 
> Showing copies set to 3.
> # ceph osd dump | grep " size "
> pool 0 'data' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 2368 pgp_num 2368 last_change 63 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 3 crush_ruleset 1 object_hash rjenkins pg_num 2368 pgp_num 2368 last_change 65 owner 0
> pool 2 'rbd' rep size 3 crush_ruleset 2 object_hash rjenkins pg_num 2368 pgp_num 2368 last_change 6061 owner 0
> 
> 
> 
> 
> Crushmap
> # begin crush map
> 
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
> device 9 osd.9
> device 10 osd.10
> device 11 osd.11
> device 12 osd.12
> device 13 osd.13
> device 14 osd.14
> device 15 osd.15
> device 16 osd.16
> device 17 osd.17
> device 18 osd.18
> device 19 osd.19
> device 20 osd.20
> device 21 osd.21
> device 22 osd.22
> device 23 osd.23
> device 24 osd.24
> device 25 osd.25
> device 26 osd.26
> device 27 osd.27
> 
> # types
> type 0 osd
> type 1 host
> type 2 rack
> type 3 row
> type 4 room
> type 5 datacenter
> type 6 root
> 
> # buckets
> host blade151 {
>       id -2           # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.0 weight 1.000
> }
> host blade152 {
>       id -4           # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.1 weight 1.000
> }
> host blade153 {
>       id -15          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.2 weight 1.000
> }
> host blade154 {
>       id -17          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.3 weight 1.000
> }
> host blade155 {
>       id -18          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.4 weight 1.000
> }
> host blade159 {
>       id -19          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.5 weight 1.000
> }
> host blade160 {
>       id -20          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.6 weight 1.000
> }
> host blade161 {
>       id -21          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.7 weight 1.000
> }
> host blade162 {
>       id -22          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.8 weight 1.000
> }
> host blade163 {
>       id -23          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.9 weight 1.000
> }
> datacenter hok {
>       id -3           # do not change unnecessarily
>       # weight 10.000
>       alg straw
>       hash 0  # rjenkins1
>       item blade151 weight 1.000
>       item blade152 weight 1.000
>       item blade153 weight 1.000
>       item blade154 weight 1.000
>       item blade155 weight 1.000
>       item blade159 weight 1.000
>       item blade160 weight 1.000
>       item blade161 weight 1.000
>       item blade162 weight 1.000
>       item blade163 weight 1.000
> }
> host admbc0-01 {
>       id -5           # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.10 weight 1.000
> }
> host admbc0-02 {
>       id -6           # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.11 weight 1.000
> }
> host admbc0-03 {
>       id -7           # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.12 weight 1.000
> }
> host admbc0-04 {
>       id -8           # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.13 weight 1.000
> }
> host admbc0-05 {
>       id -9           # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.14 weight 1.000
> }
> host admbc0-06 {
>       id -10          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.15 weight 1.000
> }
> host admbc0-09 {
>       id -11          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.16 weight 1.000
> }
> host admbc0-10 {
>       id -12          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.17 weight 1.000
> }
> host admbc0-11 {
>       id -13          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.18 weight 1.000
> }
> host admbc0-12 {
>       id -14          # do not change unnecessarily
>       # weight 1.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.19 weight 1.000
> }
> datacenter csc {
>       id -24          # do not change unnecessarily
>       # weight 10.000
>       alg straw
>       hash 0  # rjenkins1
>       item admbc0-01 weight 1.000
>       item admbc0-02 weight 1.000
>       item admbc0-03 weight 1.000
>       item admbc0-04 weight 1.000
>       item admbc0-05 weight 1.000
>       item admbc0-06 weight 1.000
>       item admbc0-09 weight 1.000
>       item admbc0-10 weight 1.000
>       item admbc0-11 weight 1.000
>       item admbc0-12 weight 1.000
> }
> host admdisk0 {
>       id -16          # do not change unnecessarily
>       # weight 8.000
>       alg straw
>       hash 0  # rjenkins1
>       item osd.20 weight 1.000
>       item osd.21 weight 1.000
>       item osd.22 weight 1.000
>       item osd.23 weight 1.000
>       item osd.24 weight 1.000
>       item osd.25 weight 1.000
>       item osd.26 weight 1.000
>       item osd.27 weight 1.000
> }
> datacenter adm {
>       id -25          # do not change unnecessarily
>       # weight 8.000
>       alg straw
>       hash 0  # rjenkins1
>       item admdisk0 weight 8.000
> }
> root default {
>       id -1           # do not change unnecessarily
>       # weight 80.000
>       alg straw
>       hash 0  # rjenkins1
>       item hok weight 36.000
>       item csc weight 36.000
>       item adm weight 8.000
> }
> 
> # rules
> rule data {
>       ruleset 0
>       type replicated
>       min_size 1
>       max_size 10
>       step take default
>       step chooseleaf firstn 0 type datacenter
>       step emit
> }
> rule metadata {
>       ruleset 1
>       type replicated
>       min_size 1
>       max_size 10
>       step take default
>       step chooseleaf firstn 0 type datacenter
>       step emit
> }
> rule rbd {
>       ruleset 2
>       type replicated
>       min_size 1
>       max_size 10
>       step take default
>       step chooseleaf firstn 0 type datacenter
>       step emit
> }
> 
> # end crush map
> 