Hi Robert,

Just wanted to let you know that after applying your CRUSH suggestion and allowing the cluster to rebalance itself, I now have symmetrical data distribution. (For the archives, I've pasted the full rule as applied at the bottom of this message.)

My rationale for keeping 5 monitors is availability. I have 3 compute nodes + 2 storage nodes, and I was thinking that making all of them monitors would provide additional redundancy. Based on your earlier comments, can you provide guidance on how much latency is induced by having excess monitors deployed?

Thanks.
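P.S. Sanity-checking the quorum math from the docs you linked, if I have it right: a quorum needs floor(n/2) + 1 monitors, so an even monitor count tolerates no more failures than the odd count below it:

    mons   quorum   mon failures tolerated
     3       2               1
     4       3               1
     5       3               2
     6       4               2

So 5 monitors across my 5 nodes should tolerate 2 monitor failures, and a 6th would have bought nothing.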
> On Jan 18, 2016, at 12:36, Robert LeBlanc <[email protected]> wrote:
>
> Not that I know of.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
> On Mon, Jan 18, 2016 at 10:33 AM, deeepdish wrote:
>> Thanks Robert. Will definitely try this. Is there a way to implement
>> "gradual CRUSH" changes? I've noticed that whenever cluster-wide changes
>> are pushed (a new CRUSH map, for instance), the cluster immediately
>> attempts to align itself, disrupting client access / performance...
>>
>>> On Jan 18, 2016, at 12:22, Robert LeBlanc wrote:
>>>
>>> I'm not sure why you have six monitors. Six monitors buy you nothing
>>> over five other than more power used, more latency, and more headache.
>>> See
>>> http://docs.ceph.com/docs/hammer/rados/configuration/mon-config-ref/#monitor-quorum
>>> for some more info. Also, I'd consider 5 monitors overkill for a
>>> cluster this size; I'd recommend three.
>>>
>>> Although this is most likely not the root cause of your problem, you
>>> probably have an error here: "root replicated-T1" points to b02s08
>>> and b02s12, and "site erbus" also points to b02s08 and b02s12. You
>>> probably meant to have "root replicated-T1" point to erbus instead.
>>>
>>> Where I think your problem is, is in your "rule replicated" section.
>>> You can try:
>>>
>>>     step take replicated-T1
>>>     step choose firstn 2 type host
>>>     step chooseleaf firstn 2 type osdgroup
>>>     step emit
>>>
>>> What this does is choose two hosts from the root replicated-T1 (which
>>> happens to be both hosts you have), then choose an OSD from two
>>> osdgroups on each host.
>>>
>>> I believe the problem with your current rule is that "firstn 0 type
>>> host" tries to select four hosts, but only two are available. You
>>> should be able to see that with 'ceph pg dump', where only two OSDs
>>> will be listed in the up set.
>>>
>>> I hope that helps.
>>> ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>>>
>>> On Sun, Jan 17, 2016 at 6:31 PM, deeepdish wrote:
>>>> Hi Everyone,
>>>>
>>>> Looking for a double-check of my logic and CRUSH map.
>>>>
>>>> Overview:
>>>>
>>>> - The osdgroup bucket type defines a failure domain within a host:
>>>>   5 OSDs + 1 SSD.
>>>>   Therefore 5 OSDs (all utilizing the same journal)
>>>>   constitute an osdgroup bucket. Each host has 4 osdgroups.
>>>> - 6 monitors
>>>> - Two-node cluster
>>>> - Each node:
>>>>   - 20 OSDs
>>>>   - 4 SSDs
>>>>   - 4 osdgroups
>>>>
>>>> Desired CRUSH rule outcome:
>>>> - Assuming a pool with min_size=2 and size=4, each node would
>>>>   contain a redundant copy of each object. Should either host fail,
>>>>   access to data would be uninterrupted.
>>>>
>>>> Current CRUSH rule outcome:
>>>> - There are 4 copies of each object; however, I don't believe each
>>>>   node has a redundant copy of each object. When a node fails, data
>>>>   is NOT accessible until ceph rebuilds itself / the node becomes
>>>>   accessible again.
>>>>
>>>> I suspect my CRUSH map is not right, and remedying it may take some
>>>> time and make the cluster unresponsive / unavailable. Is there a
>>>> way / method to apply substantial CRUSH changes gradually to a
>>>> cluster?
>>>>
>>>> Thanks for your help.
>>>>
>>>> Current crush map:
>>>>
>>>> # begin crush map
>>>> tunable choose_local_tries 0
>>>> tunable choose_local_fallback_tries 0
>>>> tunable choose_total_tries 50
>>>> tunable chooseleaf_descend_once 1
>>>> tunable straw_calc_version 1
>>>>
>>>> # devices
>>>> device 0 osd.0
>>>> device 1 osd.1
>>>> device 2 osd.2
>>>> device 3 osd.3
>>>> device 4 osd.4
>>>> device 5 osd.5
>>>> device 6 osd.6
>>>> device 7 osd.7
>>>> device 8 osd.8
>>>> device 9 osd.9
>>>> device 10 osd.10
>>>> device 11 osd.11
>>>> device 12 osd.12
>>>> device 13 osd.13
>>>> device 14 osd.14
>>>> device 15 osd.15
>>>> device 16 osd.16
>>>> device 17 osd.17
>>>> device 18 osd.18
>>>> device 19 osd.19
>>>> device 20 osd.20
>>>> device 21 osd.21
>>>> device 22 osd.22
>>>> device 23 osd.23
>>>> device 24 osd.24
>>>> device 25 osd.25
>>>> device 26 osd.26
>>>> device 27 osd.27
>>>> device 28 osd.28
>>>> device 29 osd.29
>>>> device 30 osd.30
>>>> device 31 osd.31
>>>> device 32 osd.32
>>>> device 33 osd.33
>>>> device 34 osd.34
>>>> device 35 osd.35
>>>> device 36 osd.36
>>>> device 37 osd.37
>>>> device 38 osd.38
>>>> device 39 osd.39
>>>>
>>>> # types
>>>> type 0 osd
>>>> type 1 osdgroup
>>>> type 2 host
>>>> type 3 rack
>>>> type 4 site
>>>> type 5 root
>>>>
>>>> # buckets
>>>> osdgroup b02s08-osdgroupA {
>>>>         id -81          # do not change unnecessarily
>>>>         # weight 18.100
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.0 weight 3.620
>>>>         item osd.1 weight 3.620
>>>>         item osd.2 weight 3.620
>>>>         item osd.3 weight 3.620
>>>>         item osd.4 weight 3.620
>>>> }
>>>> osdgroup b02s08-osdgroupB {
>>>>         id -82          # do not change unnecessarily
>>>>         # weight 18.100
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.5 weight 3.620
>>>>         item osd.6 weight 3.620
>>>>         item osd.7 weight 3.620
>>>>         item osd.8 weight 3.620
>>>>         item osd.9 weight 3.620
>>>> }
>>>> osdgroup b02s08-osdgroupC {
>>>>         id -83          # do not change unnecessarily
>>>>         # weight 19.920
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.10 weight 3.620
>>>>         item osd.11 weight 3.620
>>>>         item osd.12 weight 3.620
>>>>         item osd.13 weight 3.620
>>>>         item osd.14 weight 5.440
>>>> }
>>>> osdgroup b02s08-osdgroupD {
>>>>         id -84          # do not change unnecessarily
>>>>         # weight 19.920
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.15 weight 3.620
>>>>         item osd.16 weight 3.620
>>>>         item osd.17 weight 3.620
>>>>         item osd.18 weight 3.620
>>>>         item osd.19 weight 5.440
>>>> }
>>>> host b02s08 {
>>>>         id -80          # do not change unnecessarily
>>>>         # weight 76.040
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item b02s08-osdgroupA weight 18.100
>>>>         item b02s08-osdgroupB weight 18.100
>>>>         item b02s08-osdgroupC weight 19.920
>>>>         item b02s08-osdgroupD weight 19.920
>>>> }
>>>> osdgroup b02s12-osdgroupA {
>>>>         id -121         # do not change unnecessarily
>>>>         # weight 18.100
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.20 weight 3.620
>>>>         item osd.21 weight 3.620
>>>>         item osd.22 weight 3.620
>>>>         item osd.23 weight 3.620
>>>>         item osd.24 weight 3.620
>>>> }
>>>> osdgroup b02s12-osdgroupB {
>>>>         id -122         # do not change unnecessarily
>>>>         # weight 18.100
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.25 weight 3.620
>>>>         item osd.26 weight 3.620
>>>>         item osd.27 weight 3.620
>>>>         item osd.28 weight 3.620
>>>>         item osd.29 weight 3.620
>>>> }
>>>> osdgroup b02s12-osdgroupC {
>>>>         id -123         # do not change unnecessarily
>>>>         # weight 19.920
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.30 weight 3.620
>>>>         item osd.31 weight 3.620
>>>>         item osd.32 weight 3.620
>>>>         item osd.33 weight 3.620
>>>>         item osd.34 weight 5.440
>>>> }
>>>> osdgroup b02s12-osdgroupD {
>>>>         id -124         # do not change unnecessarily
>>>>         # weight 19.920
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.35 weight 3.620
>>>>         item osd.36 weight 3.620
>>>>         item osd.37 weight 3.620
>>>>         item osd.38 weight 3.620
>>>>         item osd.39 weight 5.440
>>>> }
>>>> host b02s12 {
>>>>         id -120         # do not change unnecessarily
>>>>         # weight 76.040
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item b02s12-osdgroupA weight 18.100
>>>>         item b02s12-osdgroupB weight 18.100
>>>>         item b02s12-osdgroupC weight 19.920
>>>>         item b02s12-osdgroupD weight 19.920
>>>> }
>>>> root replicated-T1 {
>>>>         id -1           # do not change unnecessarily
>>>>         # weight 152.080
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item b02s08 weight 76.040
>>>>         item b02s12 weight 76.040
>>>> }
>>>> rack b02 {
>>>>         id -20          # do not change unnecessarily
>>>>         # weight 152.080
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item b02s08 weight 76.040
>>>>         item b02s12 weight 76.040
>>>> }
>>>> site erbus {
>>>>         id -10          # do not change unnecessarily
>>>>         # weight 152.080
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item b02 weight 152.080
>>>> }
>>>>
>>>> # rules
>>>> rule replicated {
>>>>         ruleset 0
>>>>         type replicated
>>>>         min_size 1
>>>>         max_size 10
>>>>         step take replicated-T1
>>>>         step choose firstn 0 type host
>>>>         step chooseleaf firstn 0 type osdgroup
>>>>         step emit
>>>> }
>>>>
>>>> # end crush map
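P.P.S. For the archives, here is the full rule as I've now applied it: your suggested steps dropped into my existing "rule replicated" skeleton (ruleset number and min_size/max_size copied unchanged from my map above):

rule replicated {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        # take both hosts under root replicated-T1 ...
        step take replicated-T1
        step choose firstn 2 type host
        # ... then two osdgroups per host, one OSD from each,
        # so size=4 lands as 2 copies per host
        step chooseleaf firstn 2 type osdgroup
        step emit
}

If I've understood your explanation correctly, 'ceph pg dump' should now show four OSDs in each PG's up set, two per host.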
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
