Your crush rule is ok:
step chooseleaf firstn 0 type host
You are replicating host-wise, not rack wise.
This is what I would suggest for your cluster, but keep in mind that a
whole-rack outage will leave some PGs incomplete.
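For comparison, a rack-level rule would use `type rack` in the chooseleaf step. A hypothetical sketch (rule name and id are placeholders; check against your decompiled map):

```
rule replicated_racks {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type rack
    step emit
}
```

With only 3 racks and size=3 this would pin one replica per rack, which is why the uneven rack weights would matter.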
Regarding the straw2 change causing 12% data movement -- in this case
it is a bit more than I would have expected.
-- dan
On Mon, Jan 14, 2019 at 3:40 PM Massimo Sgaravatto
<[email protected]> wrote:
>
> Hi Dan
>
> I have indeed at the moment only 5 OSD nodes on 3 racks.
> The crush-map is attached.
> Are you suggesting to replicate only between nodes and not between racks
> (since the very few resources) ?
> Thanks, Massimo
>
> On Mon, Jan 14, 2019 at 3:29 PM Dan van der Ster <[email protected]> wrote:
>>
>> On Mon, Jan 14, 2019 at 3:18 PM Massimo Sgaravatto
>> <[email protected]> wrote:
>> >
>> > Thanks for the prompt reply
>> >
>> > Indeed I have different racks with different weights.
>>
>> Are you sure you're replicating across racks? You have only 3 racks,
>> one of which is half the size of the other two -- if so, your
>> cluster will be full once that smallest rack is full.
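To make that concrete, a quick back-of-the-envelope sketch (weights taken from the `ceph osd tree` output quoted below; this is an illustration, not a Ceph tool):

```python
# Usable capacity with 3-way replication, one replica per rack,
# across racks of unequal crush weight.
racks = {
    "Rack11-PianoAlto": 109.12170,
    "Rack15-PianoAlto": 109.12170,
    "Rack17-PianoAlto": 54.56085,
}

raw_total = sum(racks.values())   # ~272.80 raw capacity in total

# With size=3 and one copy per rack, every rack must hold a full copy
# of all data, so the smallest rack is the bottleneck.
usable = min(racks.values())      # ~54.56 of unique data
raw_used_at_full = 3 * usable     # raw space consumed when "full"
stranded = raw_total - raw_used_at_full

print(f"raw total: {raw_total:.2f}  usable: {usable:.2f}  stranded: {stranded:.2f}")
```

So roughly 109 units of raw capacity in the two larger racks could never be used while replicating rack-wise.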
>>
>> -- dan
>>
>>
>> > Below the "ceph osd tree" output:
>> >
>> > [root@ceph-mon-01 ~]# ceph osd tree
>> > ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>> > -1 272.80426 root default
>> > -7 109.12170 rack Rack11-PianoAlto
>> > -8 54.56085 host ceph-osd-04
>> > 30 hdd 5.45609 osd.30 up 1.00000 1.00000
>> > 31 hdd 5.45609 osd.31 up 1.00000 1.00000
>> > 32 hdd 5.45609 osd.32 up 1.00000 1.00000
>> > 33 hdd 5.45609 osd.33 up 1.00000 1.00000
>> > 34 hdd 5.45609 osd.34 up 1.00000 1.00000
>> > 35 hdd 5.45609 osd.35 up 1.00000 1.00000
>> > 36 hdd 5.45609 osd.36 up 1.00000 1.00000
>> > 37 hdd 5.45609 osd.37 up 1.00000 1.00000
>> > 38 hdd 5.45609 osd.38 up 1.00000 1.00000
>> > 39 hdd 5.45609 osd.39 up 1.00000 1.00000
>> > -9 54.56085 host ceph-osd-05
>> > 40 hdd 5.45609 osd.40 up 1.00000 1.00000
>> > 41 hdd 5.45609 osd.41 up 1.00000 1.00000
>> > 42 hdd 5.45609 osd.42 up 1.00000 1.00000
>> > 43 hdd 5.45609 osd.43 up 1.00000 1.00000
>> > 44 hdd 5.45609 osd.44 up 1.00000 1.00000
>> > 45 hdd 5.45609 osd.45 up 1.00000 1.00000
>> > 46 hdd 5.45609 osd.46 up 1.00000 1.00000
>> > 47 hdd 5.45609 osd.47 up 1.00000 1.00000
>> > 48 hdd 5.45609 osd.48 up 1.00000 1.00000
>> > 49 hdd 5.45609 osd.49 up 1.00000 1.00000
>> > -6 109.12170 rack Rack15-PianoAlto
>> > -3 54.56085 host ceph-osd-02
>> > 10 hdd 5.45609 osd.10 up 1.00000 1.00000
>> > 11 hdd 5.45609 osd.11 up 1.00000 1.00000
>> > 12 hdd 5.45609 osd.12 up 1.00000 1.00000
>> > 13 hdd 5.45609 osd.13 up 1.00000 1.00000
>> > 14 hdd 5.45609 osd.14 up 1.00000 1.00000
>> > 15 hdd 5.45609 osd.15 up 1.00000 1.00000
>> > 16 hdd 5.45609 osd.16 up 1.00000 1.00000
>> > 17 hdd 5.45609 osd.17 up 1.00000 1.00000
>> > 18 hdd 5.45609 osd.18 up 1.00000 1.00000
>> > 19 hdd 5.45609 osd.19 up 1.00000 1.00000
>> > -4 54.56085 host ceph-osd-03
>> > 20 hdd 5.45609 osd.20 up 1.00000 1.00000
>> > 21 hdd 5.45609 osd.21 up 1.00000 1.00000
>> > 22 hdd 5.45609 osd.22 up 1.00000 1.00000
>> > 23 hdd 5.45609 osd.23 up 1.00000 1.00000
>> > 24 hdd 5.45609 osd.24 up 1.00000 1.00000
>> > 25 hdd 5.45609 osd.25 up 1.00000 1.00000
>> > 26 hdd 5.45609 osd.26 up 1.00000 1.00000
>> > 27 hdd 5.45609 osd.27 up 1.00000 1.00000
>> > 28 hdd 5.45609 osd.28 up 1.00000 1.00000
>> > 29 hdd 5.45609 osd.29 up 1.00000 1.00000
>> > -5 54.56085 rack Rack17-PianoAlto
>> > -2 54.56085 host ceph-osd-01
>> > 0 hdd 5.45609 osd.0 up 1.00000 1.00000
>> > 1 hdd 5.45609 osd.1 up 1.00000 1.00000
>> > 2 hdd 5.45609 osd.2 up 1.00000 1.00000
>> > 3 hdd 5.45609 osd.3 up 1.00000 1.00000
>> > 4 hdd 5.45609 osd.4 up 1.00000 1.00000
>> > 5 hdd 5.45609 osd.5 up 1.00000 1.00000
>> > 6 hdd 5.45609 osd.6 up 1.00000 1.00000
>> > 7 hdd 5.45609 osd.7 up 1.00000 1.00000
>> > 8 hdd 5.45609 osd.8 up 1.00000 1.00000
>> > 9 hdd 5.45609 osd.9 up 1.00000 1.00000
>> > [root@ceph-mon-01 ~]#
>> >
>> > On Mon, Jan 14, 2019 at 3:13 PM Dan van der Ster <[email protected]>
>> > wrote:
>> >>
>> >> On Mon, Jan 14, 2019 at 3:06 PM Massimo Sgaravatto
>> >> <[email protected]> wrote:
>> >> >
>> >> > I have a ceph luminous cluster running on CentOS7 nodes.
>> >> > This cluster has 50 OSDs, all with the same size and all with the same
>> >> > weight.
>> >> >
>> >> > Since I noticed quite uneven usage of the OSDs (some used at 30%,
>> >> > some at 70%), I tried to activate the balancer.
>> >> >
>> >> > But the balancer doesn't start, I guess because of this problem:
>> >> >
>> >> > [root@ceph-mon-01 ~]# ceph osd crush weight-set create-compat
>> >> > Error EPERM: crush map contains one or more bucket(s) that are not
>> >> > straw2
>> >> >
>> >> >
>> >> > So I issued the command to convert from straw to straw2 (all the
>> >> > clients are running luminous):
>> >> >
>> >> >
>> >> > [root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2
>> >> > Error EINVAL: new crush map requires client version hammer but
>> >> > require_min_compat_client is firefly
>> >> > [root@ceph-mon-01 ~]# ceph osd set-require-min-compat-client jewel
>> >> > set require_min_compat_client to jewel
>> >> > [root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2
>> >> > [root@ceph-mon-01 ~]#
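As a sanity check after the conversion, the bucket algorithms can be inspected on the live cluster; every bucket should now report straw2:

```
# All buckets should now show "alg": "straw2"
ceph osd crush dump | grep '"alg"'
```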
>> >> >
>> >> >
>> >> > After having issued the command, the cluster went in WARNING state
>> >> > because ~ 12 % objects were misplaced.
>> >> >
>> >> > Is this normal?
>> >> > I read somewhere that the migration from straw to straw2 should trigger
>> >> > data movement only if the OSDs have different sizes, which is not the
>> >> > case here.
>> >>
>> >> The relevant sizes to compare are those of the crush buckets across
>> >> which you are replicating.
>> >> Are you replicating host-wise or rack-wise?
>> >> Do you have hosts or racks with different crush weights?
>> >> Maybe share your `ceph osd tree`.
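The reason bucket weights matter here: straw2 gives each bucket an independent weighted pseudo-random draw and picks the highest, so selection probability tracks the weights. A simplified sketch (not the real CRUSH implementation, which uses the Jenkins hash and a fixed-point ln lookup table; names here are made up):

```python
import hashlib
import math

def straw2_choose(key: str, items: dict) -> str:
    """Pick one item; probability of each is proportional to its weight."""
    best, best_draw = None, -math.inf
    for name, weight in items.items():
        h = hashlib.sha256(f"{key}:{name}".encode()).digest()
        u = (int.from_bytes(h[:8], "big") + 1) / 2**64  # uniform in (0, 1]
        draw = math.log(u) / weight                     # ln(u) <= 0
        if draw > best_draw:
            best, best_draw = name, draw
    return best

# Selection frequencies roughly track the weights (2 : 2 : 1 here):
racks = {"rack-a": 109.12, "rack-b": 109.12, "rack-c": 54.56}
counts = {r: 0 for r in racks}
for pg in range(20000):
    counts[straw2_choose(f"pg.{pg}", racks)] += 1
print(counts)
```

Changing the weight of one bucket (or switching algorithms, which changes the draws) remaps a proportional fraction of the keys, which is where the observed data movement comes from.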
>> >>
>> >> Cheers, dan
>> >>
>> >>
>> >>
>> >> >
>> >> >
>> >> > The cluster is still recovering, but what worries me is that data
>> >> > seem to be moving to the most used OSDs, and the MAX_AVAIL value is
>> >> > decreasing quite quickly.
>> >> >
>> >> > I hope that the recovery can finish without causing problems: then I
>> >> > will immediately activate the balancer.
>> >> >
>> >> > But, if some OSDs are getting too full, is it safe to decrease their
>> >> > weights while the cluster is still recovering?
>> >> >
>> >> > Thanks a lot for your help
>> >> > Of course I can provide other info, if needed
>> >> >
>> >> >
>> >> > Cheers, Massimo
>> >> >
>> >> > _______________________________________________
>> >> > ceph-users mailing list
>> >> > [email protected]
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com