I'll leave the response about the CRUSH rule to Gregory, but you're running
into another problem that is causing more of this data to stay on that node
than you intend.  When you `out` an OSD, it still contributes to the host's
CRUSH weight, so the host is still expected to receive the same amount of
data and to distribute it among its remaining disks.  This is the default
behavior (even if you `destroy` the OSD) to minimize data movement: once when
the disk is lost, and again when you add it back after replacing the device.
If you are really strapped for space, though, you might consider fully
purging the OSD, which reduces the host's weight to the sum of its remaining
OSDs.  However, if you do have a problem in your CRUSH rule, purging won't
change anything for you.
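
For example (a rough sketch; `ceph osd purge` only exists on Luminous and
newer, so on your Jewel cluster the older removal sequence applies, and osd.2
is just the OSD from your test):

# Luminous and newer: removes the OSD from the CRUSH map and deletes its
# auth key and OSD id in one step
ceph osd purge 2 --yes-i-really-mean-it

# Jewel equivalent
ceph osd crush remove osd.2
ceph auth del osd.2
ceph osd rm 2

# afterwards the host bucket's weight should drop by the removed OSD's weight
ceph osd df tree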

On Thu, Feb 14, 2019 at 11:15 PM hnuzhoulin2 <hnuzhoul...@gmail.com> wrote:

> Thanks. I read your reply at
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg48717.html
> So using indep causes less data to be remapped when an OSD fails:
> using firstn: 1, 2, 3, 4, 5 -> 1, 2, 4, 5, 6, 60% of the data remapped
> using indep:  1, 2, 3, 4, 5 -> 1, 2, 6, 4, 5, 20% of the data remapped
>
> Am I right?
> If so, what is recommended when a disk fails and the remaining disks in that
> machine do not have enough free space (the failed disk cannot be replaced
> immediately)? Or should I reserve more free space per host in an EC setup?
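>
> I guess I could also check the remap count offline with crushtool before
> touching the cluster (a rough sketch; rule 0 is my EC rule below, osd.2 is
> the failed disk from my test, and the file names are arbitrary):
>
> ceph osd getcrushmap -o /tmp/cm
> # mappings with every OSD in
> crushtool -i /tmp/cm --test --rule 0 --num-rep 6 --show-mappings > /tmp/map.before
> # simulate osd.2 being out by overriding its weight to 0
> crushtool -i /tmp/cm --test --rule 0 --num-rep 6 --show-mappings --weight 2 0 > /tmp/map.after
> # count how many inputs changed their mapping
> diff /tmp/map.before /tmp/map.after | grep -c '^>'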
>
> On 02/14/2019 02:49, Gregory Farnum <gfar...@redhat.com> wrote:
>
> Your CRUSH rule for EC pools is forcing that behavior with the line
>
> step chooseleaf indep 1 type ctnr
>
> If you want different behavior, you’ll need a different CRUSH rule.
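>
> For example, something along these lines (an untested sketch; the rule name
> and ruleset id are arbitrary) would keep one shard per rack but let that
> shard land on any OSD in the rack when one fails, instead of staying pinned
> to a single host bucket:
>
> rule site1_sata_erasure_rack_osd {
> ruleset 2
> type erasure
> min_size 3
> max_size 6
> step set_chooseleaf_tries 5
> step set_choose_tries 100
> step take site1-sata
> step choose indep 0 type media
> step choose indep 1 type osd
> step emit
> }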
>
> On Tue, Feb 12, 2019 at 5:18 PM hnuzhoulin2 <hnuzhoul...@gmail.com> wrote:
>
>> Hi, cephers
>>
>>
>> I am building a Ceph EC cluster. When a disk fails, I mark its OSD out, but
>> all of its PGs remap to other OSDs on the same host, whereas I think they
>> should remap to other hosts in the same rack.
>> The test process is:
>>
>> ceph osd pool create .rgw.buckets.data 8192 8192 erasure ISA-4-2
>> site1_sata_erasure_ruleset 400000000
>> ceph osd df tree|awk '{print $1" "$2" "$3" "$9" "$10}'> /tmp/1
>> /etc/init.d/ceph stop osd.2
>> ceph osd out 2
>> ceph osd df tree|awk '{print $1" "$2" "$3" "$9" "$10}'> /tmp/2
>> diff /tmp/1 /tmp/2 -y --suppress-common-lines
>>
>> 0 1.00000 1.00000 118 osd.0       | 0 1.00000 1.00000 126 osd.0
>> 1 1.00000 1.00000 123 osd.1       | 1 1.00000 1.00000 139 osd.1
>> 2 1.00000 1.00000 122 osd.2       | 2 1.00000 0 0 osd.2
>> 3 1.00000 1.00000 113 osd.3       | 3 1.00000 1.00000 131 osd.3
>> 4 1.00000 1.00000 122 osd.4       | 4 1.00000 1.00000 136 osd.4
>> 5 1.00000 1.00000 112 osd.5       | 5 1.00000 1.00000 127 osd.5
>> 6 1.00000 1.00000 114 osd.6       | 6 1.00000 1.00000 128 osd.6
>> 7 1.00000 1.00000 124 osd.7       | 7 1.00000 1.00000 136 osd.7
>> 8 1.00000 1.00000 95 osd.8       | 8 1.00000 1.00000 113 osd.8
>> 9 1.00000 1.00000 112 osd.9       | 9 1.00000 1.00000 119 osd.9
>> TOTAL 3073T 197G         | TOTAL 3065T 197G
>> MIN/MAX VAR: 0.84/26.56         | MIN/MAX VAR: 0.84/26.52
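>>
>> To confirm where each shard goes, the PG up/acting sets can also be dumped
>> before and after the out and diffed (a rough sketch; the file names are
>> arbitrary):
>>
>> ceph pg dump pgs_brief > /tmp/pgs.before
>> ceph osd out 2
>> ceph pg dump pgs_brief > /tmp/pgs.after
>> diff /tmp/pgs.before /tmp/pgs.after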
>>
>>
>> Some config info (full configs:
>> https://gist.github.com/hnuzhoulin/575883dbbcb04dff448eea3b9384c125):
>> Jewel 10.2.11, filestore + rocksdb
>>
>> ceph osd erasure-code-profile get ISA-4-2
>> k=4
>> m=2
>> plugin=isa
>> ruleset-failure-domain=ctnr
>> ruleset-root=site1-sata
>> technique=reed_sol_van
>>
>> part of ceph.conf is:
>>
>> [global]
>> fsid = 1CAB340D-E551-474F-B21A-399AC0F10900
>> auth cluster required = cephx
>> auth service required = cephx
>> auth client required = cephx
>> pid file = /home/ceph/var/run/$name.pid
>> log file = /home/ceph/log/$cluster-$name.log
>> mon osd nearfull ratio = 0.85
>> mon osd full ratio = 0.95
>> admin socket = /home/ceph/var/run/$cluster-$name.asok
>> osd pool default size = 3
>> osd pool default min size = 1
>> osd objectstore = filestore
>> filestore merge threshold = -10
>>
>> [mon]
>> keyring = /home/ceph/var/lib/$type/$cluster-$id/keyring
>> mon data = /home/ceph/var/lib/$type/$cluster-$id
>> mon cluster log file = /home/ceph/log/$cluster.log
>> [osd]
>> keyring = /home/ceph/var/lib/$type/$cluster-$id/keyring
>> osd data = /home/ceph/var/lib/$type/$cluster-$id
>> osd journal = /home/ceph/var/lib/$type/$cluster-$id/journal
>> osd journal size = 10000
>> osd mkfs type = xfs
>> osd mount options xfs = rw,noatime,nodiratime,inode64,logbsize=256k
>> osd backfill full ratio = 0.92
>> osd failsafe full ratio = 0.95
>> osd failsafe nearfull ratio = 0.85
>> osd max backfills = 1
>> osd crush update on start = false
>> osd op thread timeout = 60
>> filestore split multiple = 8
>> filestore max sync interval = 15
>> filestore min sync interval = 5
>> [osd.0]
>> host = cld-osd1-56
>> addr = XXXXX
>> user = ceph
>> devs = /disk/link/osd-0/data
>> osd journal = /disk/link/osd-0/journal
>> …….
>> [osd.503]
>> host = cld-osd42-56
>> addr = 10.108.87.52
>> user = ceph
>> devs = /disk/link/osd-503/data
>> osd journal = /disk/link/osd-503/journal
>>
>>
>> crushmap is below:
>>
>> # begin crush map
>> tunable choose_local_tries 0
>> tunable choose_local_fallback_tries 0
>> tunable choose_total_tries 50
>> tunable chooseleaf_descend_once 1
>> tunable chooseleaf_vary_r 1
>> tunable straw_calc_version 1
>> tunable allowed_bucket_algs 54
>>
>> # devices
>> device 0 osd.0
>> device 1 osd.1
>> device 2 osd.2
>> 。。。
>> device 502 osd.502
>> device 503 osd.503
>>
>> # types
>> type 0 osd          # osd
>> type 1 ctnr         # sata/ssd group by node, -101~1xx/-201~2xx
>> type 2 media        # sata/ssd group by rack, -11~1x/-21~2x
>> type 3 mediagroup   # sata/ssd group by site, -5/-6
>> type 4 unit         # site, -2
>> type 5 root         # root, -1
>>
>> # buckets
>> ctnr cld-osd1-56-sata {
>> id -101              # do not change unnecessarily
>> # weight 10.000
>> alg straw2
>> hash 0               # rjenkins1
>> item osd.0 weight 1.000
>> item osd.1 weight 1.000
>> item osd.2 weight 1.000
>> item osd.3 weight 1.000
>> item osd.4 weight 1.000
>> item osd.5 weight 1.000
>> item osd.6 weight 1.000
>> item osd.7 weight 1.000
>> item osd.8 weight 1.000
>> item osd.9 weight 1.000
>> }
>> ctnr cld-osd1-56-ssd {
>> id -201              # do not change unnecessarily
>> # weight 2.000
>> alg straw2
>> hash 0               # rjenkins1
>> item osd.10 weight 1.000
>> item osd.11 weight 1.000
>> }
>> …..
>> ctnr cld-osd41-56-sata {
>> id -141              # do not change unnecessarily
>> # weight 10.000
>> alg straw2
>> hash 0               # rjenkins1
>> item osd.480 weight 1.000
>> item osd.481 weight 1.000
>> item osd.482 weight 1.000
>> item osd.483 weight 1.000
>> item osd.484 weight 1.000
>> item osd.485 weight 1.000
>> item osd.486 weight 1.000
>> item osd.487 weight 1.000
>> item osd.488 weight 1.000
>> item osd.489 weight 1.000
>> }
>> ctnr cld-osd41-56-ssd {
>> id -241              # do not change unnecessarily
>> # weight 2.000
>> alg straw2
>> hash 0               # rjenkins1
>> item osd.490 weight 1.000
>> item osd.491 weight 1.000
>> }
>> ctnr cld-osd42-56-sata {
>> id -142              # do not change unnecessarily
>> # weight 10.000
>> alg straw2
>> hash 0               # rjenkins1
>> item osd.492 weight 1.000
>> item osd.493 weight 1.000
>> item osd.494 weight 1.000
>> item osd.495 weight 1.000
>> item osd.496 weight 1.000
>> item osd.497 weight 1.000
>> item osd.498 weight 1.000
>> item osd.499 weight 1.000
>> item osd.500 weight 1.000
>> item osd.501 weight 1.000
>> }
>>
>>
>> media site1-rack1-sata {
>> id -11               # do not change unnecessarily
>> # weight 70.000
>> alg straw2
>> hash 0               # rjenkins1
>> item cld-osd1-56-sata weight 10.000
>> item cld-osd2-56-sata weight 10.000
>> item cld-osd3-56-sata weight 10.000
>> item cld-osd4-56-sata weight 10.000
>> item cld-osd5-56-sata weight 10.000
>> item cld-osd6-56-sata weight 10.000
>> item cld-osd7-56-sata weight 10.000
>> }
>> media site1-rack2-sata {
>> id -12               # do not change unnecessarily
>> # weight 70.000
>> alg straw2
>> hash 0               # rjenkins1
>> item cld-osd8-56-sata weight 10.000
>> item cld-osd9-56-sata weight 10.000
>> item cld-osd10-56-sata weight 10.000
>> item cld-osd11-56-sata weight 10.000
>> item cld-osd12-56-sata weight 10.000
>> item cld-osd13-56-sata weight 10.000
>> item cld-osd14-56-sata weight 10.000
>> }
>> media site1-rack3-sata {
>> id -13               # do not change unnecessarily
>> # weight 70.000
>> alg straw2
>> hash 0               # rjenkins1
>> item cld-osd15-56-sata weight 10.000
>> item cld-osd16-56-sata weight 10.000
>> item cld-osd17-56-sata weight 10.000
>> item cld-osd18-56-sata weight 10.000
>> item cld-osd19-56-sata weight 10.000
>> item cld-osd20-56-sata weight 10.000
>> item cld-osd21-56-sata weight 10.000
>> }
>> media site1-rack4-sata {
>> id -14               # do not change unnecessarily
>> # weight 70.000
>> alg straw2
>> hash 0               # rjenkins1
>> item cld-osd22-56-sata weight 10.000
>> item cld-osd23-56-sata weight 10.000
>> item cld-osd24-56-sata weight 10.000
>> item cld-osd25-56-sata weight 10.000
>> item cld-osd26-56-sata weight 10.000
>> item cld-osd27-56-sata weight 10.000
>> item cld-osd28-56-sata weight 10.000
>> }
>> media site1-rack5-sata {
>> id -15               # do not change unnecessarily
>> # weight 70.000
>> alg straw2
>> hash 0               # rjenkins1
>> item cld-osd29-56-sata weight 10.000
>> item cld-osd30-56-sata weight 10.000
>> item cld-osd31-56-sata weight 10.000
>> item cld-osd32-56-sata weight 10.000
>> item cld-osd33-56-sata weight 10.000
>> item cld-osd34-56-sata weight 10.000
>> item cld-osd35-56-sata weight 10.000
>> }
>> media site1-rack6-sata {
>> id -16               # do not change unnecessarily
>> # weight 70.000
>> alg straw2
>> hash 0               # rjenkins1
>> item cld-osd36-56-sata weight 10.000
>> item cld-osd37-56-sata weight 10.000
>> item cld-osd38-56-sata weight 10.000
>> item cld-osd39-56-sata weight 10.000
>> item cld-osd40-56-sata weight 10.000
>> item cld-osd41-56-sata weight 10.000
>> item cld-osd42-56-sata weight 10.000
>> }
>>
>> media site1-rack1-ssd {
>> id -21               # do not change unnecessarily
>> # weight 14.000
>> alg straw2
>> hash 0               # rjenkins1
>> item cld-osd1-56-ssd weight 2.000
>> item cld-osd2-56-ssd weight 2.000
>> item cld-osd3-56-ssd weight 2.000
>> item cld-osd4-56-ssd weight 2.000
>> item cld-osd5-56-ssd weight 2.000
>> item cld-osd6-56-ssd weight 2.000
>> item cld-osd7-56-ssd weight 2.000
>> item cld-osd8-56-ssd weight 2.000
>> item cld-osd9-56-ssd weight 2.000
>> item cld-osd10-56-ssd weight 2.000
>> item cld-osd11-56-ssd weight 2.000
>> item cld-osd12-56-ssd weight 2.000
>> item cld-osd13-56-ssd weight 2.000
>> item cld-osd14-56-ssd weight 2.000
>> }
>> media site1-rack2-ssd {
>> id -22               # do not change unnecessarily
>> # weight 14.000
>> alg straw2
>> hash 0               # rjenkins1
>> item cld-osd15-56-ssd weight 2.000
>> item cld-osd16-56-ssd weight 2.000
>> item cld-osd17-56-ssd weight 2.000
>> item cld-osd18-56-ssd weight 2.000
>> item cld-osd19-56-ssd weight 2.000
>> item cld-osd20-56-ssd weight 2.000
>> item cld-osd21-56-ssd weight 2.000
>> item cld-osd22-56-ssd weight 2.000
>> item cld-osd23-56-ssd weight 2.000
>> item cld-osd24-56-ssd weight 2.000
>> item cld-osd25-56-ssd weight 2.000
>> item cld-osd26-56-ssd weight 2.000
>> item cld-osd27-56-ssd weight 2.000
>> item cld-osd28-56-ssd weight 2.000
>> }
>> media site1-rack3-ssd {
>> id -23               # do not change unnecessarily
>> # weight 14.000
>> alg straw2
>> hash 0               # rjenkins1
>> item cld-osd29-56-ssd weight 2.000
>> item cld-osd30-56-ssd weight 2.000
>> item cld-osd31-56-ssd weight 2.000
>> item cld-osd32-56-ssd weight 2.000
>> item cld-osd33-56-ssd weight 2.000
>> item cld-osd34-56-ssd weight 2.000
>> item cld-osd35-56-ssd weight 2.000
>> item cld-osd36-56-ssd weight 2.000
>> item cld-osd37-56-ssd weight 2.000
>> item cld-osd38-56-ssd weight 2.000
>> item cld-osd39-56-ssd weight 2.000
>> item cld-osd40-56-ssd weight 2.000
>> item cld-osd41-56-ssd weight 2.000
>> item cld-osd42-56-ssd weight 2.000
>> }
>> mediagroup site1-sata {
>> id -5                # do not change unnecessarily
>> # weight 420.000
>> alg straw2
>> hash 0               # rjenkins1
>> item site1-rack1-sata weight 70.000
>> item site1-rack2-sata weight 70.000
>> item site1-rack3-sata weight 70.000
>> item site1-rack4-sata weight 70.000
>> item site1-rack5-sata weight 70.000
>> item site1-rack6-sata weight 70.000
>> }
>> mediagroup site1-ssd {
>> id -6                # do not change unnecessarily
>> # weight 84.000
>> alg straw2
>> hash 0               # rjenkins1
>> item site1-rack1-ssd weight 28.000
>> item site1-rack2-ssd weight 28.000
>> item site1-rack3-ssd weight 28.000
>> }
>>
>> unit site1 {
>> id -2                # do not change unnecessarily
>> # weight 504.000
>> alg straw2
>> hash 0               # rjenkins1
>> item site1-sata weight 420.000
>> item site1-ssd weight 84.000
>> }
>>
>> root default {
>> id -1                # do not change unnecessarily
>> # weight 504.000
>> alg straw2
>> hash 0               # rjenkins1
>> item site1 weight 504.000
>> }
>> # rules
>> rule site1_sata_erasure_ruleset {
>> ruleset 0
>> type erasure
>> min_size 3
>> max_size 6
>> step set_chooseleaf_tries 5
>> step set_choose_tries 100
>> step take site1-sata
>> step choose indep 0 type media
>> step chooseleaf indep 1 type ctnr
>> step emit
>> }
>> rule site1_ssd_replicated_ruleset {
>> ruleset 1
>> type replicated
>> min_size 1
>> max_size 10
>> step take site1-ssd
>> step choose firstn 0 type media
>> step chooseleaf firstn 1 type ctnr
>> step emit
>> }
>> # end crush map
>>