Good evening,
for some time we have had the problem that ceph stores too much data on
the host with small disks. Originally we used the convention weight 1 = 1 TB,
but we had to reduce the weight for this particular host even further to
keep it alive at all.
Our setup currently consists of 3 hosts:
wein: 6x 136G (fast disks)
kaffee: 1x 5.5T (slow disk)
tee: 1x 5.5T (slow disk)
We originally started with 6 osds on wein at a weight of 0.13 each, but
had to reduce it to 0.05 because the disks were running full.
The current tree looks as follows:
root@wein:~# ceph osd tree
# id    weight   type name         up/down  reweight
-1      2.3      root default
-2      0.2999       host wein
0       0.04999          osd.0     up       1
3       0.04999          osd.3     up       1
4       0.04999          osd.4     up       1
5       0.04999          osd.5     up       1
6       0.04999          osd.6     up       1
7       0.04999          osd.7     up       1
-3      1            host tee
1       5.5              osd.1     up       1
-4      1            host kaffee
2       5.5              osd.2     up       1
The hosts have the following disk usage:
root@wein:~# df -h | grep ceph
/dev/sdc1 136G 58G 79G 43% /var/lib/ceph/osd/ceph-0
/dev/sdd1 136G 54G 83G 40% /var/lib/ceph/osd/ceph-3
/dev/sde1 136G 31G 105G 23% /var/lib/ceph/osd/ceph-4
/dev/sdf1 136G 62G 75G 46% /var/lib/ceph/osd/ceph-5
/dev/sdg1 136G 45G 92G 33% /var/lib/ceph/osd/ceph-6
/dev/sdh1 136G 28G 109G 21% /var/lib/ceph/osd/ceph-7
root@kaffee:~# df -h | grep ceph
/dev/sdc 5.5T 983G 4.5T 18% /var/lib/ceph/osd/ceph-2
root@tee:~# df -h | grep ceph
/dev/sdb 5.5T 967G 4.6T 18% /var/lib/ceph/osd/ceph-1
On wein about 46G are stored per osd on average, tee/kaffee store about 975G.
(58+54+31+62+45+28)/6 ≈ 46.3
(983+967)/2 = 975
The weight ratio between a kaffee/tee osd and a wein osd is
5.5/0.05 = 110
The usage ratio between a kaffee/tee osd and a wein osd is
975/46.3 ≈ 21.1
So ceph is allocating about 5 times more storage to the wein osds than
we want it to:
110/21.1 ≈ 5.2
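For the record, the averages and ratios above can be re-derived from the df
output with a quick shell sketch (plain awk arithmetic on the numbers shown
above):

```shell
# Per-OSD usage on wein in GB, taken from the df output above
wein_avg=$(echo 58 54 31 62 45 28 | awk '{ s=0; for (i=1;i<=NF;i++) s+=$i; printf "%.1f", s/NF }')

# Per-OSD usage on kaffee/tee in GB
big_avg=$(echo 983 967 | awk '{ printf "%.1f", ($1+$2)/2 }')

# CRUSH weight ratio between a kaffee/tee osd (5.5) and a wein osd (0.05)
weight_ratio=$(awk 'BEGIN { printf "%.1f", 5.5/0.05 }')

# Observed usage ratio, and the factor by which weight and usage disagree
usage_ratio=$(awk -v a="$big_avg" -v b="$wein_avg" 'BEGIN { printf "%.2f", a/b }')
factor=$(awk -v w="$weight_ratio" -v u="$usage_ratio" 'BEGIN { printf "%.2f", w/u }')

echo "wein avg ${wein_avg}G, tee/kaffee avg ${big_avg}G, usage ratio ${usage_ratio}, off by factor ${factor}"
```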
We are also a bit puzzled that the host weight shown for wein is 0.3 while
tee/kaffee show 1: for wein the host weight is the sum of its OSD weights,
but for kaffee and tee it is not. However, looking at the decompiled
crushmap, the host weight there is displayed as 5.5!
Does anyone have an idea what may be going wrong here?
While writing this I noticed that the factor is suspiciously close to 5.5,
the weight of the big osds, so I *guess* that ceph somehow treats all hosts
as having the same weight (even though the osd tree and the crushmap look
different to me)?
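One detail in the attached crushmap might be relevant: inside the tee and
kaffee host buckets the osds carry weight 5.500, but in the root bucket both
hosts are entered with an item weight of 1.000 only, while wein's 0.300 does
match the sum of its osds. If a host's item weight is supposed to equal the
sum of its osd weights, the root bucket would presumably have to look like
this (my assumption, not something we have tested yet):

```
root default {
	id -1		# do not change unnecessarily
	# weight 11.300
	alg straw
	hash 0	# rjenkins1
	item wein weight 0.300
	item tee weight 5.500
	item kaffee weight 5.500
}
```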
You can find our crushmap attached below.
Cheers,
Nico
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host wein {
	id -2		# do not change unnecessarily
	# weight 0.300
	alg straw
	hash 0	# rjenkins1
	item osd.0 weight 0.050
	item osd.3 weight 0.050
	item osd.4 weight 0.050
	item osd.5 weight 0.050
	item osd.6 weight 0.050
	item osd.7 weight 0.050
}
host tee {
	id -3		# do not change unnecessarily
	# weight 5.500
	alg straw
	hash 0	# rjenkins1
	item osd.1 weight 5.500
}
host kaffee {
	id -4		# do not change unnecessarily
	# weight 5.500
	alg straw
	hash 0	# rjenkins1
	item osd.2 weight 5.500
}
root default {
	id -1		# do not change unnecessarily
	# weight 2.300
	alg straw
	hash 0	# rjenkins1
	item wein weight 0.300
	item tee weight 1.000
	item kaffee weight 1.000
}
# rules
rule replicated_ruleset {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
--
New PGP key: 659B 0D91 E86E 7E24 FD15 69D0 C729 21A1 293F 2D24
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com