Hello all,
I am testing a cluster with mixed OSD types on the same data node (yes, following
the idea from:
http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/),
and I have run into a strange status:
ceph -s and ceph pg dump show incorrect PG information after increasing pg_num on
a pool that uses a different ruleset to select the faster OSDs.
Please advise what is wrong, and whether I can fix this without recreating the
pool with the final pg_num from the start.
Some more detail:
1) update the crushmap to add a separate root and ruleset that selects the SSD
OSDs, like this:
rule replicated_ruleset_ssd {
        ruleset 50
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}
2) create a new pool and set crush_ruleset to use this new rule:
$ ceph osd pool create ssd 64 64 replicated replicated_ruleset_ssd
(however, after this command the pool is still using the default ruleset 0, so
set it explicitly:)
$ ceph osd pool set ssd crush_ruleset 50
3) it looks good now:
$ ceph osd dump | grep pool
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45
stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 256 pgp_num 256 last_change 50 flags hashpspool stripe_width 0
pool 8 'xfs' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 1024 pgp_num 1024 last_change 1570 flags hashpspool stripe_width 0
pool 9 'ssd' replicated size 3 min_size 2 crush_ruleset 50 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1574 flags hashpspool stripe_width 0
$ ceph -s
cluster 5f8ae2a8-f143-42d9-b50d-246ac0874569
health HEALTH_OK
monmap e2: 3 mons at
{DEV-rhel7-vildn1=10.0.2.156:6789/0,DEV-rhel7-vildn2=10.0.2.157:6789/0,DEV-rhel7-vildn3=10.0.2.158:6789/0},
election epoch 84, quorum 0,1,2
DEV-rhel7-vildn1,DEV-rhel7-vildn2,DEV-rhel7-vildn3
osdmap e1578: 21 osds: 15 up, 15 in
pgmap v560681: 1472 pgs, 5 pools, 285 GB data, 73352 objects
80151 MB used, 695 GB / 779 GB avail
1472 active+clean
4) increase pg_num and pgp_num, but the total PG count reported by ceph -s stays
at 1472:
$ ceph osd pool set ssd pg_num 128
set pool 9 pg_num to 128
$ ceph osd pool set ssd pgp_num 128
set pool 9 pgp_num to 128
$ ceph osd dump | grep pool
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45
stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 256 pgp_num 256 last_change 50 flags hashpspool stripe_width 0
pool 8 'xfs' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 1024 pgp_num 1024 last_change 1570 flags hashpspool stripe_width 0
pool 9 'ssd' replicated size 3 min_size 2 crush_ruleset 50 object_hash rjenkins
pg_num 128 pgp_num 128 last_change 1581 flags hashpspool stripe_width 0
$ ceph -s
cluster 5f8ae2a8-f143-42d9-b50d-246ac0874569
health HEALTH_OK
monmap e2: 3 mons at
{DEV-rhel7-vildn1=10.0.2.156:6789/0,DEV-rhel7-vildn2=10.0.2.157:6789/0,DEV-rhel7-vildn3=10.0.2.158:6789/0},
election epoch 84, quorum 0,1,2
DEV-rhel7-vildn1,DEV-rhel7-vildn2,DEV-rhel7-vildn3
osdmap e1582: 21 osds: 15 up, 15 in
pgmap v560709: 1472 pgs, 5 pools, 285 GB data, 73352 objects
80158 MB used, 695 GB / 779 GB avail
1472 active+clean
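To make the mismatch explicit: summing pg_num over the five pools in the osd dump
above (with ssd already at 128) gives what the pgmap total should be, but ceph -s
still reports 1472, which is exactly 64 short, i.e. the ssd pool's old pg_num. A
quick check of that arithmetic:

```python
# pg_num per pool, taken from the `ceph osd dump` output above
pg_nums = {"data": 64, "metadata": 64, "rbd": 256, "xfs": 1024, "ssd": 128}

total = sum(pg_nums.values())
print(total)         # 1536 -- expected pgmap total after the increase
print(total - 1472)  # 64  -- shortfall equals the ssd pool's old pg_num
```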
5) the same problem appears in pg dump, which still lists only 64 PGs for pool 9:
$ ceph pg dump | grep '^9\.' | wc
dumped all in format plain
64 1472 10288
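For reference, the per-pool PG count can also be extracted by parsing the plain
pg dump output instead of grep/wc; a minimal sketch, assuming each PG line starts
with <pool_id>.<pg_seq> as in the dump format above (the sample lines below are
made up for illustration):

```python
from collections import Counter

def count_pgs_per_pool(dump_lines):
    """Count PG entries per pool in `ceph pg dump` plain output.

    Assumes each PG line begins with '<pool_id>.<pg_seq>' followed by
    whitespace, as in the dump shown above.
    """
    counts = Counter()
    for line in dump_lines:
        head = line.split("\t", 1)[0].split(" ", 1)[0]
        pool, sep, _seq = head.partition(".")
        if sep and pool.isdigit():
            counts[int(pool)] += 1
    return counts

# Hypothetical sample lines mimicking the dump format:
sample = ["9.0\tactive+clean", "9.1\tactive+clean", "8.0\tactive+clean"]
print(count_pgs_per_pool(sample))  # Counter({9: 2, 8: 1})
```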
6) however, the PGs do appear to have been created under the
/var/lib/ceph/osd/ceph-<osd>/current folder:
$ ls -ld /var/lib/ceph/osd/ceph-15/current/9.* | wc
74 666 6133
$ ls -ld /var/lib/ceph/osd/ceph-16/current/9.* | wc
54 486 4475
There are 6 OSDs under this ruleset, so 128 PGs * 3 replicas / 6 OSDs ~= 64 PG
directories per OSD, which roughly matches the counts above.
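So the directory counts are consistent with the new pg_num having taken effect on
disk, even though the pgmap does not show it; a quick sanity check of that
arithmetic:

```python
# ssd pool after the increase; 6 OSDs sit under the ssd root
pg_num, replicas, n_osds = 128, 3, 6

expected_per_osd = pg_num * replicas / n_osds
print(expected_per_osd)  # 64.0 PG directories per OSD on average

# PG directory counts observed on two of the OSDs via `ls` above:
observed = [74, 54]
print(sum(observed) / len(observed))  # 64.0
```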
Thanks a lot
BR,
Luke Kao
MYCOM-OSI
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com