Hello,
I am using pacific 16.2.10 on Rocky 8.6 Linux.
After setting upmap_max_deviation to 1 on the ceph balancer in ceph-mgr, I
achieved a near perfect balance of PGs and space on my OSDs. This is great.
However, I started getting the following errors on my ceph-mon logs, every
three minutes, for each of the OSDs that had been mapped by the balancer:
2022-10-07T17:10:39.619+0000 7f7c2786d700 1 verify_upmap unable to get
parent of osd.497, skipping for now
After banging my head against the wall for a bit trying to figure this out, I
think I have discovered the issue:
Currently, I have my pool EC Pool configured with the following crush rule:
rule mypoolname {
        id -5
        type erasure
        step take myroot
        step choose indep 4 type rack
        step choose indep 2 type pod
        step chooseleaf indep 1 type host
        step emit
}
Basically, pick 4 racks, then 2 pods in each rack, and then one host in each
pod, for a total of 8 chunks. (The pool is a 6+2 EC pool.) The 4 racks are
chosen from the myroot root entry, which is as follows:
root myroot {
        id -400
        item rack1 weight N
        item rack2 weight N
        item rack3 weight N
        item rack4 weight N
}
This has worked fine since inception, over a year ago. And the PGs are all as I
expect with OSDs from the 4 racks and not on the same host or pod.
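For what it's worth, placements from a rule like this can be sanity-checked
offline with crushtool --test. (The numeric rule id 5 below is a placeholder;
substitute whatever id the decompiled map shows for the rule. Nothing here
writes to the cluster.)

```shell
# Extract and decompile the installed crush map (read-only).
ceph osd getcrushmap -o /tmp/cm.bin
crushtool -d /tmp/cm.bin -o /tmp/cm.txt

# Simulate placements for the EC rule: 8 chunks per PG.
# --rule takes the numeric rule id; 5 is an assumed placeholder here.
crushtool --test -i /tmp/cm.bin --rule 5 --num-rep 8 --show-mappings | head

# Report only PGs that could not be mapped cleanly (ideally no output).
crushtool --test -i /tmp/cm.bin --rule 5 --num-rep 8 --show-bad-mappings
```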
The errors above, verify_upmap, started after I had set upmap_max_deviation
to 1 in the balancer and let it move things around, creating pg_upmap entries.
I then discovered, while trying to figure this out, that the device types are:
type 0 osd
type 1 host
type 2 chassis
type 3 rack
...
type 6 pod
So pod is HIGHER in the hierarchy than rack, but my rule places it lower.
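The full type list for a given cluster can be read straight out of the
decompiled crush map, e.g.:

```shell
# Decompile the installed crush map and list its bucket types.
ceph osd getcrushmap -o /tmp/cm.bin
crushtool -d /tmp/cm.bin -o /tmp/cm.txt
grep '^type ' /tmp/cm.txt
```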
What I want to do is remove the pods completely to work around this. Something
like:
rule mypoolname {
        id -5
        type erasure
        step take myroot
        step choose indep 4 type rack
        step chooseleaf indep 2 type host
        step emit
}
This will pick 4 racks and then 2 hosts in each rack. Will this cause any
problems? I can add the pod stuff back later as 'chassis' instead. I can live
without the 'pod' separation if needed.
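A quick offline check of a proposed rule like this, assuming the edited text
map has been compiled and the rule's numeric id is known (5 is again a
placeholder), might look like:

```shell
# Compile the edited map and simulate the modified rule offline.
crushtool -c /tmp/crush.txt -o /tmp/crush.new.bin
crushtool --test -i /tmp/crush.new.bin --rule 5 --num-rep 8 --show-bad-mappings

# If your crushtool build supports it, --compare summarizes how many
# mappings would change between the old and new maps before committing.
crushtool -i /tmp/crush.new.bin --compare /tmp/crush.bin
```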
To test this, I tried doing something like this:
1. grab the osdmap:
ceph osd getmap -o /tmp/om
2. pull out the crushmap:
osdmaptool /tmp/om --export-crush /tmp/crush.bin
3. convert it to text:
crushtool -d /tmp/crush.bin -o /tmp/crush.txt
I then edited the rule for this pool as above, to remove the pod step and go
directly to pulling from 4 racks then 2 hosts in each rack. I then compiled
the crush map and imported it into the extracted osdmap:
crushtool -c /tmp/crush.txt -o /tmp/crush.bin
osdmaptool /tmp/om --import-crush /tmp/crush.bin
I then ran upmap-cleanup on the new osdmap:
osdmaptool /tmp/om --upmap-cleanup
I did NOT get any of the verify_upmap messages (but it did generate some
rm-pg-upmap-items and some new upmaps in the list of commands to execute).
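For reference, the whole offline round-trip above, condensed (same scratch
paths; nothing here touches the live cluster until you explicitly inject the
new map with ceph osd setcrushmap, which this sketch does not do):

```shell
#!/bin/sh
set -e
# 1. Grab the current osdmap from the cluster.
ceph osd getmap -o /tmp/om
# 2. Pull the crush map out of it and decompile to text.
osdmaptool /tmp/om --export-crush /tmp/crush.bin
crushtool -d /tmp/crush.bin -o /tmp/crush.txt
# 3. Edit /tmp/crush.txt by hand, then recompile.
crushtool -c /tmp/crush.txt -o /tmp/crush.bin
# 4. Splice the edited crush map back into the extracted osdmap.
osdmaptool /tmp/om --import-crush /tmp/crush.bin
# 5. Dry-run the upmap cleanup; commands are printed, not executed.
osdmaptool /tmp/om --upmap-cleanup
```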
When I extracted the osdmap WITHOUT making any changes to it, and then ran
the upmap-cleanup, I got the same verify_upmap errors I am now seeing in the
ceph-mon logs.
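(For reference, the upmap exceptions currently pinned in the live osdmap can
be listed, and individually removed, with:)

```shell
# List the pg_upmap_items entries the balancer has created so far.
ceph osd dump | grep pg_upmap_items

# A single entry can be removed by hand if needed:
#   ceph osd rm-pg-upmap-items <pgid>
```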
So, should I just change the crushmap to remove the incorrect
rack->pod->host hierarchy, making it rack->host?
Will I have other issues? I am surprised that crush allowed me to create this
out-of-order rule to begin with.
Thanks for any suggestions.
-Chris
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]