I wanted to report some strange behaviour of crush rules/EC profiles with
radosgw pools; I am not sure whether it is a bug or whether it is supposed to
work that way.
I am trying to implement the scenario below in my home lab:
By default there is a "default" erasure-code-profile with the below settings:
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=1
plugin=jerasure
technique=reed_sol_van
w=8
From the above we see that it uses the default root bucket. Now of course you
would want to create your own EC profile with a custom algorithm, crush
buckets, etc.
Let's say, for example, we create two new EC profiles: one with
crush-root=ssd-performance2 and one with crush-root=default (which contains no
disks, according to the ceph osd tree output at the end of this mail):
ceph osd erasure-code-profile set test-ec crush-device-class=
crush-failure-domain=host crush-root=ssd-performance2
jerasure-per-chunk-alignment=false k=2 m=1 plugin=jerasure
technique=reed_sol_van w=8
ceph osd erasure-code-profile set test-ec2 crush-device-class=
crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false
k=2 m=1 plugin=jerasure technique=reed_sol_van w=8
Now let's create the associated crush rules to use these profiles:
ceph osd crush rule create-erasure erasure-test-rule test-ec
ceph osd crush rule create-erasure erasure-test-rule2 test-ec2
Now let's say you have a radosgw server running; on startup it creates the
five default radosgw pools (assume you have also uploaded some data):
default.rgw.buckets.data
default.rgw.buckets.index
default.rgw.control
default.rgw.log
default.rgw.meta
Now if you grep for these pools in ceph osd dump you will see that all of them
use replicated rules, but we want erasure coding for the radosgw data pool. So
let's migrate the default.rgw.buckets.data pool to an erasure-coded one:
1) Shut down the radosgw server so that no new requests come in.
2) ceph osd pool rename default.rgw.buckets.data default.rgw.buckets.data-old
3) ceph osd pool create default.rgw.buckets.data 8 8 erasure test-ec
erasure-test-rule -> we use the newly created erasure crush rule with the
profile we created, which uses the ssd-performance2 root bucket
4) rados cppool default.rgw.buckets.data-old default.rgw.buckets.data
5) Start radosgw server again
At this point I can see the old objects, I can upload new objects through
radosgw, and everything works fine.
Now I see this strange behaviour after doing the following: set
default.rgw.buckets.data to use the other erasure crush rule (the one whose
root bucket is default, which has no disks):
ceph osd pool set default.rgw.buckets.data crush_rule erasure-test-rule2
Bug 1? You can still browse the data, but any attempt to upload/download hangs
with the log messages below:
2019-12-18 17:07:07.037 7f05a1ece700 0 ERROR: client_io->complete_request()
returned Input/output error
2019-12-18 17:07:07.037 7f05a1ece700 2 req 712 0.004s s3:list_buckets op
status=
The monitor nodes don't report anything, and it seems new objects simply
cannot be stored (which is correct, since CRUSH doesn't know where to place
them), but shouldn't the monitors at least raise a warning, or shouldn't there
be a CRUSH check before such a rule can be applied?
Reverting the pool back to erasure-test-rule makes everything work fine again.
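To illustrate why I'd expect writes to hang rather than fail with a clear error, here is a toy Python sketch (my own simplification, not Ceph's actual CRUSH code) of the take/chooseleaf idea, using the roots from the ceph osd tree output at the end of this mail:

```python
# Toy illustration (not Ceph's CRUSH implementation): a rule's "take" step
# picks a root bucket, then chooseleaf picks OSDs underneath it.
tree = {
    # roots condensed from the `ceph osd tree` output at the end of this mail
    "ssd-performance2": ["osd.2", "osd.13", "osd.16", "osd.14", "osd.18",
                         "osd.19", "osd.10", "osd.15", "osd.20",
                         "osd.28", "osd.29"],
    "default": [],  # the root that erasure-test-rule2 takes: no OSDs at all
}

def placements(root, k_plus_m=3):
    """Pick up to k+m OSDs under the given root (stand-in for chooseleaf)."""
    return tree.get(root, [])[:k_plus_m]

# erasure-test-rule takes ssd-performance2 -> a full k+m=3 placement.
print(placements("ssd-performance2"))  # ['osd.2', 'osd.13', 'osd.16']
# erasure-test-rule2 takes default -> nothing to map, so writes just block.
print(placements("default"))           # []
```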
=================================
Bug 2? If you modify the test-ec profile (the one behind erasure-test-rule) to
use an empty crush bucket, like test-ec2 does, the change is not parsed or
picked up by the crush rule. It seems the crush rule skips that part.
Example:
ceph osd erasure-code-profile set test-ec crush-root=default --force
At this point nothing happens and radosgw keeps working fine, which it
shouldn't, since it should now see that the data cannot be stored anywhere.
Unless the crush root bucket is kept in the crush rule itself and not taken
from the erasure-coded profile... even if you force-apply the change to the
erasure profile as above.
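If it helps to see the suspected mechanism, here is a toy model of what I think is happening (an assumption on my part, not Ceph source code): the profile is only consulted once, when the rule is created, and the rule keeps its own copy of the take item afterwards.

```python
# Toy model (assumption, not Ceph source): the EC profile is read once at
# rule-creation time; the resulting rule snapshots the root as a "take"
# step and never consults the profile again.
profiles = {"test-ec": {"crush-root": "ssd-performance2"}}

def create_erasure_rule(name, profile_name):
    # Copy the profile's crush-root into the rule's own steps.
    root = profiles[profile_name]["crush-root"]
    return {"rule_name": name, "steps": [{"op": "take", "item_name": root}]}

rule = create_erasure_rule("erasure-test-rule", "test-ec")

# Now "force" the profile to point at the empty default root...
profiles["test-ec"]["crush-root"] = "default"

# ...but the existing rule still takes ssd-performance2, which would explain
# why radosgw keeps working after the --force change to the profile.
print(rule["steps"][0]["item_name"])  # ssd-performance2
```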
=================================
Bug 3? You can't tell which erasure-code profile a crush rule is using from
ceph osd dump. You only see that the pool uses crush rule number 1, but if you
dump that crush rule it doesn't mention which erasure-code profile it uses,
only the item_name (i.e. the root bucket).
Even with telemetry enabled on the latest release, "ceph telemetry show basic"
(excerpt below) mentions no crush-root either. So does the crush rule take
precedence over the erasure_code_profile when it comes to the crush-root
bucket?
{
    "min_size": 2,
    "erasure_code_profile": {
        "crush-failure-domain": "host",
        "k": "2",
        "technique": "reed_sol_van",
        "m": "1",
        "plugin": "jerasure"
    },
    "pg_autoscale_mode": "warn",
    "pool": 860,
    "size": 3,
    "cache_mode": "none",
    "target_max_objects": 0,
    "pg_num": 8,
    "pgp_num": 8,
    "target_max_bytes": 0,
    "type": "erasure"
}
root@ceph-mon01:~# ceph osd crush rule dump erasure-test-rule
{
    "rule_id": 2,
    "rule_name": "erasure-test-rule",
    "ruleset": 2,
    "type": 3,
    "min_size": 3,
    "max_size": 3,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -2,
            "item_name": "ssd-performance2"
        },
        {
            "op": "chooseleaf_indep",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
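The closest workaround I found is to join the two by hand: read the pool's crush_rule id from ceph osd dump, then pull the item_name out of the rule's take step. A small sketch over the rule dump above:

```python
import json

# The crush rule dump quoted above; its "take" step is the only place the
# effective root bucket shows up -- the EC profile name never appears.
rule_dump = json.loads("""
{"rule_id": 2, "rule_name": "erasure-test-rule",
 "steps": [
   {"op": "set_chooseleaf_tries", "num": 5},
   {"op": "set_choose_tries", "num": 100},
   {"op": "take", "item": -2, "item_name": "ssd-performance2"},
   {"op": "chooseleaf_indep", "num": 0, "type": "host"},
   {"op": "emit"}]}
""")

def crush_root(rule):
    """Extract the root bucket a rule actually uses from its take step."""
    for step in rule["steps"]:
        if step["op"] == "take":
            return step["item_name"]
    return None

print(crush_root(rule_dump))  # ssd-performance2
```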
root@ceph-mon01:~# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME                      STATUS REWEIGHT PRI-AFF
-37        0.18398 root really-low
-40        0.09799     host ceph-osd01-really-low
 11   hdd  0.09799         osd.11                 up     1.00000  1.00000
-41        0.04799     host ceph-osd02-really-low
  1   hdd  0.01900         osd.1                  up     1.00000  1.00000
  9   hdd  0.02899         osd.9                  up     1.00000  1.00000
-42        0.03799     host ceph-osd03-really-low
  6   hdd  0.01900         osd.6                  up     1.00000  1.00000
  7   hdd  0.01900         osd.7                  up     1.00000  1.00000
-23       10.67598 root spinning-rust
-20        2.04900     rack rack1
 -3        2.04900         host ceph-osd01
  3   hdd  0.04900             osd.3              up     0.95001  1.00000
 22   hdd  1.00000             osd.22             up     0.90002  1.00000
 17   ssd  1.00000             osd.17             up     1.00000  1.00000
-25        3.07799     rack rack2
 -5        3.07799         host ceph-osd02
  4   hdd  0.04900             osd.4              up     1.00000  1.00000
  8   hdd  0.02899             osd.8              up     1.00000  1.00000
 23   hdd  1.00000             osd.23             up     1.00000  1.00000
 25   hdd  1.00000             osd.25             up     1.00000  1.00000
 12   ssd  1.00000             osd.12             up     1.00000  1.00000
-28        3.54900     rack rack3
 -7        3.54900         host ceph-osd03
  0   hdd  1.00000             osd.0              up     0.90002  1.00000
  5   hdd  0.04900             osd.5              up     1.00000  1.00000
 30   hdd  0.50000             osd.30             up     1.00000  1.00000
 21   ssd  1.00000             osd.21             up     0.95001  1.00000
 24   ssd  1.00000             osd.24             up     1.00000  1.00000
-55        2.00000     rack rack4
-49        2.00000         host ceph-osd04
 26   hdd  1.00000             osd.26             up     1.00000  1.00000
 27   hdd  1.00000             osd.27             up     1.00000  1.00000
 -2        9.10799 root ssd-performance2
-32        2.09799     host ceph-osd01-ssd
  2   ssd  0.09799         osd.2                  up     1.00000  1.00000
 13   ssd  1.00000         osd.13                 up     1.00000  1.00000
 16   ssd  1.00000         osd.16                 up     1.00000  1.00000
-31        3.00000     host ceph-osd02-ssd
 14   ssd  1.00000         osd.14                 up     1.00000  1.00000
 18   ssd  1.00000         osd.18                 up     1.00000  1.00000
 19   ssd  1.00000         osd.19                 up     1.00000  1.00000
 -9        2.00999     host ceph-osd03-ssd
 10   ssd  0.00999         osd.10                 up     0.90002  1.00000
 15   ssd  1.00000         osd.15                 up     1.00000  1.00000
 20   ssd  1.00000         osd.20                 up     1.00000  1.00000
-52        2.00000     host ceph-osd04-ssd
 28   ssd  1.00000         osd.28                 up     1.00000  1.00000
 29   ssd  1.00000         osd.29                 up     1.00000  1.00000
 -1              0 root default
root@ceph-mon01:~#
Thanks,
Anastasios
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]