Hi,
With 5 hosts, I could successfully create pools with k=4 and m=1, with the
failure domain being set to "host".
With 6 hosts, I could also create k=4,m=1 EC pools.
But with 6 hosts I suddenly failed with k=5, m=1, or k=4, m=2: the PGs were never
created. (I reused the pool name for my tests, and this seems to matter, see
below.)
HEALTH_WARN 512 pgs stuck inactive; 512 pgs stuck unclean
pg 159.70 is stuck inactive since forever, current state creating, last acting
[]
pg 159.71 is stuck inactive since forever, current state creating, last acting
[]
pg 159.72 is stuck inactive since forever, current state creating, last acting
[]
The pool is like this:
[root@ceph0 ~]# ceph osd pool get testec erasure_code_profile
erasure_code_profile: erasurep4_2_host
[root@ceph0 ~]# ceph osd erasure-code-profile get erasurep4_2_host
directory=/usr/lib64/ceph/erasure-code
k=4
m=2
plugin=isa
ruleset-failure-domain=host
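For context, the profile and pool had been created more or less like this; I'm
reconstructing the commands from the dump above and the 512-PG count in the
health output, not from my shell history, so take the exact form with a grain
of salt:
[root@ceph0 ~]# ceph osd erasure-code-profile set erasurep4_2_host \
    k=4 m=2 plugin=isa ruleset-failure-domain=host
[root@ceph0 ~]# ceph osd pool create testec 512 512 erasure erasurep4_2_host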
The PG list is like this (all PGs are alike):
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
159.0 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-09-30 14:41:01.219196 0'0 2015-09-30 14:41:01.219196
159.1 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-09-30 14:41:01.219197 0'0 2015-09-30 14:41:01.219197
I can't dump a PG (but then, if it's mapped to no OSD...):
[root@ceph0 ~]# ceph pg 159.0 dump
^CError EINTR: problem getting command descriptions from pg.159.0
It just hangs until I Ctrl-C it.
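The monitor-side queries still answer, so the stuck PGs can at least be listed
without going through a primary OSD, e.g.:
[root@ceph0 ~]# ceph health detail
[root@ceph0 ~]# ceph pg dump_stuck inactive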
The OSD tree is like this:
-1 21.71997 root default
-2 3.62000 host ceph4
9 1.81000 osd.9 up 1.00000 1.00000
15 1.81000 osd.15 up 1.00000 1.00000
-3 3.62000 host ceph0
5 1.81000 osd.5 up 1.00000 1.00000
11 1.81000 osd.11 up 1.00000 1.00000
-4 3.62000 host ceph1
6 1.81000 osd.6 up 1.00000 1.00000
12 1.81000 osd.12 up 1.00000 1.00000
-5 3.62000 host ceph2
7 1.81000 osd.7 up 1.00000 1.00000
13 1.81000 osd.13 up 1.00000 1.00000
-6 3.62000 host ceph3
8 1.81000 osd.8 up 1.00000 1.00000
14 1.81000 osd.14 up 1.00000 1.00000
-13 3.62000 host ceph5
10 1.81000 osd.10 up 1.00000 1.00000
16 1.81000 osd.16 up 1.00000 1.00000
Then, I dumped the crush ruleset and noticed the "max_size=5".
[root@ceph0 ~]# ceph osd pool get testec crush_ruleset
crush_ruleset: 1
[root@ceph0 ~]# ceph osd crush rule dump testec
{
"rule_id": 1,
"rule_name": "testec",
"ruleset": 1,
"type": 3,
"min_size": 3,
"max_size": 5,
I thought I should not care, since I'm not creating a replicated pool, but...
I then deleted the pool, deleted the "testec" ruleset, re-created the pool
and... boom, the PGs started being created!?
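Roughly what I ran (from memory):
[root@ceph0 ~]# ceph osd pool delete testec testec --yes-i-really-really-mean-it
[root@ceph0 ~]# ceph osd crush rule rm testec
[root@ceph0 ~]# ceph osd pool create testec 512 512 erasure erasurep4_2_host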
Now, the ruleset looks like this:
[root@ceph0 ~]# ceph osd crush rule dump testec
{
"rule_id": 1,
"rule_name": "testec",
"ruleset": 1,
"type": 3,
"min_size": 3,
"max_size": 6,
^^^
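In hindsight, one way to check what a rule can actually map seems to be
crushtool; this is just a sketch, assuming rule id 1 from the dump above and 6
OSDs per PG (i.e. k+m) - I'm not sure how crushtool treats the rule's
min_size/max_size during --test:
[root@ceph0 ~]# ceph osd getcrushmap -o /tmp/crushmap
[root@ceph0 ~]# crushtool -i /tmp/crushmap --test --rule 1 --num-rep 6 --show-mappings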
Is this a bug, or a "feature"? (If the latter, I'd be glad if someone could shed
some light on it.)
I'm presuming Ceph is counting each EC chunk as a replica, but I'm failing to
understand the documentation here: I did not select the crush ruleset when I
created the pool. Still, a ruleset was chosen by default (by CRUSH?), and it
was not working...?
Thanks && regards