You are correct, the PGs are stale (not allocated):
[root@stratonode1 /]# ceph status
  cluster:
    id:     ea0df043-7b25-4447-a43d-e9b2af8fe069
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive, 256 pgs peering,
            256 pgs stale
  services:
    mon: 3 daemons, quorum
         stratonode1.node.strato,stratonode2.node.strato,stratonode0.node.strato
    mgr: stratonode1(active), standbys: stratonode2, stratonode3
    osd: 4 osds: 4 up, 4 in
  data:
    pools:   1 pools, 256 pgs
    objects: 0 objects, 0 bytes
    usage:   4192 MB used, 9310 GB / 9315 GB avail
    pgs:     100.000% pgs not active
             256 stale+peering
The PG dump shows all PGs as stale+peering. However, it's somewhat strange
that it shows some PGs associated with OSD 3.
So it seems that the PG calculation is not taking the ruleset into account.
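For anyone wanting to double-check that mapping themselves, here is a rough sketch of tallying PGs per OSD from "ceph pg dump pgs_brief" output. The sample data below is made up for illustration; on a live cluster, pipe the real command into the awk step instead:

```shell
# Hypothetical sample of "ceph pg dump pgs_brief" output (made up for
# illustration); on a live cluster use:  ceph pg dump pgs_brief | awk ...
cat <<'EOF' > /tmp/pgs_brief.txt
PG_STAT STATE         UP    UP_PRIMARY ACTING ACTING_PRIMARY
1.0     stale+peering [3]   3          [3]    3
1.1     stale+peering [0,2] 0          [0,2]  0
1.2     stale+peering [3,1] 3          [3,1]  3
EOF
# Tally how many PGs list each OSD in their acting set (column 5)
awk 'NR>1 {gsub(/[][]/,"",$5); n=split($5,a,","); for(i=1;i<=n;i++) c[a[i]]++}
     END {for (o in c) print "osd." o, c[o]}' /tmp/pgs_brief.txt | sort
```

With the sample above this prints one line per OSD, e.g. "osd.3 2", making it easy to spot PGs landing on OSDs the rule should never select.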
Do you think that changing "osd_max_pg_per_osd_hard_ratio" to a huge
number (1M) would be a valid temporary workaround?
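For reference, an untested sketch of what such a workaround might look like at runtime. Note that the ERANGE error quotes the mon-side option mon_max_pg_per_osd, while osd_max_pg_per_osd_hard_ratio is the OSD-side hard limit, so the mon-side cap is probably the one that needs raising:

```shell
# Untested sketch: raise the mon-side cap that gates pool creation
# (the option quoted in the ERANGE error), not only the OSD hard ratio.
ceph tell mon.\* injectargs '--mon_max_pg_per_osd=1000000'
# To persist across mon restarts, add to ceph.conf under [mon]:
#   mon_max_pg_per_osd = 1000000
```

This only papers over the check; it does nothing about PGs from pool1 actually mapping to the wrong OSDs.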
We always allocate pools with dedicated OSDs using a device-class ruleset,
so we never have pools sharing OSDs.
I'll open a bug with Ceph regarding the PG creation check ignoring the
CRUSH ruleset.
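For what it's worth, the check that trips here looks like a flat product over all "in" OSDs, which is why the ruleset never enters into it. A back-of-the-envelope sketch of the numbers from the error message:

```shell
# Rough arithmetic behind the ERANGE error: the mon appears to compare the
# cluster-wide PG-instance count against mon_max_pg_per_osd * num_in_osds,
# regardless of which OSDs the CRUSH rule can actually map to.
existing=$((256 * 2))            # pool1: pg_num 256, size 2 -> 512
requested=$((256 * 2))           # proposed pool2 would add another 512
total=$((existing + requested))  # 1024 total PG instances
limit=$((200 * 4))               # mon_max_pg_per_osd 200 * num_in_osds 4
echo "total=$total limit=$limit" # total=1024 limit=800 -> rejected
```

A CRUSH-aware check would count pool1's 512 and pool2's 512 against disjoint sets of OSDs, and both pools would fit comfortably.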
On Thu, 26 Jul 2018 at 17:11, John Spray <[email protected]> wrote:
> On Thu, Jul 26, 2018 at 4:57 PM Benoit Hudzia <[email protected]>
> wrote:
>
>> Hi,
>>
>> We currently segregate Ceph pool PG allocation using CRUSH device-class
>> rulesets as described in
>> https://ceph.com/community/new-luminous-crush-device-classes/,
>> simply using the following command to define the rule: ceph osd crush
>> rule create-replicated <RULE> default host <DEVICE CLASS>
>>
>> However, we noticed that the rule is not strict in certain scenarios. By
>> that, I mean that if there is no OSD of the specified device class, Ceph
>> will allocate PGs for the pool to any other available OSD (creating an
>> issue with the PG calculation when we want to add a new pool).
>>
>> Simple scenario:
>> 1. Create one pool <pool1>, replication 2, with 4 nodes, 1 OSD each,
>> belonging to class <pool1>.
>> 2. Remove all OSDs (delete them).
>> 3. Create 4 new OSDs (using the same disks but different IDs), but this
>> time tag them with class <pool2>.
>> 4. Try to create pool <pool2> -> this fails with:
>>
>> Error ERANGE: pg_num 256 size 2 would mean 1024 total pgs, which
>> exceeds max 800 (mon_max_pg_per_osd 200 * num_in_osds 4)
>>
>> Pool1 simply started allocating PGs to OSDs that don't belong to the
>> ruleset.
>>
>
> Are you sure pool 1's PGs are actually being placed on the wrong OSDs?
> Have you looked at the output of "ceph pg dump" to check that?
>
> It sounds more like the pool creation check is simply failing to consider
> the crush rules and applying a cruder global check.
>
> John
>
>
>>
>> Which leads me to the following question: is there a way to make the
>> CRUSH rule a hard requirement? E.g., if we do not have any OSD matching
>> the device class, it won't start trying to allocate PGs to OSDs that
>> don't match it.
>>
>> Is there any way to prevent pool1 from using those OSDs?
>>
>
--
Dr. Benoit Hudzia
Mobile (UK): +44 (0) 75 346 78673
Mobile (IE): +353 (0) 89 219 3675
Email: [email protected]
Web <http://www.stratoscale.com/> | Blog <http://www.stratoscale.com/blog/>
| Twitter <https://twitter.com/Stratoscale> | Google+
<https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts>
| Linkedin <https://www.linkedin.com/company/stratoscale>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com