Sorry, I was missing the pg dump earlier; here it is:
2.1 0 0 0 0 0 0 0 0 stale+peering 2018-07-26 19:38:13.381673 0'0 125:9 [3] 3 [3] 3 0'0 2018-07-26 15:20:08.965357 0'0 2018-07-26 15:20:08.965357 0
2.0 0 0 0 0 0 0 0 0 stale+peering 2018-07-26 19:38:13.345341 0'0 125:13 [3] 3 [3] 3 0'0 2018-07-26 15:20:08.965357 0'0 2018-07-26 15:20:08.965357 0
2 0 0 0 0 0 0 0 0
sum 0 0 0 0 0 0 0 0
OSD_STAT USED AVAIL TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
3 1051M 1861G 1863G [0,1,2] 256 256
2 1051M 1861G 1863G [0,1,3] 0 0
1 1051M 3724G 3726G [0,2,3] 0 0
0 1051M 1861G 1863G [1,2,3] 0 0
sum 4205M 9310G 9315G
For some reason it seems that some PGs are allocated to OSD 3 (but
stale+peering), which is odd.
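To double-check where those PGs are actually mapped, something like the following could be run against the cluster (a sketch; pg 2.1 is taken from the dump above):

```shell
# Show the up/acting set the cluster has computed for one of the stale PGs
ceph pg map 2.1

# Dump the CRUSH rules to see which device class each rule selects
ceph osd crush rule dump

# Query the PG directly for its peering state (needs a reachable primary)
ceph pg 2.1 query
```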
On Thu, 26 Jul 2018 at 20:50, Benoit Hudzia <[email protected]> wrote:
> You are correct, the PGs are stale (not allocated).
>
> [root@stratonode1 /]# ceph status
> cluster:
> id: ea0df043-7b25-4447-a43d-e9b2af8fe069
> health: HEALTH_WARN
> Reduced data availability: 256 pgs inactive, 256 pgs peering,
> 256 pgs stale
>
> services:
> mon: 3 daemons, quorum
> stratonode1.node.strato,stratonode2.node.strato,stratonode0.node.strato
> mgr: stratonode1(active), standbys: stratonode2, stratonode3
> osd: 4 osds: 4 up, 4 in
>
> data:
> pools: 1 pools, 256 pgs
> objects: 0 objects, 0 bytes
> usage: 4192 MB used, 9310 GB / 9315 GB avail
> pgs: 100.000% pgs not active
> 256 stale+peering
>
> The pg dump shows all PGs in stale+peering.
>
> However, it is rather strange that it shows some PGs associated with OSD 3.
>
>
> So it seems that the PG calculation is not taking the CRUSH ruleset into
> account.
>
> Do you think that changing "osd max pg per osd hard ratio" to a huge
> number (1M) would be a valid temporary workaround?
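(As an aside on that workaround: the limit named in the error is mon_max_pg_per_osd, so that may be the knob to raise rather than the OSD-side hard ratio. A hedged sketch of a temporary override; the value is illustrative, not a recommendation, and whether it can be injected at runtime may depend on the release:)

```shell
# Temporary, in-memory override on the monitors
ceph tell mon.\* injectargs '--mon_max_pg_per_osd 1000'

# To persist it, set the same option in ceph.conf and restart the monitors:
# [mon]
# mon max pg per osd = 1000
```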
>
> We always allocate pools with dedicated OSDs using the device class
> ruleset, so we never have pools sharing OSDs.
>
> I'll open a bug with Ceph about the pg creation check ignoring the CRUSH
> ruleset.
>
>
> On Thu, 26 Jul 2018 at 17:11, John Spray <[email protected]> wrote:
>
>> On Thu, Jul 26, 2018 at 4:57 PM Benoit Hudzia <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> We currently segregate Ceph pool PG allocation using the CRUSH device
>>> class ruleset, as described here:
>>> https://ceph.com/community/new-luminous-crush-device-classes/
>>> We simply use the following command to define the rule: ceph osd crush
>>> rule create-replicated <RULE> default host <DEVICE CLASS>
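(For illustration, a concrete instance of that command might look like the following; the rule, class, and pool names here are made up:)

```shell
# Create a replicated rule that only selects OSDs of device class "ssd",
# choosing one OSD per host under the "default" root
ceph osd crush rule create-replicated ssd-only default host ssd

# Create a pool that uses that rule
ceph osd pool create pool1 256 256 replicated ssd-only
```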
>>>
>>> However, we noticed that the rule is not strict in certain scenarios:
>>> if there is no OSD of the specified device class, Ceph will allocate
>>> PGs for the pool to any other available OSD (creating an issue with the
>>> PG calculation when we want to add a new pool).
>>>
>>> Simple scenario:
>>> 1. Create 1 pool, <pool1>, replication 2, with 4 nodes, 1 OSD each,
>>> belonging to class <pool1>.
>>> 2. Remove all OSDs (delete them).
>>> 3. Create 4 new OSDs (using the same disks but different IDs), but this
>>> time tag them with class <pool2>.
>>> 4. Try to create pool <pool2> -> the creation fails with the output:
>>> Error ERANGE: pg_num 256 size 2 would mean 1024 total pgs, which
>>> exceeds max 800 (mon_max_pg_per_osd 200 * num_in_osds 4)
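(The arithmetic behind that error appears to count PG replicas across all pools against a global cap, without looking at which CRUSH rule each pool uses; roughly:)

```shell
# Existing pool1: 256 PGs x size 2 = 512 PG replicas
# Requested pool2: 256 PGs x size 2 = 512 more
total=$(( 256*2 + 256*2 ))
# Cap: mon_max_pg_per_osd (200) x num_in_osds (4)
cap=$(( 200*4 ))
echo "$total vs $cap"   # 1024 vs 800 -> ERANGE
```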
>>>
>>> Pool1 simply started allocating PGs to OSDs that don't belong to the
>>> ruleset.
>>>
>>
>> Are you sure pool 1's PGs are actually being placed on the wrong OSDs?
>> Have you looked at the output of "ceph pg dump" to check that?
>>
>> It sounds more like the pool creation check is simply failing to consider
>> the crush rules and applying a cruder global check.
>>
>> John
>>
>>
>>>
>>> Which leads me to the following question: is there a way to make the
>>> CRUSH rule a hard requirement? I.e., if we do not have any OSD matching
>>> the device class, it won't try to allocate PGs to OSDs that don't
>>> match it.
>>>
>>> Is there any way to prevent pool1 from using those OSDs?
>>>
>>>
>>>
>>>
>>> --
>>> Dr. Benoit Hudzia
>>>
>>> Mobile (UK): +44 (0) 75 346 78673
>>> Mobile (IE): +353 (0) 89 219 3675
>>> Email: [email protected]
>>>
>>>
>>>
>>> Web <http://www.stratoscale.com/> | Blog
>>> <http://www.stratoscale.com/blog/> | Twitter
>>> <https://twitter.com/Stratoscale> | Google+
>>> <https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts>
>>> | Linkedin <https://www.linkedin.com/company/stratoscale>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> [email protected]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
--
Dr. Benoit Hudzia