Sorry, I was missing the pg dump earlier; here it is:
2.1 0 0 0 0 0 0 0 0 stale+peering 2018-07-26 19:38:13.381673 0'0 125:9 [3] 3 [3] 3 0'0 2018-07-26 15:20:08.965357 0'0 2018-07-26 15:20:08.965357 0
2.0 0 0 0 0 0 0 0 0 stale+peering 2018-07-26 19:38:13.345341 0'0 125:13 [3] 3 [3] 3 0'0 2018-07-26 15:20:08.965357 0'0 2018-07-26 15:20:08.965357 0
2 0 0 0 0 0 0 0 0
sum 0 0 0 0 0 0 0 0
OSD_STAT USED AVAIL TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
3 1051M 1861G 1863G [0,1,2] 256 256
2 1051M 1861G 1863G [0,1,3] 0 0
1 1051M 3724G 3726G [0,2,3] 0 0
0 1051M 1861G 1863G [1,2,3] 0 0
sum 4205M 9310G 9315G
For some reason it seems that some PGs are allocated to OSD 3 (but
stale+peering), which is odd.
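To double-check where those PGs are actually mapped, something like the following could be run against the cluster (a sketch; pg 2.1 is taken from the dump above):

```shell
# Show the up/acting set the cluster has computed for one of the stale PGs
ceph pg map 2.1

# Dump the CRUSH rules to see which device class each rule selects
ceph osd crush rule dump

# Query the PG directly for its peering state (needs a reachable primary)
ceph pg 2.1 query
```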
On Thu, 26 Jul 2018 at 20:50, Benoit Hudzia <[email protected]> wrote:
> You are correct, the PGs are stale (not allocated).
>
> [root@stratonode1 /]# ceph status
> cluster:
> id: ea0df043-7b25-4447-a43d-e9b2af8fe069
> health: HEALTH_WARN
> Reduced data availability: 256 pgs inactive, 256 pgs peering,
> 256 pgs stale
>
> services:
> mon: 3 daemons, quorum
> stratonode1.node.strato,stratonode2.node.strato,stratonode0.node.strato
> mgr: stratonode1(active), standbys: stratonode2, stratonode3
> osd: 4 osds: 4 up, 4 in
>
> data:
> pools: 1 pools, 256 pgs
> objects: 0 objects, 0 bytes
> usage: 4192 MB used, 9310 GB / 9315 GB avail
> pgs: 100.000% pgs not active
> 256 stale+peering
>
> The pg dump shows all PGs in stale+peering.
>
> However, it is rather strange that it shows some PGs associated with OSD 3.
>
>
> So it seems that the PG calculation is not taking the CRUSH ruleset into
> account.
>
> Do you think that changing "osd max pg per osd hard ratio" to a huge
> number (1M) would be a valid temporary workaround?
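(As an aside on that workaround: the limit named in the error is mon_max_pg_per_osd, so that may be the knob to raise rather than the OSD-side hard ratio. A hedged sketch of a temporary override; the value is illustrative, not a recommendation, and whether it can be injected at runtime may depend on the release:)

```shell
# Temporary, in-memory override on the monitors
ceph tell mon.\* injectargs '--mon_max_pg_per_osd 1000'

# To persist it, set the same option in ceph.conf and restart the monitors:
# [mon]
# mon max pg per osd = 1000
```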
>
> We always allocate pools with dedicated OSDs using the device class
> ruleset, so we never have pools sharing OSDs.
>
> I'll open a bug with Ceph about the pg creation check ignoring the CRUSH
> ruleset.
>
>
> On Thu, 26 Jul 2018 at 17:11, John Spray <[email protected]> wrote:
>
>> On Thu, Jul 26, 2018 at 4:57 PM Benoit Hudzia <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> We currently segregate Ceph pool PG allocation using the CRUSH device
>>> class ruleset, as described here:
>>> https://ceph.com/community/new-luminous-crush-device-classes/
>>> We simply use the following command to define the rule: ceph osd crush
>>> rule create-replicated <RULE> default host <DEVICE CLASS>
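(For illustration, a concrete instance of that command might look like the following; the rule, class, and pool names here are made up:)

```shell
# Create a replicated rule that only selects OSDs of device class "ssd",
# choosing one OSD per host under the "default" root
ceph osd crush rule create-replicated ssd-only default host ssd

# Create a pool that uses that rule
ceph osd pool create pool1 256 256 replicated ssd-only
```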
>>>
>>> However, we noticed that the rule is not strict in certain scenarios:
>>> if there is no OSD of the specified device class, Ceph will allocate
>>> PGs for the pool to any other available OSD (creating an issue with the
>>> PG calculation when we want to add a new pool).
>>>
>>> Simple scenario:
>>> 1. Create 1 pool, <pool1>, replication 2, with 4 nodes, 1 OSD each,
>>> belonging to class <pool1>.
>>> 2. Remove all OSDs (delete them).
>>> 3. Create 4 new OSDs (using the same disks but different IDs), but this
>>> time tag them with class <pool2>.
>>> 4. Try to create pool <pool2> -> the creation fails with the output:
>>> Error ERANGE: pg_num 256 size 2 would mean 1024 total pgs, which
>>> exceeds max 800 (mon_max_pg_per_osd 200 * num_in_osds 4)
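(The arithmetic behind that error appears to count PG replicas across all pools against a global cap, without looking at which CRUSH rule each pool uses; roughly:)

```shell
# Existing pool1: 256 PGs x size 2 = 512 PG replicas
# Requested pool2: 256 PGs x size 2 = 512 more
total=$(( 256*2 + 256*2 ))
# Cap: mon_max_pg_per_osd (200) x num_in_osds (4)
cap=$(( 200*4 ))
echo "$total vs $cap"   # 1024 vs 800 -> ERANGE
```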
>>>
>>> Pool1 simply started allocating PGs to OSDs that don't belong to the
>>> ruleset.
>>>
>>
>> Are you sure pool 1's PGs are actually being placed on the wrong OSDs?
>> Have you looked at the output of "ceph pg dump" to check that?
>>
>> It sounds more like the pool creation check is simply failing to consider
>> the crush rules and applying a cruder global check.
>>
>> John
>>
>>
>>>
>>> Which leads me to the following question: is there a way to make the
>>> CRUSH rule a hard requirement? I.e., if we do not have any OSD matching
>>> the device class, it won't try to allocate PGs to OSDs that don't
>>> match it.
>>>
>>> Is there any way to prevent pool1 from using those OSDs?
>>>
>>>
>>>
>>>
>>> --
>>> Dr. Benoit Hudzia
>>>
>>> Mobile (UK): +44 (0) 75 346 78673
>>> Mobile (IE): +353 (0) 89 219 3675
>>> Email: [email protected]
>>>
>>>
>>>
>>> Web <http://www.stratoscale.com/> | Blog
>>> <http://www.stratoscale.com/blog/> | Twitter
>>> <https://twitter.com/Stratoscale> | Google+
>>> <https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts>
>>> | Linkedin <https://www.linkedin.com/company/stratoscale>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> [email protected]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
--
Dr. Benoit Hudzia