You are correct, the PGs are stale (not allocated):
[root@stratonode1 /]# ceph status
  cluster:
    id:     ea0df043-7b25-4447-a43d-e9b2af8fe069
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive, 256 pgs peering,
            256 pgs stale
  services:
    mon: 3 daemons, quorum
         stratonode1.node.strato,stratonode2.node.strato,stratonode0.node.strato
    mgr: stratonode1(active), standbys: stratonode2, stratonode3
    osd: 4 osds: 4 up, 4 in
  data:
    pools:   1 pools, 256 pgs
    objects: 0 objects, 0 bytes
    usage:   4192 MB used, 9310 GB / 9315 GB avail
    pgs:     100.000% pgs not active
             256 stale+peering
The PG dump shows all PGs as stale+peering. However, it's somewhat strange
that it shows some PGs associated with OSD 3.
So it seems that the PG calculation is not taking the ruleset into account.
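For anyone wanting to double-check that mapping themselves, here is a rough sketch of tallying PGs per OSD from "ceph pg dump pgs_brief" output. The sample data below is made up for illustration; on a live cluster, pipe the real command into the awk step instead:

```shell
# Hypothetical sample of "ceph pg dump pgs_brief" output (made up for
# illustration); on a live cluster use:  ceph pg dump pgs_brief | awk ...
cat <<'EOF' > /tmp/pgs_brief.txt
PG_STAT STATE         UP    UP_PRIMARY ACTING ACTING_PRIMARY
1.0     stale+peering [3]   3          [3]    3
1.1     stale+peering [0,2] 0          [0,2]  0
1.2     stale+peering [3,1] 3          [3,1]  3
EOF
# Tally how many PGs list each OSD in their acting set (column 5)
awk 'NR>1 {gsub(/[][]/,"",$5); n=split($5,a,","); for(i=1;i<=n;i++) c[a[i]]++}
     END {for (o in c) print "osd." o, c[o]}' /tmp/pgs_brief.txt | sort
```

With the sample above this prints one line per OSD, e.g. "osd.3 2", making it easy to spot PGs landing on OSDs the rule should never select.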
Do you think that changing "osd_max_pg_per_osd_hard_ratio" to a huge
number (1M) would be a valid temporary workaround?
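For reference, an untested sketch of what such a workaround might look like at runtime. Note that the ERANGE error quotes the mon-side option mon_max_pg_per_osd, while osd_max_pg_per_osd_hard_ratio is the OSD-side hard limit, so the mon-side cap is probably the one that needs raising:

```shell
# Untested sketch: raise the mon-side cap that gates pool creation
# (the option quoted in the ERANGE error), not only the OSD hard ratio.
ceph tell mon.\* injectargs '--mon_max_pg_per_osd=1000000'
# To persist across mon restarts, add to ceph.conf under [mon]:
#   mon_max_pg_per_osd = 1000000
```

This only papers over the check; it does nothing about PGs from pool1 actually mapping to the wrong OSDs.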
We always allocate pools with dedicated OSDs using a device-class ruleset,
so we never have pools sharing OSDs.
I'll open a bug with Ceph regarding the PG creation check ignoring the
CRUSH ruleset.
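For what it's worth, the check that trips here looks like a flat product over all "in" OSDs, which is why the ruleset never enters into it. A back-of-the-envelope sketch of the numbers from the error message:

```shell
# Rough arithmetic behind the ERANGE error: the mon appears to compare the
# cluster-wide PG-instance count against mon_max_pg_per_osd * num_in_osds,
# regardless of which OSDs the CRUSH rule can actually map to.
existing=$((256 * 2))            # pool1: pg_num 256, size 2 -> 512
requested=$((256 * 2))           # proposed pool2 would add another 512
total=$((existing + requested))  # 1024 total PG instances
limit=$((200 * 4))               # mon_max_pg_per_osd 200 * num_in_osds 4
echo "total=$total limit=$limit" # total=1024 limit=800 -> rejected
```

A CRUSH-aware check would count pool1's 512 and pool2's 512 against disjoint sets of OSDs, and both pools would fit comfortably.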
On Thu, 26 Jul 2018 at 17:11, John Spray <[email protected]> wrote:
> On Thu, Jul 26, 2018 at 4:57 PM Benoit Hudzia <[email protected]>
> wrote:
>
>> Hi,
>>
>> We currently segregate Ceph pool PG allocation using CRUSH device-class
>> rulesets as described in
>> https://ceph.com/community/new-luminous-crush-device-classes/,
>> simply using the following command to define the rule: ceph osd crush
>> rule create-replicated <RULE> default host <DEVICE CLASS>
>>
>> However, we noticed that the rule is not strict in certain scenarios. By
>> that, I mean that if there is no OSD of the specified device class, Ceph
>> will allocate PGs for the pool to any other available OSD (creating an
>> issue with the PG calculation when we want to add a new pool).
>>
>> Simple scenario:
>> 1. Create one pool <pool1>, replication 2, with 4 nodes, 1 OSD each,
>> belonging to class <pool1>.
>> 2. Remove all OSDs (delete them).
>> 3. Create 4 new OSDs (using the same disks but different IDs), but this
>> time tag them with class <pool2>.
>> 4. Try to create pool <pool2> -> this fails with:
>>
>> Error ERANGE: pg_num 256 size 2 would mean 1024 total pgs, which
>> exceeds max 800 (mon_max_pg_per_osd 200 * num_in_osds 4)
>>
>> Pool1 simply started allocating PGs to OSDs that don't belong to the
>> ruleset.
>>
>
> Are you sure pool 1's PGs are actually being placed on the wrong OSDs?
> Have you looked at the output of "ceph pg dump" to check that?
>
> It sounds more like the pool creation check is simply failing to consider
> the crush rules and applying a cruder global check.
>
> John
>
>
>>
>> Which leads me to the following question: is there a way to make the
>> CRUSH rule a hard requirement? E.g., if we do not have any OSD matching
>> the device class, it won't start trying to allocate PGs to OSDs that
>> don't match it.
>>
>> Is there any way to prevent pool1 from using those OSDs?
>>
>
--
Dr. Benoit Hudzia
Mobile (UK): +44 (0) 75 346 78673
Mobile (IE): +353 (0) 89 219 3675
Email: [email protected]
Web <http://www.stratoscale.com/> | Blog <http://www.stratoscale.com/blog/>
| Twitter <https://twitter.com/Stratoscale> | Google+
<https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts>
| Linkedin <https://www.linkedin.com/company/stratoscale>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com