PS: The cluster is currently size 2. I used PGCalc on the Ceph website,
which by default targets 200 PGs per OSD.
I read about the protection in the docs and later realized I should have
aimed for only 100 PGs per OSD.
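(For reference, a rough sketch of the PGCalc-style arithmetic with my
numbers; the values are only illustrative:)

  # total PGs ~= (OSD count * target PGs per OSD) / replica size,
  # then round to a power of two
  osds=25; size=2
  echo $(( osds * 100 / size ))   # 1250 -> round to 1024 total PGs
  echo $(( osds * 200 / size ))   # 2500 -> round to 2048 total PGs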


2018-05-17 13:35 GMT+02:00 Kevin Olbrich <k...@sv01.de>:

> Hi!
>
> Thanks for your quick reply.
> Before I read your mail, I had already applied the following config to my OSDs:
> ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
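>
> (To double-check that the value landed, something like this should work
> on each OSD host - osd.0 is just an example here; note that injectargs is
> runtime-only, so the setting would also need to go into ceph.conf to
> survive a restart:)
>
>   ceph daemon osd.0 config get osd_max_pg_per_osd_hard_ratio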
>
> Status is now:
>   data:
>     pools:   2 pools, 1536 pgs
>     objects: 639k objects, 2554 GB
>     usage:   5211 GB used, 11295 GB / 16506 GB avail
>     pgs:     7.943% pgs not active
>              5567/1309948 objects degraded (0.425%)
>              252327/1309948 objects misplaced (19.262%)
>              1030 active+clean
>              351  active+remapped+backfill_wait
>              107  activating+remapped
>              33   active+remapped+backfilling
>              15   activating+undersized+degraded+remapped
>
> A little better, but still some non-active PGs.
> I will investigate your other hints!
>
> Thanks
> Kevin
>
> 2018-05-17 13:30 GMT+02:00 Burkhard Linke <Burkhard.Linke@computational.bio.uni-giessen.de>:
>
>> Hi,
>>
>>
>>
>> On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
>>
>>> Hi!
>>>
>>> Today I added some new OSDs (nearly doubled) to my luminous cluster.
>>> I then changed pg(p)_num from 256 to 1024 for that pool because it was
>>> complaining about too few PGs. (I have since noticed that this should
>>> have been done in smaller steps.)
>>>
>>> This is the current status:
>>>
>>>      health: HEALTH_ERR
>>>              336568/1307562 objects misplaced (25.740%)
>>>              Reduced data availability: 128 pgs inactive, 3 pgs peering, 1 pg stale
>>>              Degraded data redundancy: 6985/1307562 objects degraded (0.534%), 19 pgs degraded, 19 pgs undersized
>>>              107 slow requests are blocked > 32 sec
>>>              218 stuck requests are blocked > 4096 sec
>>>
>>>    data:
>>>      pools:   2 pools, 1536 pgs
>>>      objects: 638k objects, 2549 GB
>>>      usage:   5210 GB used, 11295 GB / 16506 GB avail
>>>      pgs:     0.195% pgs unknown
>>>               8.138% pgs not active
>>>               6985/1307562 objects degraded (0.534%)
>>>               336568/1307562 objects misplaced (25.740%)
>>>               855 active+clean
>>>               517 active+remapped+backfill_wait
>>>               107 activating+remapped
>>>               31  active+remapped+backfilling
>>>               15  activating+undersized+degraded+remapped
>>>               4   active+undersized+degraded+remapped+backfilling
>>>               3   unknown
>>>               3   peering
>>>               1   stale+active+clean
>>>
>>
>> You need to resolve the unknown/peering/activating PGs first. You have
>> 1536 PGs; assuming replication size 3, that makes 4608 PG copies. Given 25
>> OSDs and the heterogeneous host sizes, I assume that some OSDs hold more
>> than 200 PGs. There is a threshold for the number of PGs per OSD; reaching
>> this threshold keeps the OSDs from accepting new PGs.
>>
>> Try to increase the threshold (mon_max_pg_per_osd /
>> max_pg_per_osd_hard_ratio / osd_max_pg_per_osd_hard_ratio; I am not sure
>> which one is the exact setting, consult the documentation) to allow more
>> PGs per OSD. If this is the cause of the problem, the peering and
>> activating states should resolve within a short time.
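>>
>> (A hedged sketch of how one might check and bump those limits at runtime;
>> since I am not sure which of the three is the effective one, this touches
>> both mons and OSDs. injectargs is not persistent, so a working value would
>> also need to go into ceph.conf; mon.$(hostname -s) assumes the mon is
>> named after the host:)
>>
>>   ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd
>>   ceph daemon osd.0 config get osd_max_pg_per_osd_hard_ratio
>>   ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 300'
>>   ceph tell 'osd.*' injectargs '--mon_max_pg_per_osd 300'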
>>
>> You can also check the number of PGs per OSD with 'ceph osd df'; the last
>> column is the current number of PGs.
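>>
>> (For example, something along these lines should list OSD id and PG count,
>> highest first - only a sketch; the awk filter just keeps the numeric OSD
>> rows and the last column:)
>>
>>   ceph osd df | awk '$1 ~ /^[0-9]+$/ {print $1, $NF}' | sort -k2 -nr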
>>
>>
>>>
>>> OSD tree:
>>>
>>> ID  CLASS WEIGHT   TYPE NAME                     STATUS REWEIGHT PRI-AFF
>>>   -1       16.12177 root default
>>> -16       16.12177     datacenter dc01
>>> -19       16.12177         pod dc01-agg01
>>> -10        8.98700             rack dc01-rack02
>>>   -4        4.03899                 host node1001
>>>    0   hdd  0.90999                     osd.0         up  1.00000 1.00000
>>>    1   hdd  0.90999                     osd.1         up  1.00000 1.00000
>>>    5   hdd  0.90999                     osd.5         up  1.00000 1.00000
>>>    2   ssd  0.43700                     osd.2         up  1.00000 1.00000
>>>    3   ssd  0.43700                     osd.3         up  1.00000 1.00000
>>>    4   ssd  0.43700                     osd.4         up  1.00000 1.00000
>>>   -7        4.94899                 host node1002
>>>    9   hdd  0.90999                     osd.9         up  1.00000 1.00000
>>>   10   hdd  0.90999                     osd.10        up  1.00000 1.00000
>>>   11   hdd  0.90999                     osd.11        up  1.00000 1.00000
>>>   12   hdd  0.90999                     osd.12        up  1.00000 1.00000
>>>    6   ssd  0.43700                     osd.6         up  1.00000 1.00000
>>>    7   ssd  0.43700                     osd.7         up  1.00000 1.00000
>>>    8   ssd  0.43700                     osd.8         up  1.00000 1.00000
>>> -11        7.13477             rack dc01-rack03
>>> -22        5.38678                 host node1003
>>>   17   hdd  0.90970                     osd.17        up  1.00000 1.00000
>>>   18   hdd  0.90970                     osd.18        up  1.00000 1.00000
>>>   24   hdd  0.90970                     osd.24        up  1.00000 1.00000
>>>   26   hdd  0.90970                     osd.26        up  1.00000 1.00000
>>>   13   ssd  0.43700                     osd.13        up  1.00000 1.00000
>>>   14   ssd  0.43700                     osd.14        up  1.00000 1.00000
>>>   15   ssd  0.43700                     osd.15        up  1.00000 1.00000
>>>   16   ssd  0.43700                     osd.16        up  1.00000 1.00000
>>> -25        1.74799                 host node1004
>>>   19   ssd  0.43700                     osd.19        up  1.00000 1.00000
>>>   20   ssd  0.43700                     osd.20        up  1.00000 1.00000
>>>   21   ssd  0.43700                     osd.21        up  1.00000 1.00000
>>>   22   ssd  0.43700                     osd.22        up  1.00000 1.00000
>>>
>>>
>>> The crush rule is set to chooseleaf rack and (temporarily!) to size 2.
>>> Why are PGs stuck in peering and activating?
>>> "ceph df" shows that only 1.5 TB is used on the pool, residing on the
>>> HDDs - which would perfectly fit the crush rule... (?)
>>>
>>
>> Size 2 within the crush rule or size 2 for the two pools?
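>>
>> (A quick way to check both, as a sketch:)
>>
>>   ceph osd pool ls detail    # size, min_size and crush_rule per pool
>>   ceph osd crush rule dump   # the chooseleaf step of each rule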
>>
>> Regards,
>> Burkhard
>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
