Wow Josh, thanks a lot for the prompt help!

Indeed, I thought mon_max_pg_per_osd (500 in my case) worked in combination with the multiplier osd_max_pg_per_osd_hard_ratio, which if I am not mistaken defaults to 2: with ~700 PGs/OSD I was feeling rather safe.
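
For reference, a quick way to double-check the effective values (on Nautilus, assuming the settings live in the mon config database rather than only in ceph.conf):

    # per-OSD soft limit and the hard-ratio multiplier ("osd.0" = any OSD id)
    ceph config get osd.0 mon_max_pg_per_osd
    ceph config get osd.0 osd_max_pg_per_osd_hard_ratio
    # hard cap per OSD = mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio
    #                  = 500 * 2 = 1000, against ~700 PGs per OSD before the reshuffle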

However, I temporarily doubled the mon_max_pg_per_osd value and "repeer"ed the stuck PGs; an Ansible round of "systemctl restart ceph-osd.target" on all OSD servers also helped clear the "slow ops".
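
Concretely, the workaround boiled down to something like this (the "config set" mechanism, the "osds" inventory group and the one-host-at-a-time flag are just illustrative):

    # temporarily double the limit; to be reverted once backfill completes
    ceph config set global mon_max_pg_per_osd 1000
    # rolling restart of the OSDs, one server at a time
    ansible osds -f 1 -b -m shell -a "systemctl restart ceph-osd.target"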

  Thanks a lot, again!

                        Fulvio


On 15/09/2022 17:47, Josh Baergen wrote:
Hi Fulvio,

I've seen this in the past when a CRUSH change temporarily resulted in
too many PGs being mapped to an OSD, exceeding mon_max_pg_per_osd. You
can try increasing that setting to see if it helps, then setting it
back to default once backfill completes. You may also need to "ceph pg
repeer $pgid" for each of the PGs stuck activating.
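
Something along these lines should cover all of the stuck PGs (untested, adjust to taste):

    # repeer every PG currently reported as stuck in "activating"
    for pgid in $(ceph pg dump_stuck inactive 2>/dev/null | awk '/activating/ {print $1}'); do
        ceph pg repeer $pgid
    done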

Josh

On Thu, Sep 15, 2022 at 8:42 AM Fulvio Galeazzi <fulvio.galea...@garr.it> wrote:


Hallo,
         I am on Nautilus and today, after upgrading the operating system
(from CentOS 7 to CentOS 8 Stream) on a couple of OSD servers and adding
them back to the cluster, I noticed that some PGs are still "activating".
     The upgraded servers are in the same "rack"; I have replica-3 pools
with a 1-per-rack rule and 6+4 EC pools (in some cases with an SSD pool
for metadata).

More details:
- on the two OSD servers I upgraded, I ran "systemctl stop ceph.target"
     and waited a while, to verify that all PGs would remain "active"
- went on with the upgrade and the ceph-ansible reconfiguration
- as soon as I started adding OSDs back, I saw "slow ops"
- to exclude a possible effect of the updated packages, I ran "yum update"
     on all OSD servers and rebooted them one by one
- after 2-3 hours, the last OSD disks finally came back up
- I am left with:
         about 1k "slow ops" (if I pause recovery, the number is ~stable
                 but the max age keeps increasing)
         ~200 inactive PGs

     Most of the inactive PGs are from the object store pool:

[cephmgr@cephAdmCT1.cephAdmCT1 ~]$ ceph osd pool get default.rgw.buckets.data crush_rule
crush_rule: default.rgw.buckets.data

rule default.rgw.buckets.data {
           id 6
           type erasure
           min_size 3
           max_size 10
           step set_chooseleaf_tries 5
           step set_choose_tries 100
           step take default class big
           step chooseleaf indep 0 type host
           step emit
}

     But "ceph pg dump_stuck inactive" also shows 4 lines for the glance
replicated pool, like:

82.34   activating+remapped                       [139,50,207]  139  [139,50,284]  139
82.54   activating+undersized+degraded+remapped   [139,86,5]    139  [139,74]      139
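
In case it is useful, the peering details of a single stuck PG can be
inspected with something like:

    # the "recovery_state" section shows what the PG is waiting for
    ceph pg 82.34 query | less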


Need your help, please:

- any idea what the root cause of all this was?

- and now, how can I help the OSDs get those PGs through activation?
     + does the procedure differ between EC and replicated pools, by the way?
     + or maybe I should first get rid of the "slow ops" issue?

I am pasting:
ceph osd df tree
     https://pastebin.ubuntu.com/p/VWhT7FWf6m/

ceph osd lspools ; ceph pg dump_stuck inactive
     https://pastebin.ubuntu.com/p/9f6rXRYMh4/

     Thanks a lot!

                         Fulvio

--
Fulvio Galeazzi
GARR-CSD Department
tel.: +39-334-6533-250
skype: fgaleazzi70
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io