Thanks for your reply.
What I meant by high load was the load as seen by the top command; all the
servers have a load average over 10.
I added one more node to gain more space.
This is what I get from ceph status:
  cluster:
    id:     <redacted>
    health: HEALTH_WARN
            2 failed cephadm daemon(s)
            48 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 24 pgs backfill_toofull
            4 pool(s) nearfull

  services:
    mon: 5 daemons, quorum ceph03,ceph02,ceph05,ceph01,ceph04 (age 4h)
    mgr: ceph03.xmbwxh(active, since 2d), standbys: ceph01.ecfgwz, ceph10.rcvwmp
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 61 osds: 61 up (since 4h), 61 in (since 4h); 1264 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   15 pools, 4465 pgs
    objects: 26.53M objects, 91 TiB
    usage:   284 TiB used, 75 TiB / 359 TiB avail
    pgs:     8613187/79613362 objects misplaced (10.819%)
             3201 active+clean
             1240 active+remapped+backfilling
             22   active+remapped+backfill_toofull
             2    active+remapped+backfill_wait+backfill_toofull

  io:
    client:   624 MiB/s rd, 1.6 KiB/s wr, 263 op/s rd, 17 op/s wr
    recovery: 164 MiB/s, 45 objects/s
The performance balances as I expected, giving priority to client traffic.
I get a lot of health warnings about osd_slow_ping_time_back,
osd_slow_ping_time_front and slow_ops.
I noticed that there are 1240 pgs backfilling in parallel. Is that
expected?
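For reference, these are the commands I'm looking at for inspecting and
switching the scheduler profile (a sketch based on the Quincy mClock config
reference; osd.0 below is just an example daemon, substitute any OSD id):

```shell
# Check which mClock profile the OSDs are currently running
ceph config get osd osd_mclock_profile

# Inspect the effective backfill limit as one example OSD sees it
# (with mClock the scheduler manages recovery throughput, so this value
# may not behave the way it did with the WPQ scheduler)
ceph config show osd.0 osd_max_backfills

# Switch all OSDs to the balanced profile (trades some client latency
# for faster recovery) ...
ceph config set osd osd_mclock_profile balanced

# ... or to high_recovery_ops for maximum recovery throughput
# ceph config set osd osd_mclock_profile high_recovery_ops
```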
/Jimmy
On Wed, Jul 6, 2022 at 3:28 PM Sridhar Seshasayee <[email protected]>
wrote:
> Hi Jimmy,
>
> As you rightly pointed out, the OSD recovery priority settings no longer
> take effect because of the change to mClock. By default, the
> "high_client_ops" profile is enabled, which prioritizes client ops over
> recovery ops. Recovery ops take the longest to complete with this
> profile, and this is expected.
>
> When you say "load avg on my servers is high", I am assuming it's the
> recovery load.
> If you want recovery ops to complete faster, then you can first try
> changing the mClock
> profile to the "balanced" profile on all OSDs and see if it improves the
> situation. The
> "high_recovery_ops" profile would be the next option as it will provide
> the best recovery
> performance. But with both the "balanced" and the "high_recovery_ops"
> profiles,
> improved recovery performance will be at the expense of client ops which
> will
> experience slightly higher latencies.
>
> For more details on the mClock profiles, see mClock Config Reference:
> https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/
>
> To switch profiles, see:
>
> https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#steps-to-enable-mclock-profile
>
> The recommendation would be to change the profile on all OSDs to get the
> best performance for the operation you are interested in.
>
> -Sridhar
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]