Hi all,

I can now add another data point as well. We upgraded our production cluster
from mimic to octopus with the following procedure (a command sketch follows
the list):

- set quick-fix-on-start=false in all ceph.conf files and the mon config store
- set nosnaptrim
- upgrade all daemons
- set require-osd-release=octopus
- host by host: set quick-fix-on-start=true in ceph.conf and restart OSDs
- unset nosnaptrim
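
For reference, a rough command sketch of these steps (the shorthand
quick-fix-on-start refers to bluestore_fsck_quick_fix_on_mount; targets and
unit names are illustrative and assume a non-containerized deployment):

  # before the upgrade: prevent conversion on restart, pause snaptrim
  ceph config set osd bluestore_fsck_quick_fix_on_mount false
  #   (and bluestore_fsck_quick_fix_on_mount = false in every ceph.conf)
  ceph osd set nosnaptrim

  # after all daemons run octopus
  ceph osd require-osd-release octopus

  # host by host: set bluestore_fsck_quick_fix_on_mount = true in that host's
  # ceph.conf, then restart its OSDs to kick off the conversion
  systemctl restart ceph-osd.target

  # once all hosts are converted
  ceph osd unset nosnaptrim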

On our production system the conversion went much faster than on the test
system. The process is very CPU intensive, yet converting 70 OSDs per host
with 2x18-core Broadwell CPUs worked without problems. Load reached more than
200%, but everything finished without crashes.

Upgrading the daemons and completing the conversion on all hosts took 3 very
long days. After converting this way we have seen no problems with snaptrim.
We also enabled ephemeral pinning on our FS with 8 active MDSes and see no
change in single-user performance, but at least 2-3 times higher aggregate
throughput (the FS serves home directories for a 500-node HPC cluster).
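
For anyone wanting to try the pinning part, a minimal sketch of how we
understand distributed ephemeral pinning is enabled (FS name and directory are
placeholders; on octopus the mds_export_ephemeral_distributed option may also
need to be switched on):

  ceph fs set cephfs max_mds 8
  ceph config set mds mds_export_ephemeral_distributed true
  # mark the parent directory; each child dir gets pinned to an MDS by hash
  setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home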

We did have a severe hiccup, though. Very small OSDs with a size of ca. 100G
crash on octopus once their OMAP grows beyond a certain size. I don't know yet
what a safe minimum size is (ongoing thread "OSD crashes during upgrade 
mimic->octopus"). The 300G OSDs on our test cluster worked fine.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Tyler Stachecki <stachecki.ty...@gmail.com>
Sent: 27 September 2022 02:00
To: Marc
Cc: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from 
nautilus to octopus

Just a data point - we upgraded several large Mimic-born clusters straight to
15.2.12 with the quick fsck disabled in ceph.conf, then set
require-osd-release, and finally did the omap conversion offline with the
bluestore tool while the OSDs were down, after the cluster was upgraded (all
done in batches). The clusters are as zippy as ever.
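
(If it helps as a reference, a sketch of the offline conversion step, with
osd.0 as a placeholder and assuming a non-containerized deployment; quick-fix
or repair in ceph-bluestore-tool perform the omap conversion while the OSD is
down:)

  systemctl stop ceph-osd@0
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 quick-fix
  systemctl start ceph-osd@0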

Maybe on a whim, try doing an offline fsck with the bluestore tool and see if 
it improves things?
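
(The corresponding offline fsck, again while the OSD is stopped:)

  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 fsck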

To answer an earlier question, if you have no health statuses muted, a 'ceph 
health detail' should show you at least a subset of OSDs that have not gone 
through the omap conversion yet.
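
(For example, assuming the relevant health code is BLUESTORE_NO_PER_POOL_OMAP,
which octopus raises for OSDs still on the legacy omap format:)

  ceph health detail | grep -A3 BLUESTORE_NO_PER_POOL_OMAP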

Cheers,
Tyler

On Mon, Sep 26, 2022, 5:13 PM Marc <m...@f1-outsourcing.eu> wrote:
Hi Frank,

Thank you very much for this! :)

>
> we just completed a third upgrade test. There are 2 ways to convert the
> OSDs:
>
> A) convert along with the upgrade (quick-fix-on-start=true)
> B) convert after setting require-osd-release=octopus (quick-fix-on-
> start=false until require-osd-release set to octopus, then restart to
> initiate conversion)
>
> There is a variation A' of A: follow A, then initiate manual compaction
> and restart all OSDs.
>
> Our experiments show that paths A and B do *not* yield the same result.
> Following path A leads to a severely performance-degraded cluster. As of
> now, we cannot confirm that A' fixes this. It seems that the only way
> out is to zap and re-deploy all OSDs, basically what Boris is doing
> right now.
>
> We have now extended our procedure to add
>
>   bluestore_fsck_quick_fix_on_mount = false
>
> to every ceph.conf file and executing
>
>   ceph config set osd bluestore_fsck_quick_fix_on_mount false
>
> to catch any accidents. After daemon upgrade, we set
> bluestore_fsck_quick_fix_on_mount = true host by host in the ceph.conf
> and restart OSDs.
>
> This procedure works like a charm.
>
> I don't know what the difference between A and B is. It is possible that
> B executes an extra step that is missing in A. The performance
> degradation only shows up when snaptrim is active, but then it is very
> severe. I suspect that many users who complained about snaptrim in the
> past have at least 1 A-converted OSD in their cluster.
>
> If you have a cluster upgraded with B-converted OSDs, it works like a
> native octopus cluster. There is very little performance reduction
> compared with mimic. In exchange, I have the impression that it operates
> more stably.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io