Perhaps the WAL is filling up when the iodepth is that high? Is the WAL on
the same SSDs? If you double the WAL size, does the behavior change?
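
A quick way to check where the WAL lives and how full BlueFS is keeping it
(a rough sketch, using osd.0 as an example id):

   ceph osd metadata 0 | grep -E 'bluefs|devices'
   ceph daemon osd.0 perf dump | grep -E 'wal_(total|used)_bytes'

As far as I know, bluestore_block_wal_size only takes effect when an OSD is
(re)created, so resizing would mean something like

   ceph config set osd bluestore_block_wal_size 2147483648   # e.g. 2 GiB

followed by redeploying the OSDs.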


On Mon, Dec 14, 2020 at 9:05 PM Jason Dillaman <jdill...@redhat.com> wrote:
>
> On Mon, Dec 14, 2020 at 1:28 PM Philip Brown <pbr...@medata.com> wrote:
> >
> > Our goal is to put up a high-performance Ceph cluster that can handle 100
> > very active clients, so for us, testing with iodepth=256 is actually
> > fairly realistic.
>
> 100 active clients on the same node or just 100 active clients?
>
> > But it does also exhibit the problem with iodepth=32:
> >
> > [root@irviscsi03 ~]# fio --filename=/dev/rbd0 --direct=1 --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 --numjobs=1 --time_based --group_reporting --name=iops-test-job --runtime=120 --eta-newline=1
> > iops-test-job: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
> > fio-3.7
> > Starting 1 process
> > fio: file /dev/rbd0 exceeds 32-bit tausworthe random generator.
> > fio: Switching to tausworthe64. Use the random_generator= option to get rid of this warning.
> > Jobs: 1 (f=1): [w(1)][2.5%][r=0KiB/s,w=20.5MiB/s][r=0,w=5258 IOPS][eta 01m:58s]
> > Jobs: 1 (f=1): [w(1)][4.1%][r=0KiB/s,w=41.1MiB/s][r=0,w=10.5k IOPS][eta 01m:56s]
> > Jobs: 1 (f=1): [w(1)][5.8%][r=0KiB/s,w=45.7MiB/s][r=0,w=11.7k IOPS][eta 01m:54s]
> > Jobs: 1 (f=1): [w(1)][7.4%][r=0KiB/s,w=55.3MiB/s][r=0,w=14.2k IOPS][eta 01m:52s]
> > Jobs: 1 (f=1): [w(1)][9.1%][r=0KiB/s,w=54.4MiB/s][r=0,w=13.9k IOPS][eta 01m:50s]
> > Jobs: 1 (f=1): [w(1)][10.7%][r=0KiB/s,w=53.4MiB/s][r=0,w=13.7k IOPS][eta 01m:48s]
> > Jobs: 1 (f=1): [w(1)][12.4%][r=0KiB/s,w=53.7MiB/s][r=0,w=13.7k IOPS][eta 01m:46s]
> > Jobs: 1 (f=1): [w(1)][14.0%][r=0KiB/s,w=55.7MiB/s][r=0,w=14.3k IOPS][eta 01m:44s]
> > Jobs: 1 (f=1): [w(1)][15.7%][r=0KiB/s,w=54.4MiB/s][r=0,w=13.9k IOPS][eta 01m:42s]
> > Jobs: 1 (f=1): [w(1)][17.4%][r=0KiB/s,w=51.6MiB/s][r=0,w=13.2k IOPS][eta 01m:40s]
> > Jobs: 1 (f=1): [w(1)][19.0%][r=0KiB/s,w=38.1MiB/s][r=0,w=9748 IOPS][eta 01m:38s]
> > Jobs: 1 (f=1): [w(1)][20.7%][r=0KiB/s,w=24.1MiB/s][r=0,w=6158 IOPS][eta 01m:36s]
> > Jobs: 1 (f=1): [w(1)][22.3%][r=0KiB/s,w=12.4MiB/s][r=0,w=3178 IOPS][eta 01m:34s]
> > Jobs: 1 (f=1): [w(1)][24.0%][r=0KiB/s,w=31.5MiB/s][r=0,w=8056 IOPS][eta 01m:32s]
> > Jobs: 1 (f=1): [w(1)][25.6%][r=0KiB/s,w=48.6MiB/s][r=0,w=12.4k IOPS][eta 01m:30s]
> > Jobs: 1 (f=1): [w(1)][27.3%][r=0KiB/s,w=52.2MiB/s][r=0,w=13.4k IOPS][eta 01m:28s]
> > Jobs: 1 (f=1): [w(1)][28.9%][r=0KiB/s,w=54.3MiB/s][r=0,w=13.9k IOPS][eta 01m:26s]
> > Jobs: 1 (f=1): [w(1)][30.6%][r=0KiB/s,w=52.6MiB/s][r=0,w=13.5k IOPS][eta 01m:24s]
> > Jobs: 1 (f=1): [w(1)][32.2%][r=0KiB/s,w=55.1MiB/s][r=0,w=14.1k IOPS][eta 01m:22s]
> > Jobs: 1 (f=1): [w(1)][33.9%][r=0KiB/s,w=34.3MiB/s][r=0,w=8775 IOPS][eta 01m:20s]
> > Jobs: 1 (f=1): [w(1)][35.5%][r=0KiB/s,w=52.5MiB/s][r=0,w=13.4k IOPS][eta 01m:18s]
> > Jobs: 1 (f=1): [w(1)][37.2%][r=0KiB/s,w=52.7MiB/s][r=0,w=13.5k IOPS][eta 01m:16s]
> > Jobs: 1 (f=1): [w(1)][38.8%][r=0KiB/s,w=53.9MiB/s][r=0,w=13.8k IOPS][eta 01m:14s]
>
> Have you tried different kernel versions? It might also be worthwhile to
> test using fio's "rados" engine [1] (vs. your rados bench test), since that
> may not have been an apples-to-apples comparison given the >400MiB/s
> throughput you listed (i.e. large IOs are handled differently than small
> IOs internally).
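>
> Roughly along the lines of the upstream example, something like this (an
> untested sketch; the pool name, client name, and job file name are just
> placeholders):
>
>   # rados-4k-randwrite.fio
>   [global]
>   ioengine=rados
>   clientname=admin
>   pool=rbd
>   busy_poll=0
>   rw=randwrite
>   bs=4k
>   time_based=1
>   runtime=120
>
>   [rados-iodepth32]
>   iodepth=32
>   size=1g
>
> run from a node that has the client keyring, e.g. "fio rados-4k-randwrite.fio".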
>
> >   .. etc.
> >
> >
> > ----- Original Message -----
> > From: "Jason Dillaman" <jdill...@redhat.com>
> > To: "Philip Brown" <pbr...@medata.com>
> > Cc: "ceph-users" <ceph-users@ceph.io>
> > Sent: Monday, December 14, 2020 10:19:48 AM
> > Subject: Re: [ceph-users] performance degredation every 30 seconds
> >
> > On Mon, Dec 14, 2020 at 12:46 PM Philip Brown <pbr...@medata.com> wrote:
> > >
> > > Further experimentation with fio's --rw flag, setting rw=read and
> > > rw=randwrite in addition to the original rw=randrw, indicates that the
> > > problem is tied to writes.
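> > >
> > > Reproducing that sweep is just a matter of swapping the --rw= value;
> > > roughly (a sketch reusing the same flags as the iodepth=32 run quoted
> > > above):
> > >
> > >   for mode in read randread randwrite randrw; do
> > >     fio --filename=/dev/rbd0 --direct=1 --rw=$mode --bs=4k \
> > >         --ioengine=libaio --iodepth=32 --numjobs=1 --time_based \
> > >         --runtime=120 --group_reporting --name=sweep-$mode
> > >   done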
> > >
> > > Possibly some kind of buffer-flush or cache-sync delay when using the
> > > rbd device, even though fio specified --direct=1?
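> > >
> > > One thing worth double-checking on that theory (an assumption on my
> > > part, not something confirmed on this cluster) is whether the block
> > > layer reports a volatile write cache for the mapped device:
> > >
> > >   cat /sys/block/rbd0/queue/write_cache
> > >
> > > "write back" vs "write through" changes whether flushes are issued to it.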
> >
> > It might be worthwhile to test with a more realistic iodepth instead of
> > 256, in case you are hitting weird limits due to an untested corner case.
> > Does the performance still degrade with "--iodepth=16" or "--iodepth=32"?
> >
>
> [1] https://github.com/axboe/fio/blob/master/examples/rados.fio
>
> --
> Jason
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
