On Tue, Feb 2, 2021 at 2:15 PM Wang Yugui <wangyu...@e16-tech.com> wrote:
>
> Hi, Filipe Manana
>
> Here are some dbench (sync mode) results on the same hardware,
> but with different Linux kernels:
>
> 4.14.200
> Operation      Count    AvgLat    MaxLat
>  ----------------------------------------
>  WriteX        225281     5.163    82.143
>  Flush          32161     2.250    62.669
> Throughput 236.719 MB/sec (sync open)  32 clients  32 procs  
> max_latency=82.149 ms
>
> 4.19.21
> Operation      Count    AvgLat    MaxLat
>  ----------------------------------------
>  WriteX        118842    10.946   116.345
>  Flush          16506     0.115    44.575
> Throughput 125.973 MB/sec (sync open)  32 clients  32 procs  
> max_latency=116.390 ms
>
> 4.19.150
>  Operation      Count    AvgLat    MaxLat
>  ----------------------------------------
>  WriteX        144509     9.151   117.353
>  Flush          20563     0.128    52.014
> Throughput 153.707 MB/sec (sync open)  32 clients  32 procs  
> max_latency=117.379 ms
>
> 5.4.91
>  Operation      Count    AvgLat    MaxLat
>  ----------------------------------------
>  WriteX        367033     4.377  1908.724
>  Flush          52037     0.159    39.871
> Throughput 384.554 MB/sec (sync open)  32 clients  32 procs  
> max_latency=1908.968 ms

Ok, it seems that somewhere between 4.19 and 5.4 something made the
latency much worse, at least for you.

Is it only when using sync open (O_SYNC, dbench's -s flag)? What about
when not using it?
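
For comparison, running the same workload with and without -s should show
whether the regression is tied to O_SYNC; something like the following,
reusing the command from your earlier runs:

  dbench -s -t 60 -D /btrfs/ 32   # sync open (O_SYNC), as in your runs
  dbench -t 60 -D /btrfs/ 32      # same workload without sync open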

I'll have to look at it, but it will likely take some time.

Thanks.

>
> 5.10.12+patches
> Operation      Count    AvgLat    MaxLat
>  ----------------------------------------
>  WriteX        429696     3.960  2239.973
>  Flush          60771     0.621     6.794
> Throughput 452.385 MB/sec (sync open)  32 clients  32 procs  
> max_latency=1963.312 ms
>
>
> The MaxLat / AvgLat ratio of WriteX increased from 82.143/5.163=15.9 to
> 2239.973/3.960=565.6.
>
> For QoS, can we have an option to keep the MaxLat / AvgLat ratio of
> WriteX below 100?
>
> Best Regards
> Wang Yugui (wangyu...@e16-tech.com)
> 2021/02/02
>
> > Hi, Filipe Manana
> >
> > > On Tue, Feb 2, 2021 at 5:42 AM Wang Yugui <wangyu...@e16-tech.com> wrote:
> > > >
> > > > Hi, Filipe Manana
> > > >
> > > > The dbench results with these patches are very good. Thanks a lot.
> > > >
> > > > This is the dbench (synchronous mode) result, followed by a question.
> > > >
> > > > command: dbench -s -t 60 -D /btrfs/ 32
> > > > mount option:ssd,space_cache=v2
> > > > kernel:5.10.12 + patchset 1 + this patchset
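> > > >
> > > > For reference, the mount options above correspond to a mount command
> > > > along these lines (sdX is a placeholder for the SAS SSD device):
> > > >
> > > > mount -o ssd,space_cache=v2 /dev/sdX /btrfs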
> > >
> > > Patchset 1 and "this patchset" are the same; did you mean two
> > > different patchsets or just a single patchset?
> >
> > patchset1:
> > btrfs: some performance improvements for dbench alike workloads
> >
> > patchset2:
> > btrfs: more performance improvements for dbench workloads
> > https://patchwork.kernel.org/project/linux-btrfs/list/?series=422801
> >
> > I'm sorry that I replied to the wrong patchset.
> >
> > >
> > > >
> > > > Question:
> > > > for synchronous mode, is result type 1 perfect?
> > >
> > > What do you mean by perfect? You mean if result 1 is better than result 2?
> >
> > In result 1, the MaxLat of Flush in dbench synchronous mode is low as
> > expected, at the same level as kernel 5.4.91.
> >
> > But in result 2, the MaxLat of Flush in synchronous mode is as big as
> > the WriteX level; since this is synchronous mode, most of the work
> > should already be done before the flush.
> >
> > > > and there is still some minor place about the flush to address for
> > > > result type 2?
> > >
> > > By "minor place" you mean the huge difference I suppose.
> > >
> > > >
> > > >
> > > > result type 1:
> > > >
> > > >  Operation      Count    AvgLat    MaxLat
> > > >  ----------------------------------------
> > > >  NTCreateX     868942     0.028     3.017
> > > >  Close         638536     0.003     0.061
> > > >  Rename         36851     0.663     4.000
> > > >  Unlink        175182     0.399     5.358
> > > >  Qpathinfo     789014     0.014     1.846
> > > >  Qfileinfo     137684     0.002     0.047
> > > >  Qfsinfo       144241     0.004     0.059
> > > >  Sfileinfo      70913     0.008     0.046
> > > >  Find          304554     0.057     1.889
> > > > ** WriteX        429696     3.960  2239.973
> > > >  ReadX        1363356     0.005     0.358
> > > >  LockX           2836     0.004     0.038
> > > >  UnlockX         2836     0.002     0.018
> > > > ** Flush          60771     0.621     6.794
> > > >
> > > > Throughput 452.385 MB/sec (sync open)  32 clients  32 procs  
> > > > max_latency=1963.312 ms
> > > > + stat -f -c %T /btrfs/
> > > > btrfs
> > > > + uname -r
> > > > 5.10.12-4.el7.x86_64
> > > >
> > > >
> > > > result type 2:
> > > >  Operation      Count    AvgLat    MaxLat
> > > >  ----------------------------------------
> > > >  NTCreateX     888943     0.028     2.679
> > > >  Close         652765     0.002     0.058
> > > >  Rename         37705     0.572     3.962
> > > >  Unlink        179713     0.383     3.983
> > > >  Qpathinfo     806705     0.014     2.294
> > > >  Qfileinfo     140752     0.002     0.125
> > > >  Qfsinfo       147909     0.004     0.049
> > > >  Sfileinfo      72374     0.008     0.104
> > > >  Find          311839     0.058     2.305
> > > > ** WriteX        439656     3.854  1872.109
> > > >  ReadX        1396868     0.005     0.324
> > > >  LockX           2910     0.004     0.026
> > > >  UnlockX         2910     0.002     0.025
> > > > ** Flush          62260     0.750  1659.364
> > > >
> > > > Throughput 461.856 MB/sec (sync open)  32 clients  32 procs  
> > > > max_latency=1872.118 ms
> > > > + stat -f -c %T /btrfs/
> > > > btrfs
> > > > + uname -r
> > > > 5.10.12-4.el7.x86_64
> > >
> > > I'm not sure what your question is exactly.
> > >
> > > Are both results after applying the same patchset, or are they before
> > > and after applying the patchset, respectively?
> >
> > Both results are after applying the same patchset,
> > and both are from the same server and the same SAS SSD,
> > but the results are not stable, and the major difference is the MaxLat of Flush.
> >
> > Server:Dell T7610
> > CPU: E5-2660 v2(10core 20threads) x2
> > SSD:TOSHIBA  PX05SMQ040
> > Memory:192G (with ECC)
> >
> >
> > > If they are both with the patchset applied, and you wonder about the
> > > big variation in the "Flush" operations, I am not sure about why it is
> > > so.
> > > Both throughput and max latency are better in result 2.
> > >
> > > It's normal to have variations across dbench runs, I get them too, and
> > > I do several runs (5 or 6) to check things out.
> > >
> > > I don't use virtualization (testing on bare metal), I set the cpu
> > > governor mode to performance (instead of the "powersave" default) and
> > > use a non-debug kernel configuration, because otherwise I get
> > > significant variations in latencies and throughput too (though I never
> > > got a huge difference such as from 6.794 to 1659.364).
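> > >
> > > For example, assuming the cpufreq sysfs interface (or the cpupower
> > > tool) is available, the governor can be set with something like:
> > >
> > > echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> > >
> > > or:
> > >
> > > cpupower frequency-set -g performance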
> >
> > This is bare metal (Dell T7610).
> > The CPU governor is set to performance by the BIOS, and I checked it with
> > 'cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'.
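> >
> > To be sure it applies to every core (not only cpu0), something like this
> > could be used to check all of them at once:
> >
> > grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor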
> >
> > Maybe it is because I used a SAS SSD; the queue depth of a SAS SSD is 254,
> > smaller than the 1023 of an NVMe SSD, but that should still be enough for
> > dbench with 32 threads?
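> >
> > If it helps, the queue depth actually in use can be checked with something
> > like the following (sdX is a placeholder for the SAS SSD device name):
> >
> > cat /sys/block/sdX/device/queue_depth   # SCSI device queue depth
> > cat /sys/block/sdX/queue/nr_requests    # block layer queue size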
> >
> >
> > The huge difference in the MaxLat of Flush, such as from 6.794 to
> > 1659.364, is a problem.
> > It is not easy to reproduce both on demand; sometimes a run reproduces
> > the small one, sometimes the big one.
> >
> >
> > Best Regards
> > Wang Yugui (wangyu...@e16-tech.com)
> > 2021/02/02
> >
>
>
