On Wed, 12 Mar 2025, Ming Lei wrote:

> > > It isn't perfect; sometimes it may be slower than running on io-wq
> > > directly.
> > > 
> > > But is there any better way for covering everything?
> > 
> > Yes - fix the loop queue workers.
> 
> What you suggested is threaded aio, which submits IO concurrently from
> different task contexts. This is not the most efficient approach; otherwise
> modern languages wouldn't have invented async/await.
> 
> In my test VM, running Mikulas's fio script on loop/nvme with the attached
> threaded_aio patch gives:
> 
> NOWAIT with MQ 4       : 70K iops(read), 70K iops(write), cpu util: 40%
> threaded_aio with MQ 4 : 64K iops(read), 64K iops(write), cpu util: 52%
> in tree loop(SQ)       : 58K iops(read), 58K iops(write)
> 
> Mikulas, please feel free to run your tests with threaded_aio:
> 
>       modprobe loop nr_hw_queues=4 threaded_aio=1
> 
> after applying the attached patch on top of the loop patchset.
> 
> The performance gap could be more obvious on fast hardware.

With "threaded_aio=1":

Sync io
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=psync --iodepth=1 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw
xfs/loop/xfs
   READ: bw=300MiB/s (315MB/s), 300MiB/s-300MiB/s (315MB/s-315MB/s), io=3001MiB (3147MB), run=10001-10001msec
  WRITE: bw=300MiB/s (315MB/s), 300MiB/s-300MiB/s (315MB/s-315MB/s), io=3004MiB (3149MB), run=10001-10001msec

Async io
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=libaio --iodepth=16 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw
xfs/loop/xfs
   READ: bw=869MiB/s (911MB/s), 869MiB/s-869MiB/s (911MB/s-911MB/s), io=8694MiB (9116MB), run=10002-10002msec
  WRITE: bw=870MiB/s (913MB/s), 870MiB/s-870MiB/s (913MB/s-913MB/s), io=8706MiB (9129MB), run=10002-10002msec


Without "threaded_aio=1":

Sync io
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=psync --iodepth=1 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw
xfs/loop/xfs
   READ: bw=348MiB/s (365MB/s), 348MiB/s-348MiB/s (365MB/s-365MB/s), io=3481MiB (3650MB), run=10001-10001msec
  WRITE: bw=348MiB/s (365MB/s), 348MiB/s-348MiB/s (365MB/s-365MB/s), io=3484MiB (3653MB), run=10001-10001msec

Async io
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=libaio --iodepth=16 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw
xfs/loop/xfs
   READ: bw=1186MiB/s (1244MB/s), 1186MiB/s-1186MiB/s (1244MB/s-1244MB/s), io=11.6GiB (12.4GB), run=10001-10001msec
  WRITE: bw=1187MiB/s (1245MB/s), 1187MiB/s-1187MiB/s (1245MB/s-1245MB/s), io=11.6GiB (12.5GB), run=10001-10001msec
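
To put the four runs above side by side, the relative change in read
bandwidth can be computed directly from the fio figures. This is only a rough
sketch: the dictionary layout and the function names are made up for
illustration, and the numbers are the READ bw values (MiB/s) copied from the
output above.

```python
# READ bandwidth in MiB/s, copied from the fio output in this mail.
# The labels and the dictionary layout are illustrative, not part of fio.
READ_BW = {
    "sync (psync, iodepth=1)": {"threaded_aio": 300, "default": 348},
    "async (libaio, iodepth=16)": {"threaded_aio": 869, "default": 1186},
}


def relative_change(new, base):
    """Return the change of `new` relative to `base`, in percent."""
    return (new - base) / base * 100.0


def summarize(results):
    """Map each workload to the threaded_aio change versus the default."""
    return {
        name: relative_change(bw["threaded_aio"], bw["default"])
        for name, bw in results.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(READ_BW).items():
        print(f"{name}: {pct:+.1f}% with threaded_aio=1")
```

A negative percentage means the run with threaded_aio=1 was slower than the
run without it.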

Mikulas

