On Wed, 12 Mar 2025, Ming Lei wrote:
> > > It isn't perfect; sometimes it may be slower than running on io-wq
> > > directly.
> > >
> > > But is there any better way for covering everything?
> >
> > Yes - fix the loop queue workers.
>
> What you suggested is threaded aio, which submits IO concurrently from
> different task contexts. That is not the most efficient approach; otherwise
> modern languages wouldn't have invented async/await.
>
> In my test VM, running Mikulas's fio script on loop/nvme with the attached
> threaded_aio patch gives:
>
> NOWAIT with MQ 4       : 70K iops(read), 70K iops(write), cpu util: 40%
> threaded_aio with MQ 4 : 64K iops(read), 64K iops(write), cpu util: 52%
> in tree loop(SQ)       : 58K iops(read), 58K iops(write)
>
> Mikulas, please feel free to run your tests with threaded_aio:
>
> modprobe loop nr_hw_queues=4 threaded_aio=1
>
> after applying the attached patch on top of the loop patchset.
>
> The performance gap could be more obvious on fast hardware.

With "threaded_aio=1":
Sync io
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=psync \
    --iodepth=1 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw
xfs/loop/xfs
READ: bw=300MiB/s (315MB/s), 300MiB/s-300MiB/s (315MB/s-315MB/s), io=3001MiB (3147MB), run=10001-10001msec
WRITE: bw=300MiB/s (315MB/s), 300MiB/s-300MiB/s (315MB/s-315MB/s), io=3004MiB (3149MB), run=10001-10001msec

Async io
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=libaio \
    --iodepth=16 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw
xfs/loop/xfs
READ: bw=869MiB/s (911MB/s), 869MiB/s-869MiB/s (911MB/s-911MB/s), io=8694MiB (9116MB), run=10002-10002msec
WRITE: bw=870MiB/s (913MB/s), 870MiB/s-870MiB/s (913MB/s-913MB/s), io=8706MiB (9129MB), run=10002-10002msec

Without "threaded_aio=1":
Sync io
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=psync \
    --iodepth=1 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw
xfs/loop/xfs
READ: bw=348MiB/s (365MB/s), 348MiB/s-348MiB/s (365MB/s-365MB/s), io=3481MiB (3650MB), run=10001-10001msec
WRITE: bw=348MiB/s (365MB/s), 348MiB/s-348MiB/s (365MB/s-365MB/s), io=3484MiB (3653MB), run=10001-10001msec

Async io
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=libaio \
    --iodepth=16 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw
xfs/loop/xfs
READ: bw=1186MiB/s (1244MB/s), 1186MiB/s-1186MiB/s (1244MB/s-1244MB/s), io=11.6GiB (12.4GB), run=10001-10001msec
WRITE: bw=1187MiB/s (1245MB/s), 1187MiB/s-1187MiB/s (1245MB/s-1245MB/s), io=11.6GiB (12.5GB), run=10001-10001msec
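
To put the two sets of runs side by side, here is a minimal sketch that computes the relative difference. The bandwidth values are the READ figures copied by hand from the fio output above; the helper name `slowdown_percent` and the label "default" (meaning the runs without threaded_aio=1) are only illustrative and do not come from the patchset.

```python
# READ bandwidth in MiB/s, copied by hand from the fio output above.
# "default" is an illustrative label for the runs without threaded_aio=1.
results = {
    "sync (psync, iodepth=1)": {"threaded_aio": 300, "default": 348},
    "async (libaio, iodepth=16)": {"threaded_aio": 869, "default": 1186},
}


def slowdown_percent(threaded, default):
    """Return how much lower `threaded` is than `default`, in percent."""
    return (default - threaded) / default * 100.0


for name, bw in results.items():
    pct = slowdown_percent(bw["threaded_aio"], bw["default"])
    print(f"{name}: threaded_aio is {pct:.1f}% slower")
```

On these numbers, threaded_aio=1 comes out roughly 14% slower for sync io and roughly 27% slower for async io. These are single 10-second runs, so the figures are only indicative.
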
Mikulas