XFS/ext4/others all need to lock the inode for buffered writes. Since
io_uring handles any IO in an async manner, this means that for higher
queue depth buffered write workloads, we have a lot of workers
hammering on the same mutex.

Running a QD=32 random write workload on my test box yields about 200K
4k random write IOPS with io_uring. Looking at system profiles, we're
spending about half the time contending on the inode mutex. Oof.
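A workload along the lines described above can be reproduced with an fio job like the following sketch (the test directory, file size, and runtime are assumptions, not taken from the original report):

```ini
[global]
ioengine=io_uring
rw=randwrite
bs=4k
iodepth=32
direct=0          ; buffered, not O_DIRECT writes
time_based=1
runtime=30

[bufwrite]
directory=/mnt/test   ; assumed mount point of the filesystem under test
size=1g
```

With direct=0 every write goes through the page cache, so each async worker ends up taking the inode lock for the same file.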

For buffered writes, we don't necessarily need a huge number of threads
issuing that IO. If we instead rely on normal page cache writeback to
take care of getting the parallelism we need on the device side, we can
limit ourselves to a much lower queue depth. This still gets us async
behavior on the submission side.

With this small series, my 200K IOPS goes to 370K IOPS for the same
workload.

This issue came out of postgres implementing io_uring support and
reporting some of the problems they saw.

-- 
Jens Axboe

