We have been pushing thin pool performance beyond what most would normally
do. In the process we have noticed an IOPS performance ceiling within a
single thin pool. Jon Bernard started an email thread about this problem
which is linked below.
To give you a brief rundown, our array can handle over 1.2m IOPS at a 4k
block size, and performance between thick and thin devices compares very
well until you reach about 200k IOPS (read or write). Beyond this, the
performance appears to be bottlenecked by a spinlock in the thin pool.
What tells us that there is much more headroom available is that we can
create 4 separate thin pools and get an aggregate of 800k IOPS across thin
devices created within them (200k per pool). Adding additional pools beyond
this gives diminishing returns.
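For reference, the multi-pool setup looks roughly like the following. The
volume group name, sizes, and counts are placeholders, not our exact
configuration:

```shell
# Sketch: four independent thin pools, each with one thin device.
# "vg0", the sizes, and the chunk size are hypothetical values.
for i in 0 1 2 3; do
    lvcreate --type thin-pool -L 200G --chunksize 64k -n pool$i vg0
    lvcreate -V 100G --thinpool pool$i -n thin$i vg0
done
```

Driving IO at the four thin devices in parallel is what gets us to the
~800k aggregate mentioned above.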
We have tried a variety of tests to isolate the issue including:
- multiple chunk-sizes between 64k and 512k
- zero provisioning turned off and on
- filling the thin devices fully before running the tests
- using device mapper directly and bypassing LVM2
- enabling blk-mq
- kernel versions from 3.10.327 through 4.6
- IO on a single thin device, or spread across many within the same pool
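The device-mapper-direct variant followed the standard dm-thin setup from
the kernel's thin-provisioning documentation, roughly as below. The device
paths, sector counts, and low-water mark are placeholders:

```shell
# Pool table: 0 <length> thin-pool <metadata dev> <data dev> \
#             <data block size in sectors> <low water mark>
# 128 sectors = 64k chunks, matching the low end of our chunk-size tests.
dmsetup create pool \
    --table "0 41943040 thin-pool /dev/fast/metadata /dev/fast/data 128 32768"

# Create thin device id 0 inside the pool, then activate it.
dmsetup message /dev/mapper/pool 0 "create_thin 0"
dmsetup create thin0 --table "0 41943040 thin /dev/mapper/pool 0"
```

The ceiling shows up the same way whether the pool is stacked via LVM2 or
built directly with dmsetup like this.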
Our testing centers around FIO and random reads, but we have also looked at
writes and sequential IO. Here is our FIO job configuration, which can
reproduce the results.
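A representative sketch of the job file follows; the device path, queue
depth, and job count are placeholders rather than our exact values:

```ini
; Hypothetical reconstruction of the FIO job: 4k random reads,
; direct IO against a thin device, many jobs to push IOPS.
[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=32
numjobs=16
group_reporting=1
time_based=1
runtime=60

[thin]
filename=/dev/mapper/vg0-thin0
```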
I would appreciate any ideas or thoughts you might have on this issue.
Please let me know if there is any other information I can provide. The
thread I linked to above has the lock stat output.
dm-devel mailing list