Hi Bart, Sagi and all,
With this email I would like to share some fresh RDMA performance
results for IBNBD, SCST and NVMEoF, based on the 4.10 kernel and a
variety of configurations.
All fio runs are grouped by project name, crucial config differences
(e.g. CPU pinning or register_always=N) and two testing modes:
MANY-DISKS and MANY-JOBS. In each group of results the number of
simultaneous fio jobs increases from 1 up to 128. In the MANY-DISKS
testing mode each fio job is dedicated to one disk, so the number of
disks grows along with the number of jobs; in the MANY-JOBS testing
mode, in turn, every fio job produces IO for the same disk, i.e.:
MANY-DISKS:
x1:
numjobs=1
[job1]
filename=/dev/nvme0n1
...
x128:
numjobs=1
[job1]
filename=/dev/nvme0n1
[job2]
filename=/dev/nvme0n2
...
[job128]
filename=/dev/nvme0n128
MANY-JOBS:
x1:
numjobs=1
[job1]
filename=/dev/nvme0n1
...
x128:
numjobs=128
[job1]
filename=/dev/nvme0n1
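The two modes above can be sketched roughly as follows. This is a
hypothetical illustration only (function names and the exact section
layout are mine); fio-runner.py [1] is the actual generator:

```python
def many_disks_config(n):
    """MANY-DISKS xN: one fio job per disk, n jobs over n disks."""
    lines = ["numjobs=1"]
    for i in range(1, n + 1):
        lines += ["[job%d]" % i, "filename=/dev/nvme0n%d" % i]
    return "\n".join(lines)

def many_jobs_config(n):
    """MANY-JOBS xN: all n fio jobs produce IO for the same disk."""
    return "\n".join(["numjobs=%d" % n,
                      "[job1]", "filename=/dev/nvme0n1"])
```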
Each group of results forms a performance measurement that can be
easily plotted, taking the number of jobs as the X axis and iops,
overall IO latencies or anything else extracted from the fio json
result files as the Y axis.
FIO configurations were generated and saved along with the produced
fio json results by the fio-runner.py script [1]. The complete archive
with FIO configs and results can be downloaded here [2].
The following metrics were taken from fio json results:
write/iops - IOPS
write/lat/mean - average latency (μs)
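Extracting these two metrics from a fio json result file looks roughly
like this (a sketch only; it assumes the pre-fio-3.x json layout with
'lat' reported in microseconds and group_reporting, i.e. a single
entry in 'jobs'):

```python
import json

def extract_write_metrics(path):
    """Return (write/iops, write/lat/mean in usec) from one fio json
    result file. Layout assumptions as per the lead-in above."""
    with open(path) as f:
        result = json.load(f)
    job = result["jobs"][0]          # group_reporting -> single job entry
    return job["write"]["iops"], job["write"]["lat"]["mean"]
```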
Here I would like to present a reduced results table, taking into
account only the runs with CPU pinning in the MANY-DISKS testing mode,
since CPU pinning makes more sense in terms of performance and the
MANY-DISKS and MANY-JOBS results look very similar:
write/iops (MANY-DISKS)
IBNBD_pin NVME_noreg_pin NVME_pin SCST_noreg_pin SCST_pin
x1 80398.96 75577.24 54110.19 59555.04 48446.05
x2 109018.60 96478.45 69176.77 73925.81 55557.59
x4 169164.56 140558.75 93700.96 75419.91 56294.61
x8 197725.44 159887.33 99773.05 79750.92 55938.84
x16 176782.36 150448.33 99644.05 92964.23 56463.14
x32 139666.00 123198.38 81845.30 81287.98 50590.86
x64 125666.16 82231.77 72117.67 72023.32 45121.17
x128 120253.63 73911.97 65665.08 74642.27 47268.46
write/lat/mean (MANY-DISKS)
IBNBD_pin NVME_noreg_pin NVME_pin SCST_noreg_pin SCST_pin
x1 647.78 697.91 1032.97 925.51 1173.04
x2 973.20 1104.38 1612.75 1462.18 2047.11
x4 1279.49 1528.09 2452.22 3188.41 4235.95
x8 2356.92 2929.87 4891.70 6248.85 8907.10
x16 5605.62 6575.70 10046.4 10830.50 17945.57
x32 14489.54 16516.60 24849.16 24984.26 40335.09
x64 32364.39 49481.42 56615.23 56559.02 90590.84
x128 67570.88 110768.70 124249.4 109321.84 171390.00
* Where the suffixes mean:
_pin - CPU pinning
_noreg - modules on the initiator side (ib_srp, nvme_rdma) were loaded
with the 'register_always=N' parameter
The complete result tables and corresponding graphs are presented in a
Google sheet [3].
Conclusion:
On average, IBNBD outperforms the others by:
NVME_noreg_pin NVME_pin SCST_noreg_pin SCST_pin
iops 41% 72% 61% 155%
lat/mean 28% 42% 38% 60%
* The complete result tables [3] were taken into account for the
average percentage calculation.
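The averages can be derived as the mean of the per-row gains, paired
by job count. A minimal sketch of that calculation (my own helper
name; the sheet [3] remains authoritative for the exact numbers):

```python
def avg_gain_pct(ibnbd, other):
    """Mean percentage gain of IBNBD over a competitor, computed
    row by row over samples paired by the same xN job count."""
    gains = [(a - b) / b * 100.0 for a, b in zip(ibnbd, other)]
    return sum(gains) / len(gains)
```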
The test setup is the following:
Initiator and target HW configuration:
AMD Opteron 6386 SE, 64 CPUs, 128 GB RAM
InfiniBand: Mellanox Technologies MT26428
[ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
Initiator and target SW configuration:
vanilla Linux 4.10
+ IBNBD patches
+ SCST from https://github.com/bvanassche/scst, master branch
Initiator side:
IBNBD and NVME: MQ mode
SRP: default RQ; on an attempt to set 'use_blk_mq=Y', IO hangs.
FIO generic configuration pattern:
bssplit=512/20:1k/16:2k/9:4k/12:8k/19:16k/10:32k/8:64k/4
fadvise_hint=0
rw=randrw:2
direct=1
random_distribution=zipf:1.2
time_based=1
runtime=10
ioengine=libaio
iodepth=128
iodepth_batch_submit=128
iodepth_batch_complete=128
group_reporting
Target side:
128 null_blk devices with the default configuration, opened as blockio.
NVMEoF configuration script [4].
SCST configuration script [5].
Any feedback would be greatly appreciated. I am open to further perf
tuning and testing with other possible configurations and options.
Thanks.
--
Roman
[1] FIO runner and results extractor script:
https://drive.google.com/open?id=0B8_SivzwHdgSS2RKcmc4bWg0YjA
[2] Archive with FIO configurations and results:
https://drive.google.com/open?id=0B8_SivzwHdgSaDlhMXV6THhoRXc
[3] Google sheet with performance measurements:
https://drive.google.com/open?id=1sCTBKLA5gbhhkgd2USZXY43VL3zLidzdqDeObZn9Edc
[4] NVMEoF configuration:
https://drive.google.com/open?id=0B8_SivzwHdgSTzRjbGtmaVR6LWM
[5] SCST configuration:
https://drive.google.com/open?id=0B8_SivzwHdgSM1B5eGpKWmFJMFk