Hi Ming,
I was running some performance tests on the latest 4.17-rc and found a
performance drop (approximately 15%) caused by the patch set below.
https://marc.info/?l=linux-block&m=150802309522847&w=2
I observed the drop on the latest 4.16.6 stable and 4.17-rc kernels as well.
By bisecting, I found that the issue is not observed with the last stable
kernel, 4.14.38.
I picked the 4.14.38 stable kernel as the baseline and applied the above
patch set to confirm the behavior.
lscpu output -
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
Stepping: 4
CPU MHz: 1457.182
CPU max MHz: 2701.0000
CPU min MHz: 1200.0000
BogoMIPS: 5400.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
I have 16 SSDs ("SDLL1DLR400GCCA1"). I created two R0 VDs (each VD
consists of 8 SSDs) using a MegaRaid Ventura series adapter.
fio script -
numactl -N 1 fio 2vd.fio --bs=4k --iodepth=128 -rw=randread --group_report --ioscheduler=none --numjobs=4
                | v4.14.38-stable | patched v4.14.38-stable
                | mq-none         | mq-none
---------------------------------------------------------------------
randread "iops" | 1597k           | 1377k
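For reference, the relative drop implied by the two IOPS figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function and variable names are illustrative and not part of the test setup.

```python
def relative_drop_pct(baseline, patched):
    """Return the drop from baseline to patched, as a percentage of baseline."""
    return (baseline - patched) / baseline * 100.0

# randread IOPS from the table above, in thousands (k)
baseline_kiops = 1597   # v4.14.38-stable, mq-none
patched_kiops = 1377    # patched v4.14.38-stable, mq-none

drop_pct = relative_drop_pct(baseline_kiops, patched_kiops)
print(f"randread IOPS drop: {drop_pct:.1f}%")
```

With these two numbers the drop works out to a little under 14%, in line with the rough 15% figure quoted at the top.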
Below is the perf report without the patch set. (Lock contention appears
to be causing this drop, so I have included only the relevant snippet.)
-    3.19%     2.89%  fio  [kernel.vmlinux]  [k] _raw_spin_lock
   - 2.43% io_submit
      - 2.30% entry_SYSCALL_64
         - do_syscall_64
            - 2.18% do_io_submit
               - 1.59% blk_finish_plug
                  - 1.59% blk_flush_plug_list
                     - 1.59% blk_mq_flush_plug_list
                        - 1.00% __blk_mq_delay_run_hw_queue
                           - 0.99% blk_mq_sched_dispatch_requests
                              - 0.63% blk_mq_dispatch_rq_list
                                   0.60% scsi_queue_rq
                        - 0.57% blk_mq_sched_insert_requests
                           - 0.56% blk_mq_insert_requests
                                0.51% _raw_spin_lock
Below is the perf report after applying the patch set.
-    4.10%     3.51%  fio  [kernel.vmlinux]  [k] _raw_spin_lock
   - 3.09% io_submit
      - 2.97% entry_SYSCALL_64
         - do_syscall_64
            - 2.85% do_io_submit
               - 2.35% blk_finish_plug
                  - 2.35% blk_flush_plug_list
                     - 2.35% blk_mq_flush_plug_list
                        - 1.83% __blk_mq_delay_run_hw_queue
                           - 1.83% __blk_mq_run_hw_queue
                              - 1.83% blk_mq_sched_dispatch_requests
                                 - 1.82% blk_mq_do_dispatch_ctx
                                    - 1.14% blk_mq_dequeue_from_ctx
                                       - 1.11% dispatch_rq_from_ctx
                                            1.03% _raw_spin_lock
                          0.50% blk_mq_sched_insert_requests
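To put the two perf snippets side by side, the change in _raw_spin_lock overhead can be computed as below. This is a rough sketch; it assumes the two leading percentage columns are perf's usual "Children" and "Self" columns, and the dictionary layout is purely illustrative.

```python
# _raw_spin_lock overhead in fio, as (without patch set, with patch set), in percent.
# Assumption: first perf column is "Children", second is "Self".
spin_lock_overhead = {
    "children": (3.19, 4.10),
    "self": (2.89, 3.51),
}

deltas = {}
for column, (before, after) in spin_lock_overhead.items():
    delta_points = after - before            # absolute change, percentage points
    relative_pct = delta_points / before * 100.0
    deltas[column] = delta_points
    print(f"{column}: +{delta_points:.2f} points ({relative_pct:.1f}% relative increase)")
```

Most of the increase shown in the snippets sits under blk_mq_do_dispatch_ctx, where dispatch_rq_from_ctx accounts for 1.03% in _raw_spin_lock.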
Let me know if you want more data, or whether this is a known implication
of the patch set.
Thanks, Kashyap