Hello,

I observed that NVMe performance drops sharply when fio runs on a
CPU (aarch64) in the NUMA node remote from the NVMe PCI device,
compared with running it on the device's local node.

Please see the test results [1]: 327K vs. 34.9K IOPS.

A latency trace shows that one big difference is in iommu_dma_unmap_sg():
1111 nsecs vs. 25437 nsecs on average.
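
For reference, a per-call latency histogram of iommu_dma_unmap_sg() can be
captured with a bpftrace one-liner along these lines (a sketch only; it
requires root and bpftrace with kprobes enabled, and is not necessarily the
exact tracer used for the numbers above):

```shell
# Record entry time per thread, emit a latency histogram on return.
bpftrace -e '
kprobe:iommu_dma_unmap_sg { @start[tid] = nsecs; }
kretprobe:iommu_dma_unmap_sg /@start[tid]/ {
    @latency_ns = hist(nsecs - @start[tid]);
    delete(@start[tid]);
}'
```

Running this while fio is pinned to CPU 0 and then to CPU 80 should make the
latency difference visible directly in the histograms.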


[1] fio test & results

1) fio test result:

- run fio on a local CPU (NUMA node 0, same node as the NVMe device)
taskset -c 0 ~/git/tools/test/nvme/io_uring 10 1 /dev/nvme1n1 4k
+ fio --bs=4k --ioengine=io_uring --fixedbufs --registerfiles --hipri 
--iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 
--filename=/dev/nvme1n1 --direct=1 --runtime=10 --numjobs=1 --rw=randread 
--name=test --group_reporting

IOPS: 327K
avg latency of iommu_dma_unmap_sg(): 1111 nsecs


- run fio on a remote CPU (NUMA node 1)
taskset -c 80 ~/git/tools/test/nvme/io_uring 10 1 /dev/nvme1n1 4k
+ fio --bs=4k --ioengine=io_uring --fixedbufs --registerfiles --hipri 
--iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 
--filename=/dev/nvme1n1 --direct=1 --runtime=10 --numjobs=1 --rw=randread 
--name=test --group_reporting

IOPS: 34.9K
avg latency of iommu_dma_unmap_sg(): 25437 nsecs

2) system info
[root@ampere-mtjade-04 ~]# lscpu | grep NUMA
NUMA node(s):                    2
NUMA node0 CPU(s):               0-79
NUMA node1 CPU(s):               80-159

[root@ampere-mtjade-04 ~]# lspci | grep NVMe
0003:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe 
SSD Controller SM981/PM981/PM983

[root@ampere-mtjade-04 ~]# cat /sys/block/nvme1n1/device/device/numa_node
0
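
The two runs differ only in where the submitting task is pinned: taskset -c 0
keeps it on node 0 (local to the device), taskset -c 80 moves it to node 1.
As an aside, the same pinning can be done from Python's standard library; a
minimal sketch (pin_to_cpu is a name invented here, CPU 0 is a stand-in):

```python
import os

def pin_to_cpu(cpu: int) -> None:
    """Pin the calling process to a single CPU, like `taskset -c <cpu>`."""
    os.sched_setaffinity(0, {cpu})  # pid 0 = the current process

pin_to_cpu(0)
print(sorted(os.sched_getaffinity(0)))  # -> [0]
```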



Thanks, 
Ming

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
