On Fri Feb 14 20, Robin Murphy wrote:
Hi Jerry,

On 2020-02-14 8:13 pm, Jerry Snitselaar wrote:
Hi Will,

On a Gigabyte system with Cavium CN8xx, when doing a fio test against
an nvme drive we are seeing the following:

[  637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000036000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000018000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000360000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010000ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x80100003e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7

Those "IOVAs" don't look much like IOVAs from the DMA allocator - if they were physical addresses, would they correspond to an expected region of the physical memory map?

I would suspect that this is most likely misbehaviour in the NVMe driver (issuing a write to a non-DMA-mapped address), and the SMMU is just doing its job in blocking and reporting it.
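
For anyone reading the fault lines above, here is a rough sketch of how the fsr and fsynr values can be decoded. It assumes the bit positions match the ARM_SMMU_FSR_* and ARM_SMMU_FSYNR0_WNR definitions in the Linux arm-smmu driver header; the helper names decode_fsr and is_write are made up for illustration and are not kernel APIs.

```python
# Bit positions assumed from the Linux arm-smmu driver header
# (ARM_SMMU_FSR_* and ARM_SMMU_FSYNR0_WNR); verify against your kernel tree.
FSR_BITS = {
    31: "MULTI",   # multiple faults recorded
    30: "SS",      # stalled
    8: "UUT",      # unsupported upstream transaction
    7: "ASF",      # address size fault
    6: "TLBLKF",   # TLB lock fault
    5: "TLBMCF",   # TLB match conflict fault
    4: "EF",       # external fault
    3: "PF",       # permission fault
    2: "AFF",      # access flag fault
    1: "TF",       # translation fault
}
FSYNR0_WNR = 1 << 4  # set when the faulting access was a write


def decode_fsr(fsr):
    """Return the names of the known fault flags set in fsr, highest bit first."""
    return [name for bit, name in sorted(FSR_BITS.items(), reverse=True)
            if fsr & (1 << bit)]


def is_write(fsynr0):
    """Return True if the fault syndrome marks the access as a write."""
    return bool(fsynr0 & FSYNR0_WNR)


if __name__ == "__main__":
    # Values copied from the log lines in this thread.
    print(decode_fsr(0x80000402), "write" if is_write(0x70091) else "read")
```

Under those assumptions the reported values decode to a translation fault (with the multiple-fault flag set) on a write access, which fits a device write to an address that has no mapping in the context bank.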

I also reproduced with 5.5-rc7, and will check 5.6-rc1 later today. I couldn't narrow it down further into 5.4-rc1. I don't know smmu or the code well, any thoughts on where to start digging into this?

fio test that is being run is:

#fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite -ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting -name=mytest -numjobs=32

Just to clarify, do other tests work OK on the same device?

Thanks,
Robin.


I was able to get back on the system today. I think I know what the problem is:

[    0.036189] iommu: Gigabyte R120-T34-00 detected, force iommu passthrough mode
[    6.324282] iommu: Default domain type: Translated

So the new default domain code in 5.4 overrides the default passthrough setting made by the iommu quirk code. I'm testing a quick patch that tracks whether the default domain was set in the quirk code, and leaves it alone if it was. So far it seems to be working.
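
The idea behind the patch can be modelled roughly as follows. This is only an illustrative Python sketch of the logic, not the kernel change itself; the class, method, and constant names are all hypothetical.

```python
# Hypothetical model of "remember that a quirk chose the default domain type
# and do not overwrite it later". None of these names exist in the kernel.
PASSTHROUGH = "Passthrough"
TRANSLATED = "Translated"


class DefaultDomainState:
    def __init__(self):
        self.domain_type = None     # not decided yet
        self.set_by_quirk = False   # tracks whether a quirk made the choice

    def quirk_force_passthrough(self):
        """Early platform quirk: force passthrough and record who set it."""
        self.domain_type = PASSTHROUGH
        self.set_by_quirk = True

    def apply_build_default(self, build_default):
        """Later init step: apply the built-in default unless a quirk already chose."""
        if not self.set_by_quirk:
            self.domain_type = build_default
        return self.domain_type


if __name__ == "__main__":
    state = DefaultDomainState()
    state.quirk_force_passthrough()
    print("Default domain type:", state.apply_build_default(TRANSLATED))
```

Without the set_by_quirk check, the later step would unconditionally replace the quirk's choice, which matches the "Default domain type: Translated" message seen after the quirk had already run.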

