https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=290237
Bug ID: 290237
Summary: nvme panic during stress2 testing
Product: Base System
Version: 16.0-CURRENT
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: [email protected]
Reporter: [email protected]
I hit an nvme panic while running some stress2 tests that involve heavy paging
I/O. This is the first time I'd run those tests since enabling the AMD IOMMU
on the machine, so I suspect the presence of the IOMMU caused a behavior change
in bus_dmamap_load_mem() that nvme wasn't quite prepared for.
I couldn't save a corefile because the panic left the NVMe controller in a bad
state, but basically I saw this in the syslog:
nvme0: bus_dmamap_load_mem returned 0x24!
panic: cpl cid does not match cmd cid
The busdma error 0x24 (decimal 36) is EINPROGRESS, so I suspect the IOMMU had
to defer mapping setup for some reason, and nvme doesn't handle a deferred
callback. It seems that it should either treat EINPROGRESS as non-fatal, or
(more likely, assuming it isn't prepared to handle out-of-order completion)
pass BUS_DMA_NOWAIT to bus_dmamap_load_mem() so that the load fails
immediately instead of being deferred.
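
To make the two options concrete, here is a small userspace model, not driver
code. Every model_* name is hypothetical, and the constants are my assumption
of the FreeBSD values (EINPROGRESS = 36, ENOMEM = 12, BUS_DMA_NOWAIT = 0x01).
It only illustrates the return values a caller would have to cope with in
each case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; values assumed to match FreeBSD's headers. */
enum { MODEL_ENOMEM = 12, MODEL_EINPROGRESS = 36 };
#define	MODEL_DMA_NOWAIT	0x01

/*
 * Model of a busdma load. When mapping resources are short, the load is
 * either deferred (EINPROGRESS, callback runs later) or, with NOWAIT,
 * fails at once (ENOMEM, callback never runs).
 */
static int
model_dmamap_load(bool resources_short, int flags)
{
	/* The model only understands the NOWAIT flag. */
	assert((flags & ~MODEL_DMA_NOWAIT) == 0);

	if (!resources_short)
		return (0);			/* callback already ran */
	if (flags & MODEL_DMA_NOWAIT)
		return (MODEL_ENOMEM);		/* fail fast */
	return (MODEL_EINPROGRESS);		/* callback deferred */
}

/*
 * Option 1: treat EINPROGRESS as non-fatal. Only a real error should
 * cause the request to be completed manually.
 */
static bool
model_load_is_fatal(int error)
{
	return (error != 0 && error != MODEL_EINPROGRESS);
}
```

With option 1 the caller must tolerate the callback running later, possibly
after other requests have been submitted. With option 2 (NOWAIT) the existing
error path runs right away, but only for a real failure such as ENOMEM.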
The panic seems to happen because the error-handling path for
bus_dmamap_load_mem() calls nvme_qpair_manual_complete_tracker(), but
req->cmd.cid is only set up by nvme_qpair_submit_tracker(), which won't yet
have been called if the busdma callback nvme_payload_map() hasn't executed.
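
The suspected sequence can be sketched with a userspace model. The structs and
model_* functions below are hypothetical simplifications based on my reading
of the driver, not the actual nvme code. The model assumes that manual
completion builds a completion carrying the tracker's cid and compares it
against the cid stored in the request's command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced versions of the request and tracker. */
struct model_request {
	uint16_t cmd_cid;		/* stands in for req->cmd.cid */
};

struct model_tracker {
	uint16_t cid;			/* fixed per-tracker command id */
	struct model_request *req;
};

/* Model of submission: the only place the command's cid is filled in. */
static void
model_submit_tracker(struct model_tracker *tr)
{
	assert(tr->req != NULL);
	tr->req->cmd_cid = tr->cid;
}

/*
 * Model of manual completion: the synthetic completion uses the
 * tracker's cid. Returns false where the real driver would panic with
 * "cpl cid does not match cmd cid".
 */
static bool
model_manual_complete(const struct model_tracker *tr)
{
	uint16_t cpl_cid = tr->cid;

	return (cpl_cid == tr->req->cmd_cid);
}
```

In this model, completing a tracker with a nonzero cid before
model_submit_tracker() has run fails the check, which matches the panic I saw
after the EINPROGRESS return.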