https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=290237

            Bug ID: 290237
           Summary: nvme panic during stress2 testing
           Product: Base System
           Version: 16.0-CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: [email protected]
          Reporter: [email protected]

I hit an nvme panic while running some stress2 tests that involve heavy paging
I/O.  This is the first time I'd run those tests since enabling the AMD IOMMU
on the machine, so I suspect the presence of the IOMMU caused a behavior change
in bus_dmamap_load_mem() that nvme wasn't quite prepared for.

I couldn't save a corefile because the panic left the NVMe controller in a bad
state, but basically I saw this in the syslog:

nvme0: bus_dmamap_load_mem returned 0x24!
panic: cpl cid does not match cmd cid

The busdma error 0x24 is EINPROGRESS, so I suspect the IOMMU had to defer mapping
setup for some reason, and nvme doesn't handle that case.  It seems
that nvme should either treat EINPROGRESS as non-fatal, or (more likely, assuming
it isn't prepared to handle out-of-order completion) pass BUS_DMA_NOWAIT to
bus_dmamap_load_mem().
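
To illustrate the two options, here is a minimal userland sketch, not the
driver code.  All model_* names, the MODEL_DMA_NOWAIT value, and the load_action
enum are made up for illustration.  It assumes the behavior described in
bus_dma(9): a waitable load that lacks resources returns EINPROGRESS and runs
the callback later, while a BUS_DMA_NOWAIT load fails with ENOMEM instead.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for BUS_DMA_NOWAIT; the value is arbitrary. */
#define MODEL_DMA_NOWAIT 0x1

/* Hypothetical outcomes a driver could distinguish after a load call. */
enum load_action {
	LOAD_SUBMITTED,	/* callback already ran; command was submitted */
	LOAD_DEFERRED,	/* callback will run later; nothing to clean up yet */
	LOAD_FAILED	/* no callback will run; fail the request */
};

/*
 * Model of a busdma-style load.  When mapping resources are short, a
 * waitable load is deferred (EINPROGRESS), while a NOWAIT load fails
 * immediately (ENOMEM).
 */
static int
model_dma_load(bool resources_available, int flags)
{
	if (resources_available)
		return (0);
	return ((flags & MODEL_DMA_NOWAIT) ? ENOMEM : EINPROGRESS);
}

/* Classify a load result so that EINPROGRESS is not treated as an error. */
static enum load_action
model_handle_load(int error)
{
	if (error == 0)
		return (LOAD_SUBMITTED);
	if (error == EINPROGRESS)
		return (LOAD_DEFERRED);
	return (LOAD_FAILED);
}
```

With MODEL_DMA_NOWAIT the deferred case never reaches the caller, so only the
success and failure paths remain.  Without it, the caller has to leave the
request alone until the callback runs.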

The panic seems to occur because the error-handling path for bus_dmamap_load_mem()
calls nvme_qpair_manual_complete_tracker(), but req->cmd.cid is only set up by
nvme_qpair_submit_tracker(), which won't yet have been called if the busdma
callback nvme_payload_map() hasn't executed.
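
The ordering problem can be modeled in a few lines of userland C.  This is a
simplified sketch, not the actual nvme(4) code.  The model_* structures and
functions are hypothetical, and the sketch assumes that manual completion
builds a completion from the tracker's cid and compares it against the
request's command cid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down request: only the command cid matters here. */
struct model_request {
	uint16_t cmd_cid;
};

/* Hypothetical tracker: owns a cid and points at the request it carries. */
struct model_tracker {
	uint16_t cid;
	struct model_request *req;
};

/* Submission is the only place the request's command cid is filled in. */
static void
model_submit_tracker(struct model_tracker *tr)
{
	tr->req->cmd_cid = tr->cid;
}

/*
 * Manual completion synthesizes a completion carrying the tracker's cid
 * and checks it against the command cid.  Returns false where the real
 * driver would hit the "cpl cid does not match cmd cid" assertion.
 */
static bool
model_manual_complete(const struct model_tracker *tr)
{
	uint16_t cpl_cid = tr->cid;

	return (cpl_cid == tr->req->cmd_cid);
}
```

In this model, completing a tracker with cid 5 before model_submit_tracker()
has run fails the check, because the request's cmd_cid still holds its initial
value.  After submission the two values agree.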

-- 
You are receiving this mail because:
You are the assignee for the bug.
