https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262969
Bug ID: 262969
Summary: NVMe - Resetting controller due to a timeout and
possible hot unplug
Product: Base System
Version: 13.1-STABLE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: [email protected]
Reporter: [email protected]
I am seeing very unstable NVMe on TrueNAS 12 and 13. I'm using Western Digital
Black NVMe m.2 SSD's. The exact version I'm using is WD_BLACK SN770 250GB &
2TB, firmware version 731030WD.
This is on TrueNAS version: TrueNAS-12.0-U8 / FreeBSD: 12.2-RELEASE-p12 (I
later upgraded to 13 Beta, see below)
I set hw.nvme.per_cpu_io_queues=0, and it did not fix the problem, in fact it
seems to have made it much more frequent, although I'm not 100% sure about
that, need to test again.
I also tried using the nvd driver with hw.nvme.use_nvd=0, which doesn't seem to
make a difference, however it had slightly different results in the log when
the issue happened again. See logs below, would be grateful if somebody can
help with this problem.
Mar 29 21:42:25 truenas nvme5: Resetting controller due to a timeout and
possible hot unplug.
Mar 29 21:42:25 truenas nvme5: resetting controller
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:12 cid:120 nsid:1 lba:1497544880
len:16
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:12 cid:120
cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:12 cid:123 nsid:1 lba:198272936 len:16
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:12 cid:123
cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:13 cid:121 nsid:1 lba:431014528 len:24
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:13 cid:121
cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:15 cid:127 nsid:1 lba:864636432 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:15 cid:127
cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:16 cid:126 nsid:1 lba:2445612184 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:16 cid:126
cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:16 cid:120 nsid:1 lba:430503600 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:16 cid:120
cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:18 cid:123 nsid:1 lba:1499051024 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:18 cid:123
cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: WRITE sqid:18 cid:124 nsid:1 lba:1990077368
len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:18 cid:124
cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:19 cid:122 nsid:1 lba:1237765696 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:19 cid:122
cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:19 cid:125 nsid:1 lba:180758264 len:16
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:19 cid:125
cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:20 cid:121 nsid:1 lba:2445612192 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:20 cid:121
cdw0:0
Mar 29 21:42:26 truenas nvd5: detached
nvme3: Resetting controller due to a timeout and possible hot unplug.
nvme3: resetting controller
nvme3: failing outstanding i/o
nvme3: READ sqid:7 cid:127 nsid:1 lba:419546528 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:7 cid:127 cdw0:0
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0
cdw=1901c5a0 0 7 0 0 0
failing outstanding i/o
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
nvme3: READ sqid:11 cid:127 nsid:1 lba:782841288 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:11 cid:127 cdw0:0
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0
cdw=2ea935c8 0 7 0 0 0
failing outstanding i/o
nvme3: READ sqid:11 cid:123 nsid:1 lba:704576056 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:11 cid:123 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:12 cid:127 nsid:1 lba:1016402352 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:12 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:12 cid:125 nsid:1 lba:1824854760 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:12 cid:125 cdw0:0
nvme3: failing outstanding i/o
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
WRITE sqid:13 cid:124 nsid:1 lba:1008638008 len:64
nvme3: ABORTED - BY REQUEST (00/07) sqid:13 cid:124 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:13 cid:125 nsid:1 lba:1008638152 len:56
nvme3: ABORTED - BY REQUEST (00/07) sqid:13 cid:125 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:15 cid:127 nsid:1 lba:783188688 len:8
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0
cdw=29fefa38 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c9511b0
0 7 0 0 0
ABORTED - BY REQUEST (00/07) sqid:15 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:15 cid:123 nsid:1 lba:1008553080 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:15 cid:123 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:16 cid:124 nsid:1 lba:147012776 len:8
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=6cc512e8 0
7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1e9838
0 3f 0 0 0
ABORTED - BY REQUEST (00/07) sqid:16 cid:124 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:16 cid:127 nsid:1 lba:2881895592 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:16 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:17 cid:127 nsid:1 lba:2574392744 len:16
nvme3: ABORTED - BY REQUEST (00/07) sqid:17 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:18 cid:126 nsid:1 lba:155895056 len:8
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1e98c8
0 37 0 0 0
ABORTED - BY REQUEST (00/07) sqid:18 cid:126 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:19 cid:125 nsid:1 lba:151377120 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:19 cid:125 cdw0:0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=2eae82d0 0
7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1d4c78
0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=8c33ca8 0
7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=abc63ca8 0
7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=99721da8 0
f 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=94ac510 0
7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=905d4e0 0
7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
nda3 at nvme3 bus 0 scbus13 target 0 lun 1
nda3: <WD_BLACK SN770 2TB 731030WD 21513C800057>
s/n 21513C800057 detached
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
After Upgrading to 13.0-STABLE I see this:
nvme4: RECOVERY_START 166765538822 vs 166329346273
nvme4: Controller in fatal status, resetting
nvme4: Resetting controller due to a timeout and possible hot unplug.
nvme4: RECOVERY_WAITING
nvme4: resetting controller
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:127 nsid:1 lba:32 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:127 cdw0:0
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:126 nsid:1 lba:544 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:126 cdw0:0
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:125 nsid:1 lba:3907028000 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:125 cdw0:0
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:124 nsid:1 lba:3907028512 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:124 cdw0:0
nvd4: detached
--
You are receiving this mail because:
You are the assignee for the bug.