Might be overheating. Today's NVMe drives are notoriously flaky if you
run them without a proper heat sink attached.
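
A single SMART reading only shows the temperature at that moment, so it may
be worth logging it while the machine is under load. Below is a rough
sketch, not a tested tool: it assumes `nvmecontrol logpage -p 2 nvme0`
prints a "Temperature:" line in the same format as the health log quoted
further down, and the helper names are made up for illustration.

```python
import re
import subprocess

# Matches the composite temperature line, e.g.
# "Temperature:                    312 K, 38.85 C, 101.93 F"
# The indented "Temperature:" flag under "Critical Warning State" has no
# Kelvin value and does not start at column 0, so it is not matched.
TEMP_RE = re.compile(r"^Temperature:\s+(\d+) K", re.MULTILINE)


def composite_temp_celsius(health_log: str) -> float:
    """Return the composite temperature in Celsius from SMART page text."""
    match = TEMP_RE.search(health_log)
    if match is None:
        raise ValueError("no composite temperature line found")
    return int(match.group(1)) - 273.15


def read_health_log(device: str = "nvme0") -> str:
    """Fetch log page 2 (SMART/Health) via nvmecontrol (assumed output format)."""
    result = subprocess.run(
        ["nvmecontrol", "logpage", "-p", "2", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling composite_temp_celsius(read_health_log()) every few seconds during a
package build and appending the value to a file should show whether the
drive heats up before the controller resets.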

-Max



On Mon, Feb 12, 2024, 4:28 PM Don Lewis <truck...@freebsd.org> wrote:

> I just upgraded my package build machine to:
>   FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
> from:
>   FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
> and I've had two nvme-triggered panics in the last day.
>
> nvme is being used for swap and L2ARC.  I'm not able to get a crash
> dump, probably because the nvme device has gone away and I get an error
> about not having a dump device.  It looks like a low-memory panic
> because free memory is low and zfs is calling malloc().
>
> This shows up in the log leading up to the panic:
> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.
>
> The device looks healthy to me:
> SMART/Health Information Log
> ============================
> Critical Warning State:         0x00
>  Available spare:               0
>  Temperature:                   0
>  Device reliability:            0
>  Read only:                     0
>  Volatile memory backup:        0
> Temperature:                    312 K, 38.85 C, 101.93 F
> Available spare:                100
> Available spare threshold:      10
> Percentage used:                3
> Data units (512,000 byte) read: 5761183
> Data units written:             29911502
> Host read commands:             471921188
> Host write commands:            605394753
> Controller busy time (minutes): 32359
> Power cycles:                   110
> Power on hours:                 19297
> Unsafe shutdowns:               14
> Media errors:                   0
> No. error info log entries:     0
> Warning Temp Composite Time:    0
> Error Temp Composite Time:      0
> Temperature 1 Transition Count: 5231
> Temperature 2 Transition Count: 0
> Total Time For Temperature 1:   41213
> Total Time For Temperature 2:   0
>
>
>
