On 12 Feb, Mark Johnston wrote:
> On Mon, Feb 12, 2024 at 04:28:10PM -0800, Don Lewis wrote:
>> I just upgraded my package build machine to:
>>   FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
>> from:
>>   FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
>> and I've had two nvme-triggered panics in the last day.
>> 
>> nvme is being used for swap and L2ARC.  I'm not able to get a crash
>> dump, probably because the nvme device has gone away and I get an error
>> about not having a dump device.  It looks like a low-memory panic
>> because free memory is low and zfs is calling malloc().
>> 
>> This shows up in the log leading up to the panic:
>> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
>> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
>> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
>> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
>> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
>> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
>> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
>> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
>> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.
> 
> Are you by chance using the drive mentioned here? 
> https://github.com/openzfs/zfs/discussions/14793
> 
> I was bitten by that and ended up replacing the drive with a different
> model.  The crash manifested exactly as you describe, though I didn't
> have L2ARC or swap enabled on it.

Nope:
nda0 at nvme0 bus 0 scbus9 target 0 lun 1
nda0: <INTEL SSDPEKNW512G8 002C BTNH940617WE512A>
nda0: Serial Number BTNH940617WE512A
nda0: nvme version 1.3
nda0: 488386MB (1000215216 512 byte sectors)

I'm not seeing super high I/O rates.  I happened to have iostat running
when the machine panicked:
   0   584 88.4    31  2.68 65.8   112  7.18 68.2   107  7.13  80  0 20  0  0
   0   565 99.1    32  3.06 27.9    74  2.01 30.5    70  2.08  80  0 20  0  0
   0   612 92.8    31  2.77 18.9   148  2.74 18.9   148  2.73  86  0 14  0  0
   0   618 88.6    13  1.17 25.0    59  1.44 24.2    61  1.44  89  0 11  0  0
   0   586 45.4     5  0.22 31.4    55  1.70 30.8    57  1.70  84  0 16  0  0
   0   598 12.7     3  0.03 38.1    64  2.40 37.1    66  2.40  84  0 16  0  0
   0   675 36.1     6  0.21 23.7   156  3.62 22.7   164  3.63  88  0 12  0  0
   0   641  6.9     6  0.04 25.7   243  6.10 25.3   246  6.08  71  0 29  0  0
   0   737 20.1     9  0.18 36.4   148  5.24 37.2   144  5.24  78  0 22  0  0
   0   578 44.7    23  1.03 25.1   164  4.01 25.5   161  3.99  86  0 14  0  0
   0   608 70.3    15  1.06 51.1    64  3.19 51.3    64  3.19  89  0 11  0  0
   0   624 38.6     9  0.35 32.3   121  3.80 32.2   121  3.79  90  0 10  0  0
   0   577 80.6    16  1.28 37.8    66  2.44 36.5    69  2.46  90  0 10  0  0
       tty             nda0             ada0             ada1             cpu
 tin  tout KB/t   tps  MB/s KB/t   tps  MB/s KB/t   tps  MB/s  us ni sy in id
   0   566 87.7    16  1.39 27.2    60  1.60 25.3    66  1.62  87  0 13  0  0
   0   599 77.2    11  0.83 17.4   391  6.66 17.3   395  6.66  74  0 26  0  0
   0   660 45.0     7  0.31 18.7   575 10.51 18.6   578 10.49  76  0 24  0  0
   0   615 37.7     8  0.31 24.0   303  7.11 24.0   303  7.11  58  0 42  0  0
Fssh_packet_write_wait: ... port 22: Broken pipe
ada* are old and slow spinning rust.


That report does mention something else that could also be a cause.  I
upgraded the motherboard BIOS around the same time.  When I get a
chance, I'll drop back to the older FreeBSD version and see if the
problem goes away.

