Hi, I have a redundant NAS made of FreeBSD + HAST + ZFS and 24TB of disks.
This morning my primary node crashed around 4:20am. On the console I can see:

    Fatal double fault
    rip = 0xffffffff805e78b8
    rsp = 0xffffff8485d43fc0
    rbp = 0xffffff8485d44010
    cpuid = 1; apic id = 12
    panic: double fault
    cpuid = 1
    KDB: stack backstrace:
    #0 0xffffffff805f4e0e at kdb_backtrace+0x5e
    #1 0xffffffff805c2d07 at panic+0x187
    #2 0xffffffff808ac366 at dblfault_handler+0x96
    #3 0xffffffff808950bd at Xdblfault+0xad
    Uptime: 4d14h7m5s
    Cannot dump. Device not defined or unavailable.

The only unusual thing I can see on my munin graphs is a burst of IO activity (disk, and network over my HAST link) that starts at 3am every morning and lasts about an hour and a half. This morning it was still running when the node crashed.

I double-checked my scheduled scripts and none of them runs at that time, so I suspect a system script is responsible for this activity. I'm not sure that this IO activity caused the crash, but it is the only lead I have.

I don't know exactly which mailing list this issue should be posted to.

I can provide munin graphs if needed (cpu, network io, disk io, load, memory, netstat, open_files, processes, swap, vmstat, zfs_arc_cache_hits_by_cache, zfs_arc_cache_hits_by_data_type, zfs_arc_efficiency, zfs_arc_utilization, zfs_dmu_prefetch) for both the primary and the secondary node.

Thanks a lot for your help.

Mickaël
