"Fatal double fault" panic

Mickaël Canévet Tue, 22 Mar 2011 04:23:30 -0700

Hi,

I have a redundant NAS made of FreeBSD + HAST + ZFS and 24TB of disks.


This morning my primary node crashed around 4:20am.

On the console I can see:

Fatal double fault
rip = 0xffffffff805e78b8
rsp = 0xffffff8485d43fc0
rbp = 0xffffff8485d44010
cpuid = 1; apic id = 12
panic: double fault
cpuid = 1
KDB: stack backstrace:
#0 0xffffffff805f4e0e at kdb_backtrace+0x5e
#1 0xffffffff805c2d07 at panic+0x187
#2 0xffffffff808ac366 at dblfault_handler+0x96
#3 0xffffffff808950bd at Xdblfault+0xad
Uptime: 4d14h7m5s
Cannot dump, Device not defined or unavailable.

The only thing I can see on my munin graphs is some strange I/O activity
(disk, and network over my HAST link) that starts at 3am every morning
and lasts about an hour and a half (so it was still running at the time of
the crash this morning). I double-checked my scheduled scripts and none of
them run at that time, so I suspect a system script is responsible for this
activity. I'm not sure that this I/O activity caused the crash, but it's the
only lead I have.

I'm not sure which mailing list is the right one for this issue.

I can provide munin graphs if needed (cpu, network io, disk io,
load, memory, netstat, open_files, processes, swap, vmstat,
zfs_arc_cache_hits_by_cache, zfs_arc_cache_hits_by_data_type,
zfs_arc_efficiency, zfs_arc_utilization, zfs_dmu_prefetch) for both
the primary and secondary nodes.

Thanks a lot for your help.

Mickaël
