Hi all,

Before the week was out, I wanted to provide an update on this issue.

Last weekend, I installed two VMs with CURRENT
(20240208-82bebc793658-268105) - one on zfs and one on ufs - and built a 
kernel with this config file:
include GENERIC
ident   THAYER-FULLDEBUG
makeoptions     DEBUG=-g
options KASAN
options DDB
options INVARIANT_SUPPORT
options INVARIANTS
options QUEUE_MACRO_DEBUG_TRASH
options WITNESS
options WITNESS_SKIPSPIN
options KGSSAPI

I'm also setting these in loader.conf:
debug.witness.watch=1
debug.witness.kdb=1
kern.kstack_pages=8

These two VMs have been running our hdf5 workload non-stop without a
panic for 146 hours and 122 hours, respectively. This might be good
news, but both uptimes are still well within the range of time-to-panic
we've seen in our testing over the past 6 months. Given that all the
debug kernel options slow things down significantly, these VMs could
just be taking a long while to panic.

I also have another VM with our "standard" 14.0p5 kernel (GENERIC with
KGSSAPI enabled) running on ufs to try to rule zfs in or out. This VM
failed this morning, but not with a panic: nfs stopped responding
instead. We have seen this failure mode in our testing, but it is much
rarer than a full panic. I intend to keep testing on this VM to try to
induce a panic, at which point I think we can rule out zfs as a
potential cause.

Just so it's documented, since I started experimenting with kernel debug 
options last week, I have so far induced panics with the following:
- 13.2p9 kernel on hardware (only WITNESS enabled)
- 14.0p4 kernel on VM (only KASAN enabled)
- 13.2p9 kernel on hardware (all debug options above except KASAN)

My plan right now is to continue running my two test VMs with CURRENT to
see if they are just taking a long time to panic. Once I have finished
my ufs testing on the third VM, I will build a GENERIC kernel for
CURRENT (no debug options, only KGSSAPI) and test against that to see
whether the debug instrumentation itself is interfering with reproducing
this issue.

Please reach out if you have ideas or suggestions. I'll provide updates 
here when I have them.

Thanks,
Matt
