Tobias Nygren <[email protected]> wrote:
> > Enabling NPF.
> > [ 22.6038371] panic: kernel debugging assertion "pserialize_not_in_read_section()" failed: file "/work/src/sys/kern/kern_mutex.c", line 527
> > [ 22.7529500] cpu0: Begin traceback...
> > [ 22.7976654] 0x99deba54: netbsd:db_panic+0x14
> > [ 22.8465447] 0x99deba6c: netbsd:vpanic+0x194
> > [ 22.8985454] 0x99deba84: netbsd:__aeabi_uldivmod
> > [ 22.9505468] 0x99debb04: netbsd:mutex_enter+0x5f4
> > [ 22.9994280] 0x99debb4c: netbsd:npf_table_lookup+0x134
> > [ 23.0597517] 0x99debb74:
> <...>
>
> r1.29 of npf_tableset.c changed t_lock from IPL_NET to IPL_NONE.
> Based on the above it looks like it needs to be at IPL_SOFTNET.
> @rmind you could please have a look?

It is a bug, but that is only one aspect of it. Yes, the mutex can be
IPL_SOFTNET, but it actually behaves more or less as IPL_NONE. The real
bug is that the code path in question might block. There are a few ways
to fix this:

- Convert the mutex to a spin lock at IPL_NET (but that is excessive)
  and convert the memory allocations in that code path to KM_NOSLEEP.

- Extend pserialize(9) by implementing Sleepable RCU (SRCU) or an
  equivalent.

- Sprinkle psref(9), but that is ugly and undesirable in the long term.

I have not had free time to work on a solution yet, but I hope to fix
this soonish and commit it with the next batch of NPF
fixes/improvements.

Meanwhile, if you want to run with LOCKDEBUG until this gets fixed, then
as a workaround I suggest commenting out that assert, as you are very
unlikely to hit the crash condition of this bug. It can only happen when
you perform an NPF reload, and you also need to be unlucky enough to
have the relevant mutex (used only for LPM-type tables) contended and
blocking.

--
Mindaugas
