On Tue, Nov 22, 2022 at 10:49:35AM +1000, David Gwynne wrote:
> On Mon, Nov 21, 2022 at 08:53:52PM +0100, Mark Kettenis wrote:
> > > Date: Mon, 21 Nov 2022 20:28:35 +0100
> > > From: Alexander Bluhm <[email protected]>
> > >
> > > Hi,
> > >
> > > Some of my test machines hang while booting userland.
> > >
> > > starting network
> > > -> here it hangs
> > > load: 0.02 cmd: ifconfig 81303 [sbar] 0.00u 0.15s 0% 78k
> > >
> > > ddb shows these two processes.
> > >
> > > 81303 375320 89140 0 3 0x3 sbar ifconfig
> > > 48135 157353 0 0 3 0x14200 netlock systqmp
> > >
> > > ddb{0}> trace /t 0t375320
> > > sleep_finish(ffff800022d31318,1) at sleep_finish+0xfe
> > > cond_wait(ffff800022d313b0,ffffffff81f15e9d) at cond_wait+0x54
> > > sched_barrier(ffff800022512ff0) at sched_barrier+0x73
> > > ixgbe_stop(ffff800000118000) at ixgbe_stop+0x1f7
> > > ixgbe_init(ffff800000118000) at ixgbe_init+0x32
> > > ixgbe_ioctl(ffff800000118048,8020690c,ffff80000022ec00) at ixgbe_ioctl+0x13a
> > > in_ifinit(ffff800000118048,ffff80000022ec00,ffff800022d31740,1) at in_ifinit+0xef
> > > in_ioctl_change_ifaddr(8040691a,ffff800022d31730,ffff800000118048,1) at in_ioctl_change_ifaddr+0x3a4
> > > in_control(fffffd81901dc740,8040691a,ffff800022d31730,ffff800000118048) at in_control+0x75
> > > ifioctl(fffffd81901dc740,8040691a,ffff800022d31730,ffff800022d60000) at ifioctl+0x982
> > > sys_ioctl(ffff800022d60000,ffff800022d31840,ffff800022d318a0) at sys_ioctl+0x2c4
> > > syscall(ffff800022d31910) at syscall+0x384
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x7f7ffffd94a0, count: -13
> > >
> > > ddb{0}> trace /t 0t157353
> > > sleep_finish(ffff800022ca8b70,1) at sleep_finish+0xfe
> > > rw_enter(ffffffff822b4f80,1) at rw_enter+0x1cb
> > > pf_purge(0) at pf_purge+0x1d
> > > taskq_thread(ffffffff822ac568) at taskq_thread+0x100
> > > end trace frame: 0x0, count: -4
> > >
> > > ifconfig waits for the sched_barrier_task() on the systqmp task
> > > queue while holding the netlock. pf_purge() runs on the systqmp
> > > task queue and is waiting for the netlock. The netlock has been
> > > taken by ifconfig in in_ioctl_change_ifaddr().
> > >
> > > The problem has been introduced when pf_purge() was moved from systq
> > > to systqmp.
> > > https://marc.info/?l=openbsd-cvs&m=166818274216800&w=2
> >
> > I'd say pfpurge should be moved to its own taskq.
>
> we're working toward dropping the need for NET_LOCK before PF_LOCK. could
> we try the diff below as a compromise?
>
sashan@ and mvs@ have pushed that forward, so this diff should be enough
now.

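
For anyone skimming the thread, the hang bluhm@ described can be pictured as a two-node wait-for cycle. The sketch below is a userland toy model, not kernel code: the names waits_in_cycle(), boot_hangs(), NOBODY and the enum values are made up for illustration. It only encodes the two facts from the traces above, that ifconfig sleeps in sched_barrier() waiting on systqmp while holding the netlock, and that systqmp sits in pf_purge() waiting for the netlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NOBODY	(-1)

/* Hypothetical labels for the two contexts seen in the ddb output. */
enum { IFCONFIG, SYSTQMP, NTHREADS };

/*
 * waits_on[i] names the context that i is blocked on, or NOBODY.
 * Follow the chain from start; a return to start means a cycle.
 */
bool
waits_in_cycle(const int waits_on[], size_t n, int start)
{
	int cur = start;
	size_t steps;

	assert(start >= 0 && (size_t)start < n);
	for (steps = 0; steps < n; steps++) {
		cur = waits_on[cur];
		if (cur == NOBODY)
			return false;
		assert(cur >= 0 && (size_t)cur < n);
		if (cur == start)
			return true;
	}
	return false;
}

/* Model the boot-time situation with and without NET_LOCK in pf_purge(). */
bool
boot_hangs(bool pf_purge_takes_netlock)
{
	int waits_on[NTHREADS];

	/*
	 * ifconfig holds the netlock and sleeps in sched_barrier()
	 * until the barrier task has run on systqmp.
	 */
	waits_on[IFCONFIG] = SYSTQMP;

	/*
	 * systqmp is busy running pf_purge(). It blocks on the netlock
	 * holder only if pf_purge() still takes NET_LOCK().
	 */
	waits_on[SYSTQMP] = pf_purge_takes_netlock ? IFCONFIG : NOBODY;

	return waits_in_cycle(waits_on, NTHREADS, IFCONFIG);
}
```

In this model boot_hangs(true) corresponds to the current tree and boot_hangs(false) to the tree with the diff below applied, where pf_purge() no longer touches the netlock and the barrier task can run.
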
Index: pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1153
diff -u -p -r1.1153 pf.c
--- pf.c 12 Nov 2022 02:48:14 -0000 1.1153
+++ pf.c 24 Nov 2022 01:21:48 -0000
@@ -1603,9 +1603,6 @@ pf_purge(void *null)
{
unsigned int interval = max(1, pf_default_rule.timeout[PFTM_INTERVAL]);
- /* XXX is NET_LOCK necessary? */
- NET_LOCK();
-
PF_LOCK();
pf_purge_expired_src_nodes();
@@ -1616,7 +1613,6 @@ pf_purge(void *null)
* Fragments don't require PF_LOCK(), they use their own lock.
*/
pf_purge_expired_fragments();
- NET_UNLOCK();
/* interpret the interval as idle time between runs */
timeout_add_sec(&pf_purge_to, interval);
@@ -1891,7 +1887,6 @@ pf_purge_expired_states(const unsigned i
if (SLIST_EMPTY(&gcl))
return (scanned);
- NET_LOCK();
rw_enter_write(&pf_state_list.pfs_rwl);
PF_LOCK();
PF_STATE_ENTER_WRITE();
@@ -1904,7 +1899,6 @@ pf_purge_expired_states(const unsigned i
PF_STATE_EXIT_WRITE();
PF_UNLOCK();
rw_exit_write(&pf_state_list.pfs_rwl);
- NET_UNLOCK();
while ((st = SLIST_FIRST(&gcl)) != NULL) {
SLIST_REMOVE_HEAD(&gcl, gc_list);