On Tue, Nov 22, 2022 at 10:49:35AM +1000, David Gwynne wrote:
> On Mon, Nov 21, 2022 at 08:53:52PM +0100, Mark Kettenis wrote:
> > > Date: Mon, 21 Nov 2022 20:28:35 +0100
> > > From: Alexander Bluhm <[email protected]>
> > > 
> > > Hi,
> > > 
> > > Some of my test machines hang while booting userland.
> > > 
> > > starting network
> > > -> here it hangs
> > > load: 0.02  cmd: ifconfig 81303 [sbar] 0.00u 0.15s 0% 78k
> > > 
> > > ddb shows these two processes.
> > > 
> > >  81303  375320  89140      0  3         0x3  sbar          ifconfig
> > >  48135  157353      0      0  3     0x14200  netlock       systqmp
> > > 
> > > ddb{0}> trace /t 0t375320
> > > sleep_finish(ffff800022d31318,1) at sleep_finish+0xfe
> > > cond_wait(ffff800022d313b0,ffffffff81f15e9d) at cond_wait+0x54
> > > sched_barrier(ffff800022512ff0) at sched_barrier+0x73
> > > ixgbe_stop(ffff800000118000) at ixgbe_stop+0x1f7
> > > ixgbe_init(ffff800000118000) at ixgbe_init+0x32
> > > ixgbe_ioctl(ffff800000118048,8020690c,ffff80000022ec00) at 
> > > ixgbe_ioctl+0x13a
> > > in_ifinit(ffff800000118048,ffff80000022ec00,ffff800022d31740,1) at 
> > > in_ifinit+0x
> > > ef
> > > in_ioctl_change_ifaddr(8040691a,ffff800022d31730,ffff800000118048,1) at 
> > > in_ioct
> > > l_change_ifaddr+0x3a4
> > > in_control(fffffd81901dc740,8040691a,ffff800022d31730,ffff800000118048) 
> > > at in_c
> > > ontrol+0x75
> > > ifioctl(fffffd81901dc740,8040691a,ffff800022d31730,ffff800022d60000) at 
> > > ifioctl
> > > +0x982
> > > sys_ioctl(ffff800022d60000,ffff800022d31840,ffff800022d318a0) at 
> > > sys_ioctl+0x2c
> > > 4
> > > syscall(ffff800022d31910) at syscall+0x384
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x7f7ffffd94a0, count: -13
> > > 
> > > ddb{0}> trace /t 0t157353
> > > sleep_finish(ffff800022ca8b70,1) at sleep_finish+0xfe
> > > rw_enter(ffffffff822b4f80,1) at rw_enter+0x1cb
> > > pf_purge(0) at pf_purge+0x1d
> > > taskq_thread(ffffffff822ac568) at taskq_thread+0x100
> > > end trace frame: 0x0, count: -4
> > > 
> > > ifconfig waits for the sched_barrier_task() on the systqmp task
> > > queue while holding the netlock.  pf_purge() runs on the systqmp
> > > task queue and is waiting for the netlock.  The netlock has been
> > > taken by ifconfig in in_ioctl_change_ifaddr().
> > > 
> > > The problem has been introduced when pf_purge() was moved from systq
> > > to systqmp.
> > > https://marc.info/?l=openbsd-cvs&m=166818274216800&w=2
> > 
> > I'd say pfpurge should be moved to itw own taskq.
> 
> we're working toward dropping the need for NET_LOCK before PF_LOCK. could
> we try the diff below as a compromise?
>

sashan@ and mvs@ have pushed that forward, so this diff should be enough
now.

Index: pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1153
diff -u -p -r1.1153 pf.c
--- pf.c        12 Nov 2022 02:48:14 -0000      1.1153
+++ pf.c        24 Nov 2022 01:21:48 -0000
@@ -1603,9 +1603,6 @@ pf_purge(void *null)
 {
        unsigned int interval = max(1, pf_default_rule.timeout[PFTM_INTERVAL]);
 
-       /* XXX is NET_LOCK necessary? */
-       NET_LOCK();
-
        PF_LOCK();
 
        pf_purge_expired_src_nodes();
@@ -1616,7 +1613,6 @@ pf_purge(void *null)
         * Fragments don't require PF_LOCK(), they use their own lock.
         */
        pf_purge_expired_fragments();
-       NET_UNLOCK();
 
        /* interpret the interval as idle time between runs */
        timeout_add_sec(&pf_purge_to, interval);
@@ -1891,7 +1887,6 @@ pf_purge_expired_states(const unsigned i
        if (SLIST_EMPTY(&gcl))
                return (scanned);
 
-       NET_LOCK();
        rw_enter_write(&pf_state_list.pfs_rwl);
        PF_LOCK();
        PF_STATE_ENTER_WRITE();
@@ -1904,7 +1899,6 @@ pf_purge_expired_states(const unsigned i
        PF_STATE_EXIT_WRITE();
        PF_UNLOCK();
        rw_exit_write(&pf_state_list.pfs_rwl);
-       NET_UNLOCK();
 
        while ((st = SLIST_FIRST(&gcl)) != NULL) {
                SLIST_REMOVE_HEAD(&gcl, gc_list);

Reply via email to