On Mon, Jun 14, 2021 at 10:07:58AM +0200, Martin Pieuchot wrote:
> On 10/06/21(Thu) 19:17, Alexander Bluhm wrote:
> > Hi,
> > 
> > I have seen this crash trace on a 6.6 based system, but I think the
> > bug exists still in -current.  It happened when an ixl(4) interface
> > was removed from trunk(4).

i think it is fixed in later ixl(4). ixl_down coordinaties with the
interrupt handler with the IFF_RUNNING flag and an interrupt barrier.
6.6 ixl_intr() didn't check IFF_RUNNING, but it does now. eg:

@@ -2852,7 +3323,7 @@ ixl_rxrinfo(struct ixl_softc *sc, struct
 }
 
 static int
-ixl_intr(void *xsc)
+ixl_intr0(void *xsc)
 {
        struct ixl_softc *sc = xsc;
        struct ifnet *ifp = &sc->sc_ac.ac_if;
@@ -2873,18 +3344,63 @@ ixl_intr(void *xsc)
                rv = 1;
        }
 
-       if (ISSET(icr, I40E_INTR_NOTX_RX_MASK))
-               rv |= ixl_rxeof(sc, ifp->if_iqs[0]);
-       if (ISSET(icr, I40E_INTR_NOTX_TX_MASK))
-               rv |= ixl_txeof(sc, ifp->if_ifqs[0]);
+       if (ISSET(ifp->if_flags, IFF_RUNNING)) {
+               struct ixl_vector *iv = sc->sc_vectors;
+               if (ISSET(icr, I40E_INTR_NOTX_RX_MASK))
+                       rv |= ixl_rxeof(sc, iv->iv_rxr);
+               if (ISSET(icr, I40E_INTR_NOTX_TX_MASK))
+                       rv |= ixl_txeof(sc, iv->iv_txr);
+       }


> > ifnewlladdr() is interrupted by ixl transmit interrupt.  There it
> > crashes in ixl_txeof as txr is NULL.  The code in -current if_ixl.c
> > has changed, so it might not happen anymore.  But I think the bug
> > is in ifnewlladdr().
> 
> Hard to say.

i hate how ifnewlladdr works, but im leaning toward 6.6 ixl as the thing
to blame for this one.

> > ifnewlladdr() sets splnet() and configures the interface up and
> > down.  The ixl_down() code has some interrupt barriers which cannot
> > work while interrupts are blocked by splnet().  So interrupts fire
> > at splx() when the driver does not expect them.
> 
> If intr_barrier() or ixl_down() need a certain IPL level to properly
> work then something has been overlooked.  Should we add an assert?

intr_barrier is fine. ixl_down assumed something that wasn't present.

> > Combining interrupt barriers with spl protection looks like a bad
> > idea.
> > 
> > Is there anything that lowers spl in all cases during intr_barrier(),
> > ifq_barrier() or timeout_del_barrier()?
> > 
> > How should spls work together with barriers?
> > 
> > The integrity of ifnewlladdr() state should be guaranteed by netlock.
> >
> > Changing if_flags needs splnet() as they are used by all drivers.
> 
> This isn't clear to me.  splnet() used to be needed but nowadays this
> seems to questionable depending on the driver.

updates to if_flags needs to be serialised, but in a lot of cases
you can read if_flags without a lock because there's no inconsistent
intermediate state that the reader can see. a reader will either
see the flag set or cleared, there's nothing in between.

the trick ixl is trying to do is to ensure that the driver data the
interrupt side uses is allocated and availble when IFF_RUNNING is set.
the intr side in 6.6 wasn't checking that though.

> > The in6_ functions need netlock.  And driver SIOCSIFFLAGS ioctl
> > must not have splnet().
> 
> Why not?  This is new since the introduction of intr_barrier() or this
> is an old issue?
> 
> > Is reducing splnet() the correct aproach?

yes.

> I doubt it is possible to answer this question without defining who owns
> `if_flags' and how can it be read/written to.

NET_LOCK is what "owns" updates to if_flags.

> I'd question if splnet() is needed at all here.  Why is it here in the
> first place?  I'd guess to prevent the interrupt handler to run while 
> SIOCSIFFLAGS ioctl is being executed...  Your diff suggest something
> else...
> 
> > Index: net/if.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
> > retrieving revision 1.641
> > diff -u -p -r1.641 if.c
> > --- net/if.c        25 May 2021 22:45:09 -0000      1.641
> > +++ net/if.c        10 Jun 2021 14:32:12 -0000
> > @@ -3109,6 +3109,8 @@ ifnewlladdr(struct ifnet *ifp)
> >     short up;
> >     int s;
> >  
> > +   NET_ASSERT_LOCKED();
> > +
> >     s = splnet();
> >     up = ifp->if_flags & IFF_UP;
> >  
> > @@ -3116,11 +3118,14 @@ ifnewlladdr(struct ifnet *ifp)
> >             /* go down for a moment... */
> >             ifp->if_flags &= ~IFF_UP;
> >             ifrq.ifr_flags = ifp->if_flags;
> > +           splx(s);
> >             (*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifrq);
> > +           s = splnet();
> >     }
> >  
> >     ifp->if_flags |= IFF_UP;
> >     ifrq.ifr_flags = ifp->if_flags;
> > +   splx(s);
> >     (*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifrq);
> >  
> >  #ifdef INET6
> > @@ -3139,11 +3144,12 @@ ifnewlladdr(struct ifnet *ifp)
> >  #endif
> >     if (!up) {
> >             /* go back down */
> > +           s = splnet();
> >             ifp->if_flags &= ~IFF_UP;
> >             ifrq.ifr_flags = ifp->if_flags;
> > +           splx(s);
> >             (*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifrq);
> >     }
> > -   splx(s);
> >  }
> >  
> >  void
> > 
> 

Reply via email to