On Wed, Aug 10, 2022 at 12:52:06AM +0200, Hrvoje Popovski wrote:
> On 9.8.2022. 22:22, Vitaliy Makkoveev wrote:
> > Hi,
> >
> > The kernel lock within pflow_output_process() doesn't help because the
> > following sosend() has sleep points. So, at least pflow_clone_destroy()
> > should wait until pflow_output_process() finished. We should use
> > taskq_del_barrier(9) instead of task_del(9).
> >
> > Also we need to unlink dying pflow(4) interface from the stack before
> > start destruction.
> >
> > This diff should help. Please keep in mind, this diff is incomplete,
> > because it doesn't fix the race between pflowioctl() and
> > pflow_output_process(). This race is much more complicated, because we
> > need to introduce the new lock to protect `so' and take it before call
> > sosend(), but the sosend() takes netlock, which is taken before
> > pflowioctl() where we modify `so'. This introduces re-locking games to
> > pflowioctl() path, I so want to make this with separate diff, because
> > this potential panic was not triggered.
> >
>
> Hi,
>
> with this diff I'm getting this protection fault trap
>
> r620-1# ifconfig pflow0 destroy
> kernel: protection fault trap, code=0
> Stopped at sblock+0x35: movq 0x8(%rax),%rax
>
> ddb{0}> show panic
> the kernel did not panic
>
> ddb{0}> trace
> sblock(fffffd842c34d8e8,fffffd842c34da10,1) at sblock+0x35
> sosend(fffffd842c34d8e8,fffffd80cd292800,0,fffffd80a3f37c00,0,0) at
> sosend+0x163
> pflow_output_process(ffff8000008ca000) at pflow_output_process+0x67
> taskq_thread(ffff800000030100) at taskq_thread+0x100
> end trace frame: 0x0, count: -4
> ddb{0}>
>
That's strange, because after we the only timeout handlers can reschedule
pflow_output_process to run, but they have no sleep points. However the
task handler still running after taskq_del_barrier(9).
Does this help?
Index: sys/net/if_pflow.c
===================================================================
RCS file: /cvs/src/sys/net/if_pflow.c,v
retrieving revision 1.94
diff -u -p -r1.94 if_pflow.c
--- sys/net/if_pflow.c 6 Jun 2022 14:45:41 -0000 1.94
+++ sys/net/if_pflow.c 9 Aug 2022 23:39:05 -0000
@@ -274,14 +274,18 @@ pflow_clone_destroy(struct ifnet *ifp)
error = 0;
+ NET_LOCK();
+ SLIST_REMOVE(&pflowif_list, sc, pflow_softc, sc_next);
+ NET_UNLOCK();
+
if (timeout_initialized(&sc->sc_tmo))
- timeout_del(&sc->sc_tmo);
+ timeout_del_barrier(&sc->sc_tmo);
if (timeout_initialized(&sc->sc_tmo6))
- timeout_del(&sc->sc_tmo6);
+ timeout_del_barrier(&sc->sc_tmo6);
if (timeout_initialized(&sc->sc_tmo_tmpl))
- timeout_del(&sc->sc_tmo_tmpl);
+ timeout_del_barrier(&sc->sc_tmo_tmpl);
pflow_flush(sc);
- task_del(net_tq(ifp->if_index), &sc->sc_outputtask);
+ taskq_del_barrier(net_tq(ifp->if_index), &sc->sc_outputtask);
mq_purge(&sc->sc_outputqueue);
m_freem(sc->send_nam);
if (sc->so != NULL) {
@@ -293,9 +297,6 @@ pflow_clone_destroy(struct ifnet *ifp)
if (sc->sc_flowsrc != NULL)
free(sc->sc_flowsrc, M_DEVBUF, sc->sc_flowsrc->sa_len);
if_detach(ifp);
- NET_LOCK();
- SLIST_REMOVE(&pflowif_list, sc, pflow_softc, sc_next);
- NET_UNLOCK();
free(sc, M_DEVBUF, sizeof(*sc));
return (error);
}