Hello Barbaros,

thank you for testing and for the excellent report.

</snip>

> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(ffffffff81f22e39) at panic+0xbf
> __assert(ffffffff81f96c9d,ffffffff81f85ebc,a3,ffffffff81fd252f) at 
> __assert+0x25
> assertwaitok() at assertwaitok+0xcc
> mi_switch() at mi_switch+0x40

    the assert indicates we attempt to sleep inside an SMR section,
    which must be avoided.

> sleep_finish(ffff800025574da0,1) at sleep_finish+0x10b
> rw_enter(ffffffff822cfe50,1) at rw_enter+0x1cb
> pf_test(2,1,ffff80000520e000,ffff800025575058) at pf_test+0x1088
> ip_input_if(ffff800025575058,ffff800025575064,4,0,ffff80000520e000) at 
> ip_input_if+0xcd
> ipv4_input(ffff80000520e000,fffffd8053616700) at ipv4_input+0x39
> ether_input(ffff80000520e000,fffffd8053616700) at ether_input+0x3ad
> vport_if_enqueue(ffff80000520e000,fffffd8053616700) at vport_if_enqueue+0x19
> veb_port_input(ffff8000051c3800,fffffd806064c200,ffffffffffff,ffff800002066600)
>  at veb_port_input+0x4d2
> ether_input(ffff8000051c3800,fffffd806064c200) at ether_input+0x100
> vlan_input(ffff80000095a050,fffffd806064c200,ffff8000255752bc) at 
> vlan_input+0x23d
> ether_input(ffff80000095a050,fffffd806064c200) at ether_input+0x85
> if_input_process(ffff80000095a050,ffff800025575358) at if_input_process+0x6f
> ifiq_process(ffff80000095a460) at ifiq_process+0x69
> taskq_thread(ffff800000035080) at taskq_thread+0x100

    above is the call stack that did the bad thing (sleeping inside an
    SMR section).

In my opinion the primary suspect is veb_port_input(), whose code reads as
follows:

 966 static struct mbuf *
 967 veb_port_input(struct ifnet *ifp0, struct mbuf *m, uint64_t dst, void *brport)
 968 {
 969         struct veb_port *p = brport;
 970         struct veb_softc *sc = p->p_veb;
 971         struct ifnet *ifp = &sc->sc_if;
 972         struct ether_header *eh;
 ...
1021         counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
1022             m->m_pkthdr.len);
1023 
1024         /* force packets into the one routing domain for pf */
1025         m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
1026 
1027 #if NBPFILTER > 0
1028         if_bpf = READ_ONCE(ifp->if_bpf);
1029         if (if_bpf != NULL) {
1030                 if (bpf_mtap_ether(if_bpf, m, 0) != 0)
1031                         goto drop;
1032         }
1033 #endif
1034 
1035         veb_span(sc, m);
1036 
1037         if (ISSET(p->p_bif_flags, IFBIF_BLOCKNONIP) &&
1038             veb_ip_filter(m))
1039                 goto drop;
1040 
1041         if (!ISSET(ifp->if_flags, IFF_LINK0) &&
1042             veb_vlan_filter(m))
1043                 goto drop;
1044 
1045         if (veb_rule_filter(p, VEB_RULE_LIST_IN, m, src, dst))
1046                 goto drop;

The call to veb_span() at line 1035 seems to be the culprit (in my opinion):

 356         smr_read_enter();
 357         SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
 358                 ifp0 = p->p_ifp0;
 359                 if (!ISSET(ifp0->if_flags, IFF_RUNNING))
 360                         continue;
 361 
 362                 m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
 363                 if (m == NULL) {
 364                         /* XXX count error */
 365                         continue;
 366                 }
 367 
 368                 if_enqueue(ifp0, m); /* XXX count error */
 369         }
 370         smr_read_leave();

The loop above comes from veb_span(), which calls if_enqueue() from within
an SMR section. For a vport interface, the call at line 368 ends up here:

2191 static int
2192 vport_if_enqueue(struct ifnet *ifp, struct mbuf *m)
2193 {
2194         /*
2195          * switching an l2 packet toward a vport means pushing it
2196          * into the network stack. this function exists to make
2197          * if_vinput compat with veb calling if_enqueue.
2198          */
2199 
2200         if_vinput(ifp, m);
2201    
2202         return (0);
2203 }  

which in turn calls if_vinput(), which calls further down into the IP stack,
and the IP stack may sleep (here pf_test() sleeps in rw_enter()). We must
change veb_span() so that calls to if_vinput() happen outside of the SMR
section.
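
To illustrate the direction only, here is a minimal userspace sketch of the
"collect inside the read section, deliver after leaving it" pattern. It is not
the kernel diff: read_enter(), read_leave(), may_sleep_input(), struct pending,
MAX_SPAN and span_deferred() are all hypothetical stand-ins, and packets and
ports are plain ints. A real change would presumably also have to keep each
span port valid after leaving the section (for example by holding a reference),
which is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int in_read_section;	/* mimics being inside smr_read_enter() */
static int delivered;		/* packets handed to the "stack" */

static void read_enter(void) { in_read_section++; }
static void read_leave(void) { in_read_section--; }

/* Stand-in for if_vinput(): may sleep, so it must run outside the section. */
static void
may_sleep_input(int port, int pkt)
{
	assert(in_read_section == 0);	/* same idea as assertwaitok() */
	(void)port;
	(void)pkt;
	delivered++;
}

struct pending {
	int port;
	int pkt;
};

#define MAX_SPAN 8	/* arbitrary bound for the sketch */

/* Two phases: collect inside the section, deliver after leaving it. */
static int
span_deferred(const int *ports, size_t nports, int pkt)
{
	struct pending q[MAX_SPAN];
	size_t i, n = 0;

	read_enter();
	for (i = 0; i < nports && n < MAX_SPAN; i++) {
		if (ports[i] < 0)	/* port not running, skip it */
			continue;
		q[n].port = ports[i];
		q[n].pkt = pkt;		/* the kernel would m_dup_pkt() here */
		n++;
	}
	read_leave();

	/* Nothing below runs inside the read section, so sleeping is fine. */
	for (i = 0; i < n; i++)
		may_sleep_input(q[i].port, q[i].pkt);

	return (int)n;
}
```

Calling span_deferred() with a mix of running and non-running ports delivers
only to the running ones, and the assert in may_sleep_input() never fires
because every delivery happens after read_leave().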

I don't have such a complex setup with vlans and virtual ports. I'll try to
cook up a diff and pass it to you for testing.

thanks again for coming back to us with the report.

regards
sashan

