On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote:
> I have been using veb (veb+vlan+ixl) interfaces stably since 6.9.
> My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stably.
> I also used 7.1 for a limited time and there was no crash.
> After OpenBSD's NET_TASKQ was raised to 4, it crashed after 5 days.
To me this looks like a bug in veb(4).
> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(ffffffff81f22e39) at panic+0xbf
> __assert(ffffffff81f96c9d,ffffffff81f85ebc,a3,ffffffff81fd252f) at __assert+0x25
> assertwaitok() at assertwaitok+0xcc
> mi_switch() at mi_switch+0x40
> sleep_finish(ffff800025574da0,1) at sleep_finish+0x10b
> rw_enter(ffffffff822cfe50,1) at rw_enter+0x1cb
> pf_test(2,1,ffff80000520e000,ffff800025575058) at pf_test+0x1088
> ip_input_if(ffff800025575058,ffff800025575064,4,0,ffff80000520e000) at ip_input_if+0xcd
> ipv4_input(ffff80000520e000,fffffd8053616700) at ipv4_input+0x39
> ether_input(ffff80000520e000,fffffd8053616700) at ether_input+0x3ad
> vport_if_enqueue(ffff80000520e000,fffffd8053616700) at vport_if_enqueue+0x19
> veb_port_input(ffff8000051c3800,fffffd806064c200,ffffffffffff,ffff800002066600) at veb_port_input+0x4d2
> ether_input(ffff8000051c3800,fffffd806064c200) at ether_input+0x100
> vlan_input(ffff80000095a050,fffffd806064c200,ffff8000255752bc) at vlan_input+0x23d
> ether_input(ffff80000095a050,fffffd806064c200) at ether_input+0x85
> if_input_process(ffff80000095a050,ffff800025575358) at if_input_process+0x6f
> ifiq_process(ffff80000095a460) at ifiq_process+0x69
> taskq_thread(ffff800000035080) at taskq_thread+0x100
veb_port_input -> veb_broadcast -> smr_read_enter; tp->p_enqueue
-> vport_if_enqueue -> if_vinput -> ifp->if_input -> ether_input ->
ipv4_input -> ip_input_if -> pf_test -> PF_LOCK -> rw_enter_write()
According to the man page, sleeping is not allowed after calling
smr_read_enter. pf can sleep because PF_LOCK is a read-write lock,
and rw_enter sleeps when the lock is contended. It looks like we
have some contention on the pf lock. With more forwarding threads,
sleeping in pf becomes more likely.
> __mp_lock(ffffffff823d986c) at __mp_lock+0x72
> wakeup_n(ffffffff822cfe50,ffffffff) at wakeup_n+0x32
> pf_test(2,2,ffff800000948050,ffff80002557b300) at pf_test+0x11f6
> pf_route(ffff80002557b388,fffffd89fb379938) at pf_route+0x1f6
> pf_test(2,1,ffff800000924050,ffff80002557b598) at pf_test+0xa1f
> ip_input_if(ffff80002557b598,ffff80002557b5a4,4,0,ffff800000924050) at ip_input_if+0xcd
> ipv4_input(ffff800000924050,fffffd8053540f00) at ipv4_input+0x39
> ether_input(ffff800000924050,fffffd8053540f00) at ether_input+0x3ad
> if_input_process(ffff800000924050,ffff80002557b688) at if_input_process+0x6f
> ifiq_process(ffff800000926500) at ifiq_process+0x69
> taskq_thread(ffff800000035100) at taskq_thread+0x100
Can some veb or smr hacker explain how this is supposed to work?
Sleeping in pf is also not ideal, as it is in the hot path and slows
down packet processing. But that is not easy to fix, as we have to
refactor the memory allocations before converting the pf lock to a
mutex. sashan@ is working on that.
bluhm