On Fri, Jul 05, 2019 at 03:51:31AM +0000, Adam Steen wrote:
> >Synopsis:    Packet loss / ENOBUFs with kqueue(2) and tap(4)
> >Category:    bug
> >Environment:
>       System      : OpenBSD 6.5
>       Details     : OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 19:39:46 AWST 2019
>                        [email protected]:/sys/arch/amd64/compile/GENERIC.MP
> 
>       Architecture: OpenBSD.amd64
>       Machine     : amd64
> >Description:
>       In Solo5 we have been working towards supporting multiple network
>       interfaces, and have implemented this using kqueue(2) and tap(4).
> 
>       This involves setting up two Tap interfaces and starting up the program.
>       In another session flood pinging the first Tap interface,
>       Solo5 handles this with no packets dropped.
>       In another session ping the second Tap interface, then for every
>       ping to the second interface a packet is dropped on the first. If you
>       switch to a flood ping on the second tap interface, you will observe
>       massive packet loss on both interfaces, and ping complaining about
>       No buffer space available (ENOBUFS).
> 
>       see https://github.com/Solo5/solo5/issues/374 for more information.
>       
> >How-To-Repeat:
>       I have been able to reproduce this in a hacked up example program,
>       available here https://github.com/adamsteen/test_net_2if. Please note
>       this is a hacked, generally butchered program, which demonstrates the
>       problem. (If required I can try and clean up this test case.)
> 
>       01. git clone https://github.com/adamsteen/test_net_2if
>       02. cd test_net_2if
>       03. make
>       04. doas setup.sh (Set up the Tap interfaces)
>       05. doas ./test_net_2if
>       06. In another session start a flood ping
>           doas ping -f 10.0.0.2
>       07. Observe that the flood ping is functioning correctly,
>           with no packets dropped.
>       08. In another session, start a normal ping
>           ping 10.1.0.2
>       09. Observe that, for each ping sent to service1, a packet is dropped.
>       10. Kill the normal ping
>       11. start a flood ping
>           doas ping -f 10.1.0.2
>       12. Observe massive packet loss on both interfaces, and ping
>           complaining about No buffer space available (ENOBUFS).
> >Fix:
>       Not Known.

Hi Adam,

claudio@ and I looked at this during a2k20, and came to the conclusion
that the packet loss occurred because an interface queue filled up
and it was shedding load. It was annoyingly easy to get to that point
though.

We also spent a lot of time massaging the tun/tap code to try and unify
the semantics of tun and tap going through the network stack, and in
particular tried to avoid queuing packets until we finally get to the
output side of the stack.

I'm not saying we've fixed this problem for you, but hopefully we've
mitigated it a bit. Could you try again and let us know if you see any
difference? If there's no difference, could you tweak your test to loop
on the read() of the /dev/tap entry until it gets back EWOULDBLOCK or
whatever the errno is that means there's no packet to read right now?
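
As a rough sketch of that kind of drain loop (not taken from
test_net_2if; drain_fd, set_nonblocking, count_bytes and the 2048 byte
buffer are made-up names and values for illustration), assuming the
tap fd has been put into non-blocking mode so read() fails with
EAGAIN/EWOULDBLOCK when the queue is empty:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define PKT_BUF_SIZE 2048	/* assumed: large enough for one frame */

/* Hypothetical per-packet callback; stands in for the real handler. */
typedef void (*pkt_handler)(const unsigned char *, size_t, void *);

/* Example handler: add the packet length to the size_t behind arg. */
static void
count_bytes(const unsigned char *buf, size_t len, void *arg)
{
	(void)buf;
	*(size_t *)arg += len;
}

/* Put fd into non-blocking mode so an empty queue yields EAGAIN. */
static int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return (-1);
	return (fcntl(fd, F_SETFL, flags | O_NONBLOCK));
}

/*
 * Keep reading from fd until there is nothing left to read.
 * Returns the number of successful reads, or -1 on a real error
 * (errno is left set by read()).
 */
static int
drain_fd(int fd, pkt_handler handle, void *arg)
{
	unsigned char buf[PKT_BUF_SIZE];
	ssize_t n;
	int npkts = 0;

	for (;;) {
		n = read(fd, buf, sizeof(buf));
		if (n > 0) {
			handle(buf, (size_t)n, arg);
			npkts++;
			continue;
		}
		if (n == 0)
			return (npkts);		/* EOF */
		if (errno == EINTR)
			continue;		/* interrupted, retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return (npkts);		/* queue is empty */
		return (-1);
	}
}
```

The idea would be to call drain_fd() once each time kevent(2) reports
the tap fd as readable, rather than doing a single read() per event.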

Cheers,
dlg
