Re: lo(4) checksum offload

2023-06-05 Thread David Gwynne



> On 2 Jun 2023, at 19:51, Alexander Bluhm  wrote:
> 
> On Wed, May 31, 2023 at 09:36:22AM +1000, David Gwynne wrote:
>> we could export these csum flags as part of the bpf header so we can
>> teach tcpdump to shut up in this situation.
> 
> Linux does not do that and I want to keep pcap file format compatible.
> Also smart tools make debugging real checksum problems harder.  I
> think people have to live with checksum warning when tcpdump is
> called with -v.
> 
>>> The question is, does it break corner cases?  Please test with pf
>>> route-to, IPsec, bridging, IPv6 and other setups where loopback
>>> might be involved.
>> 
>> who knows how bridge works? otherwise i think it should be ok.
>> hopefully.
> 
> No reply from any testers.  I think the only way to figure out is
> to commit an watch for fallout.
> 
> ok?

agreed. ok by me.



Re: lo(4) checksum offload

2023-06-02 Thread Alexander Bluhm
On Wed, May 31, 2023 at 09:36:22AM +1000, David Gwynne wrote:
> we could export these csum flags as part of the bpf header so we can
> teach tcpdump to shut up in this situation.

Linux does not do that and I want to keep pcap file format compatible.
Also smart tools make debugging real checksum problems harder.  I
think people have to live with checksum warning when tcpdump is
called with -v.

> > The question is, does it break corner cases?  Please test with pf
> > route-to, IPsec, bridging, IPv6 and other setups where loopback
> > might be involved.
> 
> who knows how bridge works? otherwise i think it should be ok.
> hopefully.

No reply from any testers.  I think the only way to figure out is
to commit an watch for fallout.

ok?

bluhm

> > Index: net/if.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
> > retrieving revision 1.697
> > diff -u -p -r1.697 if.c
> > --- net/if.c16 May 2023 14:32:54 -  1.697
> > +++ net/if.c30 May 2023 12:33:42 -
> > @@ -778,7 +778,7 @@ if_input(struct ifnet *ifp, struct mbuf_
> >  int
> >  if_input_local(struct ifnet *ifp, struct mbuf *m, sa_family_t af)
> >  {
> > -   int keepflags;
> > +   int keepflags, keepcksum;
> >  
> >  #if NBPFILTER > 0
> > /*
> > @@ -796,11 +796,26 @@ if_input_local(struct ifnet *ifp, struct
> > }
> >  #endif
> > keepflags = m->m_flags & (M_BCAST|M_MCAST);
> > +   /*
> > +* Preserve outgoing checksum flags, in case the packet is
> > +* forwarded to another interface.  Then the checksum, which
> > +* is now incorrect, will be calculated before sending.
> > +*/
> > +   keepcksum = m->m_pkthdr.csum_flags & (M_IPV4_CSUM_OUT |
> > +   M_TCP_CSUM_OUT | M_UDP_CSUM_OUT | M_ICMP_CSUM_OUT);
> > m_resethdr(m);
> > m->m_flags |= M_LOOP | keepflags;
> > +   m->m_pkthdr.csum_flags = keepcksum;
> > m->m_pkthdr.ph_ifidx = ifp->if_index;
> > m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
> >  
> > +   if (ISSET(keepcksum, M_TCP_CSUM_OUT))
> > +   m->m_pkthdr.csum_flags |= M_TCP_CSUM_IN_OK;
> > +   if (ISSET(keepcksum, M_UDP_CSUM_OUT))
> > +   m->m_pkthdr.csum_flags |= M_UDP_CSUM_IN_OK;
> > +   if (ISSET(keepcksum, M_ICMP_CSUM_OUT))
> > +   m->m_pkthdr.csum_flags |= M_ICMP_CSUM_IN_OK;
> > +
> > ifp->if_opackets++;
> > ifp->if_obytes += m->m_pkthdr.len;
> >  
> > @@ -809,6 +824,8 @@ if_input_local(struct ifnet *ifp, struct
> >  
> > switch (af) {
> > case AF_INET:
> > +   if (ISSET(keepcksum, M_IPV4_CSUM_OUT))
> > +   m->m_pkthdr.csum_flags |= M_IPV4_CSUM_IN_OK;
> > ipv4_input(ifp, m);
> > break;
> >  #ifdef INET6
> > Index: net/if_loop.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_loop.c,v
> > retrieving revision 1.93
> > diff -u -p -r1.93 if_loop.c
> > --- net/if_loop.c   21 Oct 2022 14:20:03 -  1.93
> > +++ net/if_loop.c   30 May 2023 11:48:55 -
> > @@ -173,6 +173,9 @@ loop_clone_create(struct if_clone *ifc, 
> > ifp->if_mtu = LOMTU;
> > ifp->if_flags = IFF_LOOPBACK | IFF_MULTICAST;
> > ifp->if_xflags = IFXF_CLONED;
> > +   ifp->if_capabilities = IFCAP_CSUM_IPv4 |
> > +   IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4 |
> > +   IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
> > ifp->if_rtrequest = lortrequest;
> > ifp->if_ioctl = loioctl;
> > ifp->if_input = loinput;
> > 



Re: lo(4) checksum offload

2023-06-01 Thread Alexander Bluhm
On Thu, Jun 01, 2023 at 04:38:53PM +, Peter Stuge wrote:
> David Gwynne wrote:
> > > Currently packets sent over loopback interface get their checksum
> > > calculated twice.  In the output path it is set and during TCP/IP
> > > input it is calculated again to be compared with the previous value.
> ..
> > > The question is, does it break corner cases?  Please test with pf
> > > route-to, IPsec, bridging, IPv6 and other setups where loopback
> > > might be involved.
> > 
> > who knows how bridge works? otherwise i think it should be ok.
> > hopefully.
> 
> Would a middle ground be safer? To only calculate checksum on output?

Either it works or not.  If not, we can fix the bugs.  Have you
tested and discovered a failure?

bluhm



Re: lo(4) checksum offload

2023-06-01 Thread Peter Stuge
David Gwynne wrote:
> > Currently packets sent over loopback interface get their checksum
> > calculated twice.  In the output path it is set and during TCP/IP
> > input it is calculated again to be compared with the previous value.
..
> > The question is, does it break corner cases?  Please test with pf
> > route-to, IPsec, bridging, IPv6 and other setups where loopback
> > might be involved.
> 
> who knows how bridge works? otherwise i think it should be ok.
> hopefully.

Would a middle ground be safer? To only calculate checksum on output?


//Peter



Re: lo(4) checksum offload

2023-05-30 Thread David Gwynne
On Tue, May 30, 2023 at 09:08:45PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> Currently packets sent over loopback interface get their checksum
> calculated twice.  In the output path it is set and during TCP/IP
> input it is calculated again to be compared with the previous value.
> 
> This can be avoided by claiming that lo(4) supports hardware checksum
> offloading.  For each packet convert the flag that the checksum
> should be calculated to the flag that it has been checked successfully.

so rather than calculate it twice, don't calculate it at all?

i love it.

> In a simple test on a vmm guest I see between 30% to 60% increase
> of thoughput over lo0.
> 
> A drawback is that "tcpdump -ni lo0 -v" reports invalid checksum.
> But people are used to that with physical interfaces and hardware
> offloading.

we could export these csum flags as part of the bpf header so we can
teach tcpdump to shut up in this situation.

> The question is, does it break corner cases?  Please test with pf
> route-to, IPsec, bridging, IPv6 and other setups where loopback
> might be involved.

who knows how bridge works? otherwise i think it should be ok.
hopefully.

> 
> bluhm
> 
> Index: net/if.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
> retrieving revision 1.697
> diff -u -p -r1.697 if.c
> --- net/if.c  16 May 2023 14:32:54 -  1.697
> +++ net/if.c  30 May 2023 12:33:42 -
> @@ -778,7 +778,7 @@ if_input(struct ifnet *ifp, struct mbuf_
>  int
>  if_input_local(struct ifnet *ifp, struct mbuf *m, sa_family_t af)
>  {
> - int keepflags;
> + int keepflags, keepcksum;
>  
>  #if NBPFILTER > 0
>   /*
> @@ -796,11 +796,26 @@ if_input_local(struct ifnet *ifp, struct
>   }
>  #endif
>   keepflags = m->m_flags & (M_BCAST|M_MCAST);
> + /*
> +  * Preserve outgoing checksum flags, in case the packet is
> +  * forwarded to another interface.  Then the checksum, which
> +  * is now incorrect, will be calculated before sending.
> +  */
> + keepcksum = m->m_pkthdr.csum_flags & (M_IPV4_CSUM_OUT |
> + M_TCP_CSUM_OUT | M_UDP_CSUM_OUT | M_ICMP_CSUM_OUT);
>   m_resethdr(m);
>   m->m_flags |= M_LOOP | keepflags;
> + m->m_pkthdr.csum_flags = keepcksum;
>   m->m_pkthdr.ph_ifidx = ifp->if_index;
>   m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
>  
> + if (ISSET(keepcksum, M_TCP_CSUM_OUT))
> + m->m_pkthdr.csum_flags |= M_TCP_CSUM_IN_OK;
> + if (ISSET(keepcksum, M_UDP_CSUM_OUT))
> + m->m_pkthdr.csum_flags |= M_UDP_CSUM_IN_OK;
> + if (ISSET(keepcksum, M_ICMP_CSUM_OUT))
> + m->m_pkthdr.csum_flags |= M_ICMP_CSUM_IN_OK;
> +
>   ifp->if_opackets++;
>   ifp->if_obytes += m->m_pkthdr.len;
>  
> @@ -809,6 +824,8 @@ if_input_local(struct ifnet *ifp, struct
>  
>   switch (af) {
>   case AF_INET:
> + if (ISSET(keepcksum, M_IPV4_CSUM_OUT))
> + m->m_pkthdr.csum_flags |= M_IPV4_CSUM_IN_OK;
>   ipv4_input(ifp, m);
>   break;
>  #ifdef INET6
> Index: net/if_loop.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_loop.c,v
> retrieving revision 1.93
> diff -u -p -r1.93 if_loop.c
> --- net/if_loop.c 21 Oct 2022 14:20:03 -  1.93
> +++ net/if_loop.c 30 May 2023 11:48:55 -
> @@ -173,6 +173,9 @@ loop_clone_create(struct if_clone *ifc, 
>   ifp->if_mtu = LOMTU;
>   ifp->if_flags = IFF_LOOPBACK | IFF_MULTICAST;
>   ifp->if_xflags = IFXF_CLONED;
> + ifp->if_capabilities = IFCAP_CSUM_IPv4 |
> + IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4 |
> + IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
>   ifp->if_rtrequest = lortrequest;
>   ifp->if_ioctl = loioctl;
>   ifp->if_input = loinput;
>