Hi all,

I think I have an explanation and a fix...

On Tue, Jan 11, 2022 at 09:59:45PM +0100, Paul de Weerd wrote:
> More debugging later, it turns out that I can now pretty reliably
> panic the GENERIC.MP kernel with a WireGuard tunnel that has an MTU of
> 1500 bytes, sending over the iwx(4) in my machine.
> 
> Christian Ehrhardt (over at Genua) provided a diff that printf's when
> there's a read-only mbuf passed to m_pullup:
> 
> Index: uipc_mbuf.c
> ===================================================================
> RCS file: /home/OpenBSD/cvs/src/sys/kern/uipc_mbuf.c,v
> retrieving revision 1.279
> diff -u -p -r1.279 uipc_mbuf.c
> --- uipc_mbuf.c       6 Mar 2021 09:20:49 -0000       1.279
> +++ uipc_mbuf.c       11 Jan 2022 20:51:45 -0000
> @@ -937,6 +937,10 @@ m_pullup(struct mbuf *m0, int len)
>       if (m == NULL)
>               goto freem0;
>  
> +     if (M_READONLY(m))
> +             printf("BUG: Calling %s() on read-only mbuf cluster\n",
> +                 __func__);
> +
>       head = M_DATABUF(m0);
>       if (m0->m_len == 0) {
>               m0->m_data = head;
> 
> This gets triggered A LOT when doing transfers over the wg(4)
> interface.  The panic doesn't happen every time this condition is met
> though, but seems easier to trigger with multiple parallel transfers.
> Lowering the MTU of wg0 to 1400, the printf() is no longer called, and
> the kernel no longer panics.
> 
> I changed Christian's diff to enter ddb (by calling assert(FALSE))
> after the printf.  The resulting trace showed:
> 
> panic() at panic+0xbf
> __assert() at __assert+0x25
> m_pullup() at m_pullup+0x304
> wg_input() at wg_input+0x17a
> udp_sbappend() at udp_sbappend+0x78
> ...

So here is what happens:
- The iwx(4) (and the iwm(4) driver for that matter) receive multiple
  IP packets in an aggregated WLAN packet. All of these packets
  live in the same mbuf cluster.
- The iwx(4) driver uses m_copym to de-aggregates the packets.
  As a result we get multiple mbufs (mbuf chains) that all reference
  different parts of the same cluster. This should be ok but
  renders the cluster memory read-only (M_READONLY() retruns true).
- Then some code in the stack appends to one of these mbuf chains.
  The most likely candidate is re-assembly of fragmented IP packets.
  This results in an mbuf chain with two mbufs:
  + The first mbuf contains the data in the read-only mbuf cluster.
  + The second mbuf can be anything but assuming IP re-assembly it is
    probably another mbuf that references a read-only mbuf cluster.
    It is even possible that the second mbuf in fact points to
    the same cluster.
- This mbuf chain reaches wg_input in wg(4) because it contains
  a wireguard encrypted packet.
- As wireguard decryption can only deal with buffers but not with
  mbufs, wg_input does this:
     m = m_pullup(m, m->m_pkthdr.len)
- Unfortunately, m_pullup does not respect the read-only
  property of an mbuf cluster and blindly copies the entire packet
  into the read-only cluster.
- This overwrites packet data of other packets that reside in the
  same cluster.
- Most of the time this corruption will simply result in a dropped
  packet and an increased error counter.
- The panic only happens if the corruption is detected
  during re-ordering: When the packet is saved for reordering
  iwx(4) checks that the packet is a QOS packet. Later when the
  packet is released from the reorder buffer we assume that the
  packet still is a QOS packet and the ieee80211 code panics if
  this is not the case.

IMHO the fix should be in m_pullup, a suggested patch is below.
Paul already tested an ealier version of this and it seems to fix
the problem.

> Does this mean there's a problem with the call to m_pullup in wg_input
> (if_wg.c:2031)?
> 
> Again, many thanks to stsp@ and Christian for their patience, diffs,
> help and suggestions.

OK, below is the diff. It is slightly larger than strictly
neccessary because I wanted to handle zero length m0 that points
to a read-only cluster properly.

I tried hard to follow all the special cases but the m_pullup
code is complicated and partially non-obvious. Thus this needs
more eyes and careful review.

    regards   Christian


commit 32faeffde06751c3d2686bc026df7db301fad6cc
Author: Christian Ehrhardt <[email protected]>
Date:   Tue Jan 11 10:31:46 2022 +0100

    m_pullup: Properly handle read-only clusters
    
    If the first mbuf of a chain in m_pullup is a cluster, check if
    the cluster is read-only (shared or an external buffer). If so
    don't touch it an create an new mbuf for the pullup data.

diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c
index 5e4cb5ba88..d8b9e751c6 100644
--- a/sys/kern/uipc_mbuf.c
+++ b/sys/kern/uipc_mbuf.c
@@ -957,8 +957,6 @@ m_pullup(struct mbuf *m0, int len)
 
        head = M_DATABUF(m0);
        if (m0->m_len == 0) {
-               m0->m_data = head;
-
                while (m->m_len == 0) {
                        m = m_free(m);
                        if (m == NULL)
@@ -972,25 +970,29 @@ m_pullup(struct mbuf *m0, int len)
        tail = head + M_SIZE(m0);
        head += adj;
 
-       if (len <= tail - head) {
-               /* there's enough space in the first mbuf */
-
-               if (len > tail - mtod(m0, caddr_t)) {
+       if (!M_READONLY(m0) && len <= tail - head) {
+               /* we can copy everything into the first mbuf */
+               if (m0->m_len == 0) {
+                       m0->m_data = head;
+               } else if (len > tail - mtod(m0, caddr_t)) {
                        /* need to memmove to make space at the end */
                        memmove(head, mtod(m0, caddr_t), m0->m_len);
                        m0->m_data = head;
+                       len -= m0->m_len;
                }
-
-               len -= m0->m_len;
        } else {
-               /* the first mbuf is too small so make a new one */
+               /* the first mbuf is too small or read-only, make a new one */
                space = adj + len;
 
                if (space > MAXMCLBYTES)
                        goto bad;
 
-               m0->m_next = m;
-               m = m0;
+               if (m0->m_len == 0) {
+                       m_free(m0);
+               } else {
+                       m0->m_next = m;
+                       m = m0;
+               }
 
                MGET(m0, M_DONTWAIT, m->m_type);
                if (m0 == NULL)

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to