In looking at some of the problems relating to divert, bridging,
etc., it's apparent that lots of code is breaking one of the rules
for handling mbufs: that mbuf data can sometimes be read-only.

Each mbuf may be either a normal mbuf or a cluster mbuf (if the
mbuf flags contains M_EXT). Cluster mbufs point to an entire page
of memory, and this page of memory may be shared by more than one
cluster mbuf (see m_copypacket()). This effectively makes the mbuf
data read-only, because a change to one mbuf affects all of the
mbufs, not just the one you're working on. There have been (and
still are) several FreeBSD bugs because of this subtlety.

A test for an mbuf being "read-only" is:

  if ((m->m_flags & M_EXT) != 0 && MEXT_IS_REF(m))  ...

So an implicit rule for handling mbufs is that they should be
treated as read-only unless/until you either check that it's not,
and/or pullup a new (non-cluster) mbuf that covers the data area
that you're going to modify.

However, many routines that take an mbuf parameter assume that the
mbuf given to them is modifiable and proceed to write all over it.
A few examples are: ip_input(), in_arpinput(), tcp_input(),
divert_packt(), etc.

In practice, this is often not a problem because the mbuf is actually
modifiable (because there are no other references to it). But this
is just because we've been lucky. When you throw things like bridging,
dummynet, divert, and netgraph into the mix, not to mention other
site-specific hacks, then these assumptions no longer hold. At the
minimum these assumptions should be clearly commented, but that's
not even the case right now.

Routines that don't change any data, or that only do m_pullup(),
M_PREPEND(), m_adj(), etc. don't have a problem.

So I'd like to propose a mini-project to clarify and fix this problem.
Here is the propsal:

  1.  All routines that take an mbuf as an argument must not assume
      that any mbuf in the chain is modifyable, unless expclicitly
      and clearly documented (in the comment at the top of the function)
      as doing so.

  2.  For routines that don't modify data, incorporate liberal use
      of the "const" keyword to make this clear. For example, change

          struct ip *ip;
          ip = mtod(m, struct ip *);


          const struct ip *ip;
          ip = mtod(m, const struct ip *);

  3.  For any routines that do need to modify mbuf data, but don't
      assume anything about the mbuf, alter those routines to do
      an m_pullup() when necessary to make the data are they are
      working on modifiable. For example:

        struct ip *ip;

        /* Pull up IP header */
        if (m->m_len < sizeof(*ip) && !(m = m_pullup(m, sizeof(*ip))))
        ip = mtod(m, struct ip *);

        /* Make sure the IP header area is writable */
        if ((m->m_flags & M_EXT) != 0 && MEXT_IS_REF(m)) {
                /* m_pullup() *always* prepends a fresh, non-cluster mbuf */
                if ((m = m_pullup(m, sizeof(struct ip))) == 0)
                ip = mtod(m, struct ip *);

        /* Modify the header */
        ip->ip_len = 123;

The only negative is the addition of the NEW_CODE_BEING_ADDED code
in the relevant places. In practice this test will usually fail,
as most mbufs are modifiable, so there should be no noticable
slowdown. However, robustness should improve, especially when
bridging, diverting, etc.

What do people think? If this is generally agreeable I'll try to
work on putting together a patch set for review.


Archie Cobbs   *   Whistle Communications, Inc.  *

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message

Reply via email to