During testing, Xiang hit an interesting race.  While exercising the
DHCP test address logic, the DHCP client occasionally fails with:

  open_ip_lif: bge1: cannot bring up IPMP group interface: Invalid argument
  unable to open socket for bge1: Invalid argument

Specifically, to bring the IPMP interface up, the DHCP client issues an
SIOCGLIFFLAGS to load the current flags, then immediately sets IFF_UP and
issues an SIOCSLIFFLAGS.  However, suppose that between the SIOCGLIFFLAGS
and the SIOCSLIFFLAGS, one of the flags in IFF_IPMP_CANTCHANGE changes (in
this case, IFF_FAILED).  Then, from the point of view of the kernel, it
appears that dhcpagent is attempting to clear IFF_FAILED, so it fails the
SIOCSLIFFLAGS with EINVAL.
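
The sequence above can be replayed with a small user-space model.  This
is only a sketch: the DEMO_* names and bit values are hypothetical
stand-ins (not the real <net/if.h> definitions), the protected mask holds
just one flag, and the "kernel" is a plain variable rather than the real
ioctl path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; NOT the real <net/if.h> ones. */
#define DEMO_IFF_UP           0x1ULL
#define DEMO_IFF_FAILED       0x2ULL
/* Stand-in for IFF_IPMP_CANTCHANGE; the real mask contains more flags. */
#define DEMO_IPMP_CANTCHANGE  DEMO_IFF_FAILED

/* Models the kernel's copy of the interface flags. */
static uint64_t demo_kernel_flags;

/* Models SIOCGLIFFLAGS: hand back a snapshot of the current flags. */
static uint64_t
demo_get_flags(void)
{
	return (demo_kernel_flags);
}

/*
 * Models the strict SIOCSLIFFLAGS check: if the request differs from the
 * current flags in any protected bit, fail with EINVAL.
 */
static int
demo_set_flags(uint64_t requested)
{
	if ((requested ^ demo_kernel_flags) & DEMO_IPMP_CANTCHANGE)
		return (EINVAL);
	demo_kernel_flags = requested;
	return (0);
}

/*
 * Replays the client's get-then-set sequence.  If fail_in_between is
 * nonzero, the modeled kernel sets the FAILED bit after the snapshot is
 * taken, so the stale snapshot looks like an attempt to clear it.
 */
static int
demo_bring_up(int fail_in_between)
{
	uint64_t snapshot;

	demo_kernel_flags = 0;
	snapshot = demo_get_flags();
	assert((snapshot & DEMO_IFF_FAILED) == 0);
	if (fail_in_between)
		demo_kernel_flags |= DEMO_IFF_FAILED;
	return (demo_set_flags(snapshot | DEMO_IFF_UP));
}
```

In this model, demo_bring_up(0) succeeds, while demo_bring_up(1) returns
EINVAL even though the caller only asked for the UP bit.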

Note that this can't happen with the (existing) list of IFF_CANTCHANGE
flags, because the kernel simply ignores those, rather than returning an
error.  I decided to return EINVAL with IFF_IPMP_CANTCHANGE in the
interest of making it clear to the application that its request was
invalid, but it seems that's easier said than done.

I see two non-hacky possibilities:

        1. We change IFF_IPMP_CANTCHANGE to work like IFF_CANTCHANGE.
           That is, the kernel silently strips off any flags in
           IFF_IPMP_CANTCHANGE and processes the rest of the request.
           This will fix the problem, but is a bit confusing since
           e.g. a request to set IFF_STANDBY on the IPMP interface
           will appear to succeed but in fact do nothing.

        2. We split IFF_IPMP_CANTCHANGE into IFF_IPMP_CANTCHANGE and
           IFF_IPMP_INVALID.  The first set (IFF_RUNNING and IFF_FAILED)
           acts like IFF_CANTCHANGE, in that those flags are silently
           ignored.  A request that sets any flag in the second set (all
           the other flags, which can *never* be set on an IPMP
           interface) explicitly fails with EINVAL.
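
To make the second option concrete, here is a rough user-space sketch of
the check it implies.  The SPLIT_* names and bit values are hypothetical,
the "invalid" mask holds a single stand-in flag rather than the full set,
and this is not the actual kernel code.  Option 1 corresponds to the case
where the invalid mask is empty and everything protected is stripped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; NOT the real <net/if.h> ones. */
#define SPLIT_IFF_UP       0x1ULL
#define SPLIT_IFF_RUNNING  0x2ULL
#define SPLIT_IFF_FAILED   0x4ULL
#define SPLIT_IFF_STANDBY  0x8ULL

/* Kernel-owned bits: silently ignored in a request. */
#define SPLIT_IPMP_CANTCHANGE  (SPLIT_IFF_RUNNING | SPLIT_IFF_FAILED)
/* Bits that may never be set on an IPMP interface (one stand-in here). */
#define SPLIT_IPMP_INVALID     SPLIT_IFF_STANDBY

/*
 * Models a set-flags request under option 2.  Requests that set an
 * invalid bit fail with EINVAL and leave the flags alone.  Otherwise the
 * kernel-owned bits keep their current values and the remaining bits are
 * taken from the request.
 */
static int
split_set_flags(uint64_t *kernel_flags, uint64_t requested)
{
	assert(kernel_flags != NULL);

	if (requested & SPLIT_IPMP_INVALID)
		return (EINVAL);

	*kernel_flags = (requested & ~SPLIT_IPMP_CANTCHANGE) |
	    (*kernel_flags & SPLIT_IPMP_CANTCHANGE);
	return (0);
}
```

With the current flags holding only the FAILED bit, a stale request for
just the UP bit now succeeds and leaves FAILED set, while a request that
includes the STANDBY bit is still rejected with EINVAL.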

Thoughts?  Other alternatives?

-- 
meem
