I happened upon an interesting and unfortunate interaction today that's worthy of some discussion. Specifically, when the IFF_RUNNING flag is cleared on an IP interface, dhcpagent purges any routes it added over the interface on the grounds that the routes can no longer be used, thus allowing any overlapping (but still usable) routes to be used.
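To make "overlapping (but still usable)" concrete, here is a minimal Python sketch of the check involved. It is not dhcpagent code: the `Route` tuple, its fields, and `has_usable_overlap` are hypothetical names, and it assumes "overlapping" means another route whose prefix covers the same destinations over a different interface that still has IFF_RUNNING set.

```python
import ipaddress
from typing import List, NamedTuple


class Route(NamedTuple):
    """Hypothetical, simplified view of a routing table entry."""
    dest: str      # destination prefix, e.g. "0.0.0.0/0"
    gateway: str   # next-hop address
    ifname: str    # outgoing IP interface
    running: bool  # assumed to mirror IFF_RUNNING on that interface


def has_usable_overlap(route: Route, table: List[Route]) -> bool:
    """Return True if some route over a different, running interface
    covers every destination that `route` covers."""
    target = ipaddress.ip_network(route.dest)
    for other in table:
        if other.ifname == route.ifname or not other.running:
            continue
        other_net = ipaddress.ip_network(other.dest)
        if other_net.version != target.version:
            continue  # never compare IPv4 prefixes with IPv6 prefixes
        if target.subnet_of(other_net):
            return True
    return False


# Illustrative addresses only: a default route over a failed IPMP
# interface, plus a second default route over a working interface.
table = [
    Route("0.0.0.0/0", "10.0.0.1", "ipmp0", False),
    Route("0.0.0.0/0", "192.168.1.1", "net2", True),
]
```

With both entries present, `has_usable_overlap(table[0], table)` is true, so purging the `ipmp0` route lets traffic move to `net2`. With only the `ipmp0` entry, it is false and purging the route gains nothing.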
However, consider the common case of, e.g., two interfaces together in an IPMP group that has DHCP data addresses. When the group fails, the IPMP IP interface's IFF_RUNNING flag will be cleared and thus its routes removed by dhcpagent. At that point, if probe-based failure detection is enabled, in.mpathd, left with no routes from which to pick router targets, will fall back to multicast targets.

For sites configured not to answer in.mpathd's multicast probes, this means the interface will *never* repair. For sites where the multicast probes will be answered, but by nodes that are not representative of overall connectivity, this will lead to a spurious repair, followed by a subsequent failure when the routes are restored but the routers still prove to be unreachable. Neither behavior seems acceptable.

Thoughts? Clearly, we could remove the code in the DHCP client that removes the routes (at least when IPMP is in use), but it makes some sense as-is. Further, Jim mentioned that routing daemons also do this, though I didn't see anything that did this in ON's in.routed or in SFW's quagga source. Another alternative would be to remove the routes only if there is in fact an overlapping route, but that may be non-trivial to implement. We could also document that explicit "host routes" need to be used with probe-based failure detection when DHCP (or dynamic routing?) is in use, but that may not go far enough.

-- meem
