I happened upon an interesting and unfortunate interaction today that's
worthy of some discussion.  Specifically, when the IFF_RUNNING flag is
cleared on an IP interface, dhcpagent purges any routes it added over the
interface on the grounds that the routes can no longer be used, thus
allowing any overlapping (but still usable) routes to be used.

However, in the common case with e.g. two interfaces together in an IPMP
group that has DHCP data addresses, when the group fails, the IPMP IP
interface's IFF_RUNNING flag will be cleared and thus its routes removed
by dhcpagent.  At that point, if probe-based failure detection is enabled,
in.mpathd, with its router targets gone along with the routes, will fall
back to multicast targets.  For sites configured not to
answer in.mpathd's multicast probes, this means the interface will *never*
repair.  For sites where the multicast probes will be answered but by
nodes that are not representative of overall connectivity, this will lead
to a spurious repair, followed by a subsequent failure when the routes are
restored but the routers still prove to be unreachable.  Neither behavior
seems acceptable.
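
To make the sequence concrete, here is a toy Python model of it.  This is
only a sketch and not dhcpagent or in.mpathd code: the Route tuple, the
function names, the "ipmp0" interface name and the addresses are all made
up for illustration, and it assumes the multicast fallback target is the
IPv4 all-hosts group.

```python
from typing import NamedTuple

# Assumption: the multicast fallback is the IPv4 all-hosts group.
ALL_HOSTS_MCAST = "224.0.0.1"


class Route(NamedTuple):
    dest: str        # destination, e.g. "default"
    gateway: str     # next-hop router
    ifname: str      # interface the route goes over
    from_dhcp: bool  # True if the DHCP client added it


def purge_dhcp_routes(routes, ifname):
    """Model of the DHCP client's reaction to IFF_RUNNING clearing:
    drop every route it added over that interface."""
    return [r for r in routes if not (r.ifname == ifname and r.from_dhcp)]


def probe_targets(routes, ifname):
    """Model of target selection: probe the routers reachable over the
    interface, or fall back to multicast if there are none."""
    gateways = sorted({r.gateway for r in routes if r.ifname == ifname})
    return gateways if gateways else [ALL_HOSTS_MCAST]


if __name__ == "__main__":
    table = [Route("default", "10.0.0.1", "ipmp0", True)]
    print(probe_targets(table, "ipmp0"))
    # Group fails, IFF_RUNNING clears, the DHCP routes are purged.
    table = purge_dhcp_routes(table, "ipmp0")
    print(probe_targets(table, "ipmp0"))
```

Once the purge has run, the only target left in this model is the
multicast group, which is exactly the state the two bad outcomes above
start from.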

Thoughts?  Clearly, we could remove the code in the DHCP client that
removes the routes (at least when IPMP is in use), but it makes some sense
as-is.  Further, Jim mentioned that routing daemons also do this, though I
didn't see anything that did this in ON's in.routed or in SFW's quagga
source.  Another alternative would be to only remove the routes if there
is in fact an overlapping route, but that may be non-trivial to implement.
We could also document that explicit "host routes" need to be used with
probe-based failure detection when DHCP (or dynamic routing?) is in
use, but that may not go far enough.
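
For the "only remove if there's an overlapping route" alternative, the
check itself might look roughly like the Python sketch below.  It is a
sketch under assumptions, not a proposed patch: the Route class, the
function names and the interface names are hypothetical, and "overlapping"
is taken to mean a route over a different, still-running interface whose
prefix contains the DHCP route's prefix.  The non-trivial part (getting a
consistent view of the routing table and interface flags from inside the
DHCP client) is not shown.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    dest: str     # destination prefix, "0.0.0.0/0" for a default route
    gateway: str  # next-hop router
    ifname: str   # interface the route goes over


def covers(other, route):
    """True if `other` reaches every destination that `route` reaches."""
    o = ipaddress.ip_network(other.dest)
    r = ipaddress.ip_network(route.dest)
    # subnet_of() raises TypeError on mixed IPv4/IPv6, so check first.
    return o.version == r.version and r.subnet_of(o)


def routes_to_purge(dhcp_routes, all_routes, running_ifs):
    """Return only those DHCP routes that have a usable overlapping
    route over some other interface that is still running."""
    purge = []
    for r in dhcp_routes:
        if any(o.ifname != r.ifname
               and o.ifname in running_ifs
               and covers(o, r)
               for o in all_routes):
            purge.append(r)
    return purge
```

With this rule, a lone IPMP group with no alternative path would keep its
DHCP routes across a group failure, so the routers they name would still be
there to probe.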

-- 
meem
