> > Discussing this issue with Erik this morning, we came up with a proposal
> > that has similar effect but with a lot less risk: in.mpathd could simply
> > ignore requests to delete targets from an interface associated with a
> > failed group, and continue to probe the existing target set (possibly
> > expanded target set if new targets are added). When an interface in the
> > group repairs, it could then rebuild the target list based on the latest
> > routing table. I prototyped this (literally a one line change) and it
> > "seems" to work.
> >
> > Thoughts?
>
> That sounds like it should work. Is there potentially a race condition
> between dhcpagent removing the route and in.mpathd realizing that the
> interface has failed and therefore should not remove the next hop from
> the target list?

No, since in.mpathd is responsible for triggering the group failure by
setting the IFF_FAILED flag on the last usable interface in the group.
Setting IFF_FAILED will in turn cause the kernel to clear IFF_RUNNING on
the IPMP IP interface, which will cause dhcpagent/routing daemons to
remove the routes -- but by that time, in.mpathd is well aware the group
has failed.

-- meem
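
For illustration, here is a rough sketch in C of the behavior described
above. It is not the actual in.mpathd code: struct group, target_add(),
target_delete(), and group_repair() are hypothetical names made up for
this example, and the routing table is modeled as a plain array of
address strings. The point it tries to show is the ordering: the failed
flag is already set by the time any delete request arrives, so the
request is simply ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TARGETS 8
#define ADDR_LEN    16

/* Hypothetical stand-in for a per-group probe target list. */
struct group {
    bool failed;                          /* set when the last usable interface fails */
    int  ntargets;
    char targets[MAX_TARGETS][ADDR_LEN];  /* next-hop addresses as strings */
};

/* Add a probe target; additions are accepted even while the group is failed. */
static bool target_add(struct group *g, const char *addr)
{
    assert(g != NULL && addr != NULL);
    if (g->ntargets == MAX_TARGETS || strlen(addr) >= ADDR_LEN)
        return false;
    strcpy(g->targets[g->ntargets++], addr);
    return true;
}

/* Delete a probe target, unless the group has failed. */
static bool target_delete(struct group *g, const char *addr)
{
    assert(g != NULL && addr != NULL);
    if (g->failed)
        return false;  /* ignore the request and keep probing the existing set */
    for (int i = 0; i < g->ntargets; i++) {
        if (strcmp(g->targets[i], addr) == 0) {
            int last = g->ntargets - 1;
            if (i != last)  /* fill the hole with the last entry */
                strcpy(g->targets[i], g->targets[last]);
            g->ntargets = last;
            return true;
        }
    }
    return false;
}

/* On repair, clear the failed flag and rebuild the list from current routes. */
static void group_repair(struct group *g, const char *const *routes, int nroutes)
{
    assert(g != NULL);
    g->failed = false;
    g->ntargets = 0;
    for (int i = 0; i < nroutes; i++)
        (void) target_add(g, routes[i]);
}
```

In this model, a caller would set g.failed before the routes go away,
so a later target_delete() returns false and leaves the list alone;
group_repair() then replaces the stale list with whatever the routing
table holds at repair time.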
