> IPMP depends on the probe target responding to pings in a consistent way.
> If the probe target responds selectively to some probes, but not others,
> then the probe based failure detection may not work.

But in most cases, we have no control over how the other hosts on the
network choose to respond to ICMP probes.  So attempting to make ICMP
replies more reliable on Solaris just for in.mpathd's benefit seems
odd -- especially to handle such a corner case failure mode.

> As an example consider the degenerate case above. We have interfaces A,
> B in an IPMP group on say host H1, and interfaces C, D in another IPMP
> group on the probe target machine say H2. Let us say there is a
> transmit path failure on C, at time T. Now until the failure detection
> happens at H2, IP may use either C or D to send out the ping
> response. Depending on the ping source address, IP load spreads and
> send out the ping response on some interface. (by creating a
> destination based ire cache). So it is possible that the response to
> pings originating from A (to both C and D) go out on C. Similarly
> response to pings originating from B may go out on D. With a transmit
> failure on C, A stops seeing responses altogether, while B still sees
> responses from both C and D. At time T + 10, H1 would misdiagnose that
> A has failed.

But once C has been marked failed (which will happen momentarily),
probes sent from A will again be received and in.mpathd will mark the
interface as repaired again -- so this is at most a transient outage,
right?

Maybe I'm missing something here, but the scenario you describe above
seems to be exactly why we do not recommend using a single host as a
probe target (even if that host has multiple IP addresses) -- if that
host becomes unresponsive (e.g. due to a reboot), in.mpathd will
incorrectly conclude that the interface probing that host has failed.
Here, you seem to be arguing that just because the host being probed is
using IPMP, it should be held to a higher standard of reliability when
one of its interfaces fails -- even though it has not yet detected the
failure.

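
For illustration only, the timeline of the quoted scenario can be
sketched with a toy model. This is not in.mpathd code. The 10-second
detection window is taken from the quoted "T + 10"; the one-second probe
interval, the three-consecutive-replies repair rule, and every name
below are assumptions made up for the sketch.

```python
def simulate_a(h2_detect_at, fdt=10, repair_probes=3, probe_interval=1, end=60):
    """Toy timeline for interface A on H1 after C's transmit path fails at t=0.

    h2_detect_at:  time at which H2 marks C failed (hypothetical parameter).
    fdt:           seconds without a reply before H1 declares A failed.
    repair_probes: consecutive replies needed before A is marked repaired
                   (an assumed rule, not in.mpathd's actual algorithm).
    Returns a list of (time, event) tuples.
    """
    state = "ok"
    last_reply = 0      # last time A saw a reply; the failure begins at t=0
    good_streak = 0     # consecutive probes from A that were answered
    events = []
    t = 0
    while t <= end:
        # Replies to A leave H2 through C (and are lost) until H2 marks C
        # failed; after that they leave through D and reach A again.
        reply_seen = t >= h2_detect_at
        if reply_seen:
            last_reply = t
            good_streak += 1
        else:
            good_streak = 0

        if state == "ok" and t - last_reply >= fdt:
            state = "failed"
            events.append((t, "failed"))
        elif state == "failed" and good_streak >= repair_probes:
            state = "ok"
            events.append((t, "repaired"))
        t += probe_interval
    return events


if __name__ == "__main__":
    print(simulate_a(h2_detect_at=12))  # H2 notices after H1's window expires
    print(simulate_a(h2_detect_at=5))   # H2 notices first: no events for A
```

Under these assumptions, A is misdiagnosed only when H2 takes longer
than H1's detection window to mark C failed, and the misdiagnosis clears
a few probes after H2 catches up.
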
Further, we seem to be doing all of this to handle the unlikely case
where one host catches the "leading edge" of a transmit-only failure of
an IPMP-grouped interface before the host that has the interface has
detected it.  Can we even test this case?

To be clear: this code is not in my way -- in fact it's easier to
support in the new model.  But so far, I find it hard to justify,
especially since it should never happen in a properly-provisioned IPMP
environment (e.g., one where there are multiple hosts to probe).

-- meem
