Philip Homburg wrote:
In your letter dated Mon, 23 May 2011 23:10:09 +0200 you wrote:
Who says that NUD can't also be used to declare an interface down/
detect router neighbor loss?

Maybe think of a BGP process running over TCP receiving ICMP
unreachables because the local NUD has declared the neighbor
unreachable. Meanwhile the other BGP partner router is still retrying at
TCP layer because NUD has not timed out on that node. Or am I seeing
non-existent links here?

Let's say router A declares router B unreachable because of some ND problem.
Meanwhile router B still considers router A reachable.

Now obviously, router A (and the routing system) will try to avoid routing
packets from A to B because that link is down.

B still assumes that A is reachable so it will continue to forward packets to
A. As long as A does not drop those packets, everything will be fine. I don't
think there is a reason to drop incoming packets when a neighbor on a link is
unreachable, but if an implementation does that, then that will break the
independence and will cause problems.

But for relatively stable links consisting of just BGP peers, it may make more
sense to just hardwire the ND entries and disable ND.


You are probably 100% correct for BGP. And I'm reasonably convinced BGP is relatively bullet-proof (even if one well-known network provider that I know does not give BGP TCP sessions priority over normal user traffic, so BGP neighbors can flap on high user load.) It was just one example off the top of my head of a dependency.

The only point I'm making is that there may well be stuff out there that has become reliant on the current symmetrical behavior of NUD. ARP caches generally took a very long time before they started sending back an ICMP unreachable. AFAIK NUD is pretty instantaneous once it declares something unreachable.

It was after all a selling-point of NDP and neighbor discovery in general that it "fixed" the half-open link problem.

RFC 2461: "Unlike ARP, Neighbor Discovery detects half-link failures"

And now it mightn't live up to that promise, at least in transitory situations. Those could last for a period comparable to the timeouts in other protocol layers or applications, unless the timers and retry counts are synchronized across all nodes on the link. Timers and race conditions can be very tricky to catch and debug, as I'm sure you know.
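
To put a rough number on that window, here is a minimal back-of-the-envelope sketch. It is only an illustration under stated assumptions: the constants are the protocol defaults from RFC 4861 (the successor to RFC 2461), both routers are assumed to lose reachability confirmation at the same instant and to keep sending traffic so that NUD actually runs, and the names NudTimers, time_to_unreachable and asymmetry_window are made up for this example rather than taken from any implementation.

```python
from dataclasses import dataclass


@dataclass
class NudTimers:
    """Per-node NUD parameters in seconds (RFC 4861 defaults, assumed here)."""
    reachable_time: float = 30.0      # base REACHABLE_TIME; nodes randomize it by 0.5x to 1.5x
    delay_first_probe: float = 5.0    # DELAY_FIRST_PROBE_TIME
    retrans_timer: float = 1.0        # RETRANS_TIMER
    max_unicast_solicit: int = 3      # MAX_UNICAST_SOLICIT

    def time_to_unreachable(self) -> float:
        """Seconds from the last reachability confirmation until NUD gives up.

        Simplification: traffic is sent the moment the entry goes stale, so
        the DELAY and PROBE phases start immediately after reachable_time.
        """
        return (self.reachable_time
                + self.delay_first_probe
                + self.max_unicast_solicit * self.retrans_timer)


def asymmetry_window(a: NudTimers, b: NudTimers) -> float:
    """Seconds during which only one of the two nodes has declared the other unreachable."""
    return abs(a.time_to_unreachable() - b.time_to_unreachable())


# Hypothetical worst case: A drew the shortest randomized reachable time, B the longest.
router_a = NudTimers(reachable_time=15.0)
router_b = NudTimers(reachable_time=45.0)

print("A gives up after", router_a.time_to_unreachable(), "s")
print("B gives up after", router_b.time_to_unreachable(), "s")
print("one-sided view lasts", asymmetry_window(router_a, router_b), "s")
```

Even with identical default settings on both nodes, the randomization of the reachable time alone can leave the two ends disagreeing for tens of seconds in this simplified model. Whether that matters depends on how it compares to the timeouts in whatever runs on top.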

Just raising a flag. Maybe this is significant, maybe not. Just thought I'd ask.

regards,
RayH
--------------------------------------------------------------------
IETF IPv6 working group mailing list
[email protected]
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------
