Philip Homburg wrote:
> In your letter dated Mon, 23 May 2011 23:10:09 +0200 you wrote:
>> Who says that NUD can't also be used to declare an interface down or
>> to detect the loss of a router neighbor?
>> Consider a BGP process running over TCP that receives ICMP
>> unreachables because the local NUD has declared the neighbor
>> unreachable, while the peer BGP router is still retrying at the
>> TCP layer because NUD has not yet timed out on that node. Or am I
>> seeing non-existent links here?
> Let's say router A declares router B unreachable because of some ND problem,
> while router B still considers router A reachable.
> Router A (and the routing system) will then try to avoid routing packets
> from A to B, because from A's point of view that link is down.
> B still assumes that A is reachable, so it will continue to forward packets
> to A. As long as A does not drop those packets, everything will be fine. I
> don't think there is a reason to drop incoming packets when a neighbor on a
> link is unreachable, but if an implementation does that, it will break the
> independence of the two directions and cause problems.
> But for relatively stable links with only BGP peers on them, it may make
> more sense to just hardwire the ND entries and disable ND.
You are probably 100% correct for BGP, and I'm reasonably convinced BGP
is fairly bullet-proof (even if one well-known network provider I know
does not give BGP TCP sessions priority over normal user traffic, so BGP
neighbors can flap under high user load). It was just one example of a
dependency, off the top of my head.
The only point I'm making is that there may well be stuff out there that
has come to rely on the current symmetrical behavior of NUD. ARP
caches generally took a very long time before they started sending back
an ICMP unreachable, whereas AFAIK NUD triggers ICMP unreachables almost
instantly once it declares a neighbor unreachable.
It was, after all, a selling point of NDP (and of neighbor discovery in
general) that it "fixed" the half-open link problem. RFC 2461: "Unlike
ARP, Neighbor Discovery detects half-link failures".
And now it mightn't live up to that promise, at least in transitory
situations that can last about as long as the timeouts in other protocol
layers or applications, unless the timers and retry counts are
synchronized across all nodes on the link. Timer-related race conditions
can be very tricky to catch and debug, as I'm sure you know.
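To put rough numbers on the timer concern, here is a minimal Python sketch of when each end of a half-failed link gives up on its neighbor. It assumes the default constants from RFC 4861, section 10 (DELAY_FIRST_PROBE_TIME = 5 s, RETRANS_TIMER = 1 s, MAX_UNICAST_SOLICIT = 3) and a simplified STALE -> DELAY -> PROBE model that ignores upper-layer reachability hints and implementation-specific knobs. `NudConfig`, `time_to_unreachable` and `declared_unreachable_at` are hypothetical names, and the scenario times are made up for illustration only.

```python
from dataclasses import dataclass

# Default NUD constants from RFC 4861, section 10, expressed in seconds.
DELAY_FIRST_PROBE_TIME = 5.0   # time spent in DELAY before the first probe
RETRANS_TIMER = 1.0            # interval between unicast Neighbor Solicitations
MAX_UNICAST_SOLICIT = 3        # probes sent in PROBE before giving up


@dataclass
class NudConfig:
    """Per-node NUD tunables (a hypothetical grouping, not a real API)."""
    delay_first_probe: float = DELAY_FIRST_PROBE_TIME
    retrans_timer: float = RETRANS_TIMER
    max_unicast_solicit: int = MAX_UNICAST_SOLICIT


def time_to_unreachable(cfg: NudConfig) -> float:
    """Seconds from the first send to a STALE entry until NUD gives up,
    assuming no reachability confirmation ever arrives.

    STALE -> DELAY: wait delay_first_probe seconds.
    DELAY -> PROBE: send max_unicast_solicit probes, retrans_timer apart,
    and wait retrans_timer after the last one before declaring failure.
    """
    return cfg.delay_first_probe + cfg.max_unicast_solicit * cfg.retrans_timer


def declared_unreachable_at(stale_send_time: float, cfg: NudConfig) -> float:
    """Absolute time at which a node declares its neighbor unreachable.

    stale_send_time is when the node first sends to the neighbor after its
    entry has gone STALE; because ReachableTime is randomized per node,
    this generally differs between the two ends of a link.
    """
    return stale_send_time + time_to_unreachable(cfg)


if __name__ == "__main__":
    # Hypothetical scenario: the A->B direction breaks at t=0, so neither
    # node receives confirmations any more. A uses the RFC defaults; B hits
    # its stale send later and is configured with a larger retry count.
    a_gives_up = declared_unreachable_at(2.0, NudConfig())
    b_gives_up = declared_unreachable_at(20.0, NudConfig(max_unicast_solicit=6))
    print(f"A declares B unreachable at t={a_gives_up:.1f}s")
    print(f"B declares A unreachable at t={b_gives_up:.1f}s")
    print(f"Asymmetric window: {b_gives_up - a_gives_up:.1f}s")
```

With these made-up inputs A gives up at t=10.0 s and B at t=31.0 s, leaving a 21 s window in which A treats the link as down while B keeps forwarding to A. Even with identical settings on both nodes, ReachableTime is randomized per node (between 0.5 and 1.5 times BaseReachableTime), so the two ends generally start probing at different moments; mismatched retry counts only widen the window.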
Just raising a flag. Maybe this is significant, maybe not. Just thought
I'd ask.
regards,
RayH
--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------