Thanks once again for replying.
I think the differences are indeed down to bad implementations rather
than the specification. That said, RFC 5461 section 4.1 acknowledged as
late as 2009 that there are non-compliant implementations in the field
where TCP does react to soft errors, and some of the kit I work on is a
lot older than that.
Also, my operational experience with ARP is that dead gateway
protection is not widely supported, and black-holing was more the norm
than the exception.
Erik Nordmark wrote:
> I don't understand why you think all the nodes on a link need to be
> coordinated in such a way; the Internet protocols are designed to be
> robust and not assume that all the nodes have the same code and
> tuning. For instance, we don't require that ARP or TCP on all the
> nodes on a link to have the same timer values, and things work just fine.
> Erik
> > And if all nodes on the link aren't behaving the same way, don't you
> > still get say 50% of the multicasts as the partner nodes revert to the
> > "-" state by timing out "too fast" for that link type?
> > Just seems like another reason to have this as a "per link" parameter
> > rather than a "per node" parameter.
> > Best regards,
> > RayH
The last point was simply one from an operational perspective. Forgive
me for being such a low-level guy.
[side track]
One of my grumbles about IPv6 is that network managers just don't have
the standard/generic tools to tune the behavior of end nodes
effectively. Quite a lot of host behaviors are set by local preferences
and have default values, but are not coordinated across
implementations, e.g. (dare I mention it) SLAAC vs. DHCPv6.
For a network admin that's just a nightmare to manage in an environment
with multiple operating systems, guest end nodes, traveling users, new
nodes, old implementations... Half of the implementations may be
behaving in a way that isn't suitable for your network, you might not
have admin rights on those end nodes, and there's no way to give the
end node a hint about the correct behavior.
Think of a network with a "Bring Your Own Device" policy, where you
have no admin control over the end nodes (e.g. no Active Directory).
There seems to be no (effective) way for network equipment to signal
to end nodes what behavior is appropriate for your particular network,
compared with the simple tools we already have today, such as DHCPv4
options and extensions. I'm sure certain SLAAC evangelists will tell me
it's none of my business to try to manage this at all, and that
self-configuration is the future. But never mind.
[/side track]
I've just read the draft covering the (very interesting) mesh-under /
route-over mechanisms used in 6LoWPAN,
http://tools.ietf.org/html/draft-ietf-6lowpan-nd-16. Very cool stuff.
Even there, it was a requirement that all nodes taking part in the
network behave the same way:
> The applicability of this specification is limited to LoWPANs where
> all nodes on the subnet implement these optimizations in a homogeneous way.
So if the point of this draft is really to limit multicast, then from an
operational perspective don't you want ALL nodes on a link to avoid
using multicast as much as possible?
So if the point of this draft is really to avoid operational problems
with STP thrashing, then from an operational perspective don't you want
ALL nodes on a link to avoid timing out too fast as much as possible?
And how do the end nodes know what the appropriate operational behavior
is on this particular link? Out of scope for the draft?
If it's really true that the end nodes do not have to behave the same
way, then I don't understand why the Reachable Time and the Retransmit
Timer are sent in RA messages.
Put it the other way around: I don't understand why it was considered
so important that all nodes use the same values for Reachable Time and
Retransmit Timer (for NUD), if it now isn't considered important that
they even use the same retry mechanism in the PROBE state, or limit
how long that state can last.
That's all I'm saying. If you perform link-level coordination for one
set of parameters used by NUD, why not this particular one?
Also, for debugging, it's just one more thing to look at on that
sniffer trace when spending a weekend or evening debugging in a data
centre (not my favorite hobby, and something I try to avoid). Is a node
not responding because it is using exponential NUD back-off, because an
ND message is being dropped while spanning tree thrashes around, or
because the end node implementation is plain broken?
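To make the sniffer-trace point concrete, here is a rough sketch of when NS probes would appear on the wire after a neighbor enters the PROBE state. The fixed case assumes the RFC 4861 defaults (RetransTimer of 1000 ms, MAX_UNICAST_SOLICIT of 3). For the back-off case, the multiplier of 3 and the 60-second cap are illustrative assumptions, not necessarily the values in the draft, and the random jitter a real implementation would add is left out.
```python
# Compare NS probe send times as they would show up on a sniffer trace:
# fixed-interval NUD (RFC 4861 defaults) vs. a hypothetical exponential back-off.

RETRANS_TIMER_S = 1.0      # RFC 4861 default RetransTimer (1000 ms); an RA may override it
MAX_UNICAST_SOLICIT = 3    # RFC 4861 default number of unicast probes in PROBE state

# Assumptions for illustration only: the multiplier and cap are hypothetical,
# and the random jitter a real implementation would add is omitted.
BACKOFF_MULTIPLE = 3
MAX_RETRANS_TIMER_S = 60.0


def fixed_probe_times(n=MAX_UNICAST_SOLICIT, retrans=RETRANS_TIMER_S):
    """Send times (seconds after entering PROBE) with a constant interval."""
    return [i * retrans for i in range(n)]


def backoff_probe_times(n, retrans=RETRANS_TIMER_S,
                        multiple=BACKOFF_MULTIPLE, cap=MAX_RETRANS_TIMER_S):
    """Send times when each wait is multiplied by `multiple`, capped at `cap`."""
    times, t, wait = [], 0.0, retrans
    for _ in range(n):
        times.append(t)
        t += wait
        wait = min(wait * multiple, cap)
    return times


if __name__ == "__main__":
    print("fixed:  ", fixed_probe_times())
    print("backoff:", backoff_probe_times(6))
```
With these assumed values, a node that is simply backing off sends its fifth probe about 40 seconds after the first, so a long silence between probes on a trace is not by itself proof of a dropped ND message or a broken implementation.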
Hope this helps clarify where I'm coming from. It's not in any way a
criticism of your draft, just a potential pointer to how it could be
"improved" from an operational point of view.
regards,
RayH
--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------