On 22 Jun 2011, at 00:07, Shane Amante wrote:
> An example which routinely happens in today's networks would be link
> restoration.
> In that case, the network is restoring traffic from a much longer path to a
> shorter, more optimal path in the network. Depending on the transmission rate
> of the transmitter, this can/will lead to temporary reordering of microflows
> at the receivers. Do note that in well-operated networks this reordering is,
> hopefully, transient and of an extremely short duration.
Note that, similar to the above, having the first packet in a flow
travel a slightly different path than subsequent fragments
due to ECMP/LAG differences is also predictably "transient" and
"of an extremely short duration" (to use Shane's excellent phrasing).
> Namely, deployed IPv6 routers *already* [should] identify fragmented
> vs. non-fragmented packets, presumably by inspecting the "Next Header
> field of the last header of the Unfragmentable Part" for value 44
> [RFC 2460, Section 4.5] for 'Fragment Header' for the purposes of
> deciding whether /or not/ they should attempt to identify Next Headers
> containing the Upper Layer protocol and, subsequently, the
> {protocol, src_port & dst_port} that will be fed, (along with
> {src_ip, dst_ip}), into LAG and/or ECMP hash algorithms for
> fine-grained load-balancing.
Exactly so.
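For concreteness, the check Shane describes can be sketched roughly as
below. This is only an illustrative software sketch, not how any
particular router does it; the function name find_fragment_header is
made up for this example, and it assumes a raw IPv6 packet with only
Hop-by-Hop, Routing, and Destination Options headers possibly preceding
the Fragment Header.

```python
import struct

IPV6_BASE_LEN = 40
# Next Header values for the extension headers this sketch understands.
HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS = 0, 43, 44, 60


def find_fragment_header(packet: bytes):
    """Return (is_fragment, fragment_offset) for a raw IPv6 packet.

    Hypothetical helper: walks the Unfragmentable Part and checks whether
    the Next Header field of its last header is 44 (Fragment Header).
    fragment_offset is in 8-octet units; 0 means first fragment.
    """
    if len(packet) < IPV6_BASE_LEN:
        raise ValueError("truncated IPv6 base header")
    next_header = packet[6]          # Next Header field of the base header
    offset = IPV6_BASE_LEN
    while next_header in (HOP_BY_HOP, ROUTING, DEST_OPTS):
        if len(packet) < offset + 2:
            raise ValueError("truncated extension header")
        following = packet[offset]
        hdr_ext_len = packet[offset + 1]   # 8-octet units, excluding first 8
        offset += (hdr_ext_len + 1) * 8
        next_header = following
    if next_header != FRAGMENT:
        return False, 0
    if len(packet) < offset + 8:
        raise ValueError("truncated Fragment Header")
    (offset_and_flags,) = struct.unpack_from("!H", packet, offset + 2)
    return True, offset_and_flags >> 3     # top 13 bits are Fragment Offset
```

A non-zero fragment_offset marks a non-first fragment, which is the
case where the transport-layer ports are not available.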
Side comment: Because many users/operators require that routers implement
wire-speed Access Control Lists, and most of those ACLs
match on fields such as the transport-layer protocol
and transport-layer port numbers, ASIC/FPGA-based IPv6
routers will continue to parse into the IPv6 packet
beyond the IPv6 base header *regardless* of whether
that information is needed for ECMP/LAG/load-balancing
purposes.
I have heard credible reports of an ISP that deploys hundreds of
ACL rules on the exterior border routers of its AS, and a smaller
but still non-trivial number on its interior routers.
Of course, this also means they select their largest routers in part
upon the routers' ability to support large numbers of ACLs at full
wire-speed and also the ability to look past the occasional IPv6
extension header to examine the ICMP/TCP/UDP/etc headers behind it.
> "Conservative" router/switch implementations strive to reduce the risk of
> _persistent_ reordering of an individual microflow. IOW, since non-first
> fragments will not contain Upper Layer protocol information, (specifically:
> {src_port, dst_port}), that can be fed as input-keys to LAG and/or ECMP hash
> algorithms, the "safe" thing they should do is to only use the 2-tuple of
> {src_ip, dst_ip} as input-keys for _all_ fragments within a microflow.
> Obviously, this leads to 'coarse-grained' load-balancing for microflows
> containing fragmented packets.
Yes, and an important word above is "persistent", to echo Shane's emphasis.
As Shane observes, this implementation approach is common today. I believe
some implementations today already include the Flow Label, simply because
it is another available differentiator (albeit most commonly zero just now),
not because of this current set of specifications.
> If this draft is widely implemented & deployed and originating hosts are
> encoding a "uniformly distributed", non-zero flow-label in all packets
> (fragmented or not), then it would seem logical that routers would be
> adapted so that:
>
> a) If they encounter a Fragment Header they use:
>    {src_ip, dst_ip + flow_label}
>    as input-keys to the LAG + ECMP hash algorithms; and/or,
>
> b) If they encounter a Next Header with, for example, an Upper Layer Protocol
> that they have *not* (yet?) implemented a parsing routine to extract
> appropriate input-keys (or, can't, because it's too deep in the packet's
> headers), then they revert back to using {src_ip, dst_ip + flow_label}
> as input-keys to the LAG + ECMP hash algorithms[1]; and/or,
>
> c) [Assuming widespread use of the flow-label], they no longer even bother
> looking at any Next Headers in all packets and _always_ use {src_ip,
> dst_ip + flow_label} for input-keys to LAG + ECMP hash algorithms.
>
> Personally, I see (a) & (b) as being a short- to medium-term "wins"
> that could be safely implemented, by default, in the next-spin of NP,
> FPGA SW and ASIC HW, given the existence of this, hopefully soon, RFC.
Agreed. In some cases, I think (a) and (b) are already deployed.
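To make (a) and (b) concrete, here is a rough sketch of the key
selection, including the "conservative" 2-tuple fallback when the Flow
Label is zero. Everything here is an assumption for illustration: the
PacketInfo type, the hash_keys and pick_link names, and the use of
SHA-256 as the hash (real forwarding hardware would use something far
cheaper). It is not taken from any implementation.

```python
import hashlib
from typing import NamedTuple, Optional


class PacketInfo(NamedTuple):
    """Hypothetical summary of what the parser extracted from one packet."""
    src_ip: str
    dst_ip: str
    flow_label: int
    is_fragment: bool
    protocol: Optional[int] = None   # None if the parser could not reach it
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


def hash_keys(pkt: PacketInfo) -> tuple:
    """Choose input-keys for the LAG/ECMP hash."""
    has_ports = pkt.src_port is not None and pkt.dst_port is not None
    if pkt.is_fragment or not has_ports:
        # Cases (a) and (b): never use ports, so that every fragment of a
        # microflow yields the same keys.
        if pkt.flow_label != 0:
            return (pkt.src_ip, pkt.dst_ip, pkt.flow_label)
        # Zero Flow Label: fall back to the coarse-grained 2-tuple.
        return (pkt.src_ip, pkt.dst_ip)
    # Unfragmented packet with a parseable transport header.
    return (pkt.src_ip, pkt.dst_ip, pkt.protocol, pkt.src_port, pkt.dst_port)


def pick_link(pkt: PacketInfo, n_links: int) -> int:
    """Map the chosen keys onto one of n_links member links or paths."""
    digest = hashlib.sha256(repr(hash_keys(pkt)).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_links
```

With this selection, a first fragment (ports visible) and a later
fragment (ports absent) of the same microflow produce identical keys,
so they cannot be persistently split across links. Option (c) would
amount to dropping the final 5-tuple branch.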
> Obviously, (c) is going to be a little further out.
I would apply the caveat from earlier to (c).
Routers will still be looking beyond the IPv6 header *for ACL purposes*,
even if those same routers use non-zero Flow Labels for ECMP/LAG,
rather than using transport-layer information for ECMP/LAG.
> I would also point out a substantial additional advantage is [long-term]
> architectural flexibility in that the end-points (hosts) may freely use
> *new* transport protocols (SCTP, DCCP, UDP-lite, etc.) so long as they
> continue to label all packets with a "uniformly distributed",
> non-zero flow-label so that [Core] routers/switches have something
> they can safely use as input-keys for LAG and/or ECMP hash algorithms.
With respect to ECMP/LAG narrowly, I agree with the above.
> At least, that's one part of the network that we don't need to worry
> about upgrading to support new transport-layer protocols. Unfortunately,
> middleboxes (FW's or, more generally, "security GW's") might still have
> to be adapted depending on the applicability of the new transport-layer
> protocol to various network types, (e.g.: SOHO vs. Large-ish Enterprise).
Exactly so. Router-based ACLs (which are widespread -- even in
some deployed transit/backbone routers) will still need to support those
new transport-layer protocols BEFORE those new transport-layer protocols
will be practical to widely deploy.
Yours,
Ran Atkinson
PS: For implementations prior to this current set of documents, and for
routers creating Flow Labels on the fly, a useful additional input
to an ECMP/LAG function would be the "SPI" value of an ESP or AH header.
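As a rough illustration of the PS, the SPI could be pulled out along
the following lines. The function name extract_spi is invented for
this sketch, and it assumes the caller has already located the start
of the ESP or AH header. The field positions are the standard ones:
the SPI is the first 32 bits of an ESP header, and it follows the
Next Header, Payload Len, and Reserved fields in an AH header.

```python
import struct
from typing import Optional

ESP, AH = 50, 51   # IP protocol / Next Header numbers


def extract_spi(next_header: int, header: bytes) -> Optional[int]:
    """Return the 32-bit SPI if `header` starts an ESP or AH header.

    Hypothetical helper: `next_header` is the value that announced this
    header; `header` is the packet data starting at that header.
    Returns None when there is no SPI to use.
    """
    if next_header == ESP:
        spi_offset = 0     # SPI is the first field of ESP
    elif next_header == AH:
        spi_offset = 4     # after Next Header, Payload Len, Reserved
    else:
        return None
    if len(header) < spi_offset + 4:
        return None        # truncated; do not guess
    (spi,) = struct.unpack_from("!I", header, spi_offset)
    return spi
```

The returned value could then be added to {src_ip, dst_ip} as one more
input-key, in the same way as a port pair or a Flow Label.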