Ran, Jari, All,
To emphasize some of the excellent points that Ran made ...
On Jun 21, 2011, at 9:00 AM, RJ Atkinson wrote:
> Separately, packet re-ordering can (and routinely does) happen
> in the deployed world already, regardless of contents of the
> Flow Label field. So receiving nodes already have to be able
> to cope with reordered packets.
An example which routinely happens in today's networks would be link
restoration. In that case, the network is restoring traffic from a much longer
path to a shorter, more optimal path in the network. Depending on the
transmission rate of the transmitter, this can/will lead to temporary
reordering of microflows at the receivers. Do note that in well-operated
networks this reordering is, hopefully, transient and of an extremely short
duration.
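
As a rough illustration of why the transmitter's rate matters, here is a
minimal sketch. It assumes a constant sending interval, fixed one-way delays on
both paths and an instantaneous switch-over; the function names and all the
numbers are hypothetical.

```python
def arrival_order(send_interval_ms, n_packets, switch_at_ms,
                  long_delay_ms, short_delay_ms):
    """Return sequence numbers in the order they reach the receiver."""
    arrivals = []
    for seq in range(n_packets):
        sent = seq * send_interval_ms
        # Packets sent before the restoration still travel the longer path.
        delay = long_delay_ms if sent < switch_at_ms else short_delay_ms
        arrivals.append((sent + delay, seq))
    arrivals.sort()
    return [seq for _, seq in arrivals]


def count_reordered(order):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest_seen = -1
    late = 0
    for seq in order:
        if seq < highest_seen:
            late += 1
        else:
            highest_seen = seq
    return late


# Fast sender: packets 1 ms apart, paths of 30 ms and 10 ms one-way delay.
fast = count_reordered(arrival_order(1, 20, 10, 30, 10))
# Slow sender: packets 25 ms apart, same two paths.
slow = count_reordered(arrival_order(25, 4, 50, 30, 10))
```

With these made-up figures the fast sender sees 10 late packets and the slow
sender sees none: only packets still in flight on the longer path at the
moment of the switch can be overtaken.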
> There is a 4th implementation option, which is to use only fields
> in the base IPv6 header for all cases where a router adds a Flow Label.
> That implementation option adds little or no value for load-balancing
> as compared with a zero Flow Label, because existing load-balancing
> algorithms in the deployed world already tend to use the variable
> fields of the IPv6 header (e.g. source IPv6 address, destination IPv6
> address, maybe ToS byte) for deployed load-balancing situations.
> For non-fragmented packets, the deployed world already tends to use
> the 5 input values that these I-Ds discuss, by the way.
I think it's important to re-emphasize and expand upon the above. Namely,
deployed IPv6 routers *already* [should] distinguish fragmented from
non-fragmented packets, presumably by checking whether the "Next Header field
of the last header of the Unfragmentable Part" is 44 ('Fragment Header')
[RFC 2460, Section 4.5]. That check decides whether /or not/ they should
attempt to locate the Next Header containing the Upper Layer protocol and,
subsequently, the {protocol, src_port, dst_port} that will be fed, along with
{src_ip, dst_ip}, into LAG and/or ECMP hash algorithms for fine-grained
load-balancing.
"Conservative" router/switch implementations strive to reduce the risk of
_persistent_ reordering of an individual microflow. IOW, since non-first
fragments will not contain Upper Layer protocol information (specifically,
{src_port, dst_port}) that can be fed as input-keys to LAG and/or ECMP hash
algorithms, the "safe" thing to do is to use only the 2-tuple of
{src_ip, dst_ip} as input-keys for _all_ fragments within a microflow.
Obviously, this leads to 'coarse-grained' load-balancing for microflows
containing fragmented packets.
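
The "conservative" behavior described above can be sketched roughly as follows.
This is not any vendor's implementation: it assumes the upper-layer header (or
the Fragment Header) immediately follows the base IPv6 header, ignores all
other extension headers, and uses CRC32 purely as a stand-in for whatever hash
the hardware really computes. hash_keys() and pick_link() are hypothetical
names.

```python
import struct
import zlib

FRAGMENT = 44      # Next Header value for the Fragment Header (RFC 2460, 4.5)
TCP, UDP = 6, 17   # upper-layer protocols this sketch knows how to parse


def hash_keys(packet: bytes):
    """Choose the hash input-keys for one raw IPv6 packet."""
    next_header = packet[6]
    src_ip, dst_ip = packet[8:24], packet[24:40]
    if next_header in (TCP, UDP) and len(packet) >= 44:
        # Not fragmented: the ports are the first 4 bytes after the base header.
        src_port, dst_port = struct.unpack("!HH", packet[40:44])
        return (src_ip, dst_ip, next_header, src_port, dst_port)
    # Fragment Header (first or non-first fragment) or anything unparsed:
    # use only the 2-tuple so that every fragment takes the same path.
    return (src_ip, dst_ip)


def pick_link(keys, n_links):
    """Map the chosen keys onto one of n_links LAG/ECMP members."""
    data = b"".join(k if isinstance(k, bytes) else k.to_bytes(2, "big")
                    for k in keys)
    return zlib.crc32(data) % n_links
```

Note that the first fragment is deliberately treated like the non-first ones,
even though its ports are present; otherwise it could hash onto a different
link than the rest of the datagram.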
As with many things, engineering is about properly managing a series of
trade-offs. Currently, the advantage of avoiding persistent reordering of
fragmented microflows outweighs the disadvantage of only being able to perform
coarse-grained load-balancing of the (assumed) very small number of fragmented
microflows. If this draft is widely implemented & deployed, and originating
hosts are encoding a "uniformly distributed", non-zero flow-label in all
packets (fragmented or not), then it would seem logical that routers would be
adapted so that:
a) If they encounter a Fragment Header, they use {src_ip, dst_ip, flow_label}
as input-keys to the LAG and/or ECMP hash algorithms; and/or,
b) If they encounter a Next Header with, for example, an Upper Layer Protocol
for which they have *not* (yet?) implemented a parsing routine to extract
appropriate input-keys (or can't, because it's too deep in the packet's
headers), then they fall back to using {src_ip, dst_ip, flow_label} as
input-keys to the LAG and/or ECMP hash algorithms[1]; and/or,
c) [Assuming widespread use of the flow-label], they no longer even bother
looking at any Next Headers in any packet and _always_ use {src_ip, dst_ip,
flow_label} as input-keys to the LAG and/or ECMP hash algorithms.
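
Options (a), (b) and (c) could be sketched like this. Again, this is only an
illustration under assumptions: the header of interest directly follows the
base IPv6 header, the set of "parsed" protocols is just TCP and UDP, and the
always_use_flow_label knob is a hypothetical operator setting, not something
taken from the draft.

```python
import struct

FRAGMENT = 44              # Next Header value for the Fragment Header
PARSED_PROTOCOLS = (6, 17)  # TCP, UDP: assumed to be all this router can parse


def flow_label_of(packet: bytes) -> int:
    """Extract the 20-bit Flow Label from the first word of the IPv6 header."""
    return struct.unpack("!I", packet[:4])[0] & 0xFFFFF


def hash_keys(packet: bytes, always_use_flow_label: bool = False):
    """Choose hash input-keys, falling back to the flow-label 3-tuple."""
    next_header = packet[6]
    src_ip, dst_ip = packet[8:24], packet[24:40]
    three_tuple = (src_ip, dst_ip, flow_label_of(packet))
    if always_use_flow_label:                # option (c): never look deeper
        return three_tuple
    if next_header == FRAGMENT:              # option (a): fragmented packet
        return three_tuple
    if next_header not in PARSED_PROTOCOLS:  # option (b): no parser available
        return three_tuple
    # Known upper-layer protocol: keep using the usual 5-tuple.
    src_port, dst_port = struct.unpack("!HH", packet[40:44])
    return (src_ip, dst_ip, next_header, src_port, dst_port)
```

If the originating host leaves the flow-label at zero, the 3-tuple carries no
more entropy than {src_ip, dst_ip}, so the fallback is no worse than today's
coarse-grained behavior.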
Personally, I see (a) & (b) as short- to medium-term "wins" that could
be safely implemented, by default, in the next spin of NP, FPGA SW and ASIC HW,
given the existence of this, hopefully soon-to-be, RFC. Obviously, (c) is going
to be a little further out. I assume that, similar to today's router
implementations, router vendors will likely provide the flow-label as yet
another configurable input-key for LAG and/or ECMP hash algorithms. It will
then be up to individual operators to determine the appropriate time to
configure their routers/switches to, for example, use only
{src_ip, dst_ip, flow_label} when they are comfortable doing so for all
traffic.
> IMHO, the vast majority of the benefit to using the IPv6 Flow Label
> for load-balancing accrues to those IPv6 packets that have been
> fragmented where the originating node inserts the non-zero Flow Label
> value based on the documented 5 input parameters.
+1 in the short- to medium-term (?). I would also point out that a substantial
additional advantage is [long-term] architectural flexibility, in that the
end-points (hosts) may freely use *new* transport protocols (SCTP, DCCP,
UDP-lite, etc.) so long as they continue to label all packets with a "uniformly
distributed", non-zero flow-label so that [Core] routers/switches have
something they can safely use as input-keys for LAG and/or ECMP hash
algorithms. At least, that's one part of the network that we don't need to
worry about upgrading to support new transport-layer protocols. Unfortunately,
middleboxes (FWs or, more generally, "security GWs") might still have to be
adapted depending on the applicability of the new transport-layer protocol to
various network types, (e.g.: SOHO vs. Large-ish Enterprise).
Thanks,
-shane
[1] One example I can think of here is UDP-lite. Silly as it may seem (since
the formats of the UDP and UDP-lite headers are nearly identical), parsing
routines to extract {src_port, dst_port} from UDP-lite headers are not [widely]
implemented in deployed equipment today, because it is assumed that this isn't
a widely used transport-layer protocol. Depending on the implementation (ASIC,
NP or FPGA), the equipment might be adapted to recognize UDP-lite, but that's a
lot of cost & work ... *just* for one additional transport-layer protocol!
--------------------------------------------------------------------
IETF IPv6 working group mailing list
[email protected]
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------