Hi Thomas,
Token operator here ... :-) See below.
On Jan 11, 2011, at 06:41 MST, Thomas Narten wrote:
> Sorry to get back to basics, but I have not followed all the Flow
> Label discussions or read all the drafts. I have read
>
> draft-ietf-6man-flow-ecmp-00.txt
> draft-ietf-6man-flow-update-01.txt
>
> pretty carefully and I still don't quite understand what real problem
> we are trying to solve - and thus, whether the proposed changes
> actually help or are a no op.
>
> Is there a document that speaks to this?
>
> Question:
>
> I understand the value of ECMP type load balancing. But how much of a
> problem is it today (with IPv6) if the Flow Label is not used?
Today, it's a significant one, especially for tunneled traffic (e.g., IPvX in
IPv6, GRE, IPsec, LISP, etc.) and for WAN acceleration products used for fast
file transfers. In the future, it could significantly inhibit (even prohibit)
the IETF from developing new protocols that use new IPv6 Extension Headers or
Destination Options, since Core/Edge router/switch HW (primarily built around
ASICs) cannot (easily) be adapted to recognize the granularity of 'individual
flows' in those new protocols. Worst-case, existing HW cannot be adapted and
would require swapping it out, which is essentially a non-starter. Even when
existing core HW can be adapted, through SW changes, to recognize the
granularity of new protocols, it could be upwards of a decade before the cycle
completes: from coaxing/prodding/cajoling vendors to develop the capability in
SW, through operators testing it, to eventually deploying it in their networks.
I would also raise the "architectural purity" argument of: do you really want
millions of routers (and, L2 switches) that are really just supposed to forward
based solely on IP source address, IP destination address (and, IP Traffic
Class) *attempting* to go deeper into the packet (headers) to discern useful,
granular information that could be used as input-keys for load-balancing across
LAG and ECMP paths? Unfortunately, every device manufacturer is going to make
their own decisions on what they can (or, will) be able to look at in the IP
and Extension Headers as input-keys for a load-balancing hash, but the
end-result will be (and, is currently) a mess in terms of deployment and
operation. (As an example, some vendors can read entire IPv6 addresses from an
IPv6 header in new(er) HW, others can only read parts of the v6 address, etc.)
Finally, there is this jewel in RFC 2460:
---cut here---
With one exception, extension headers are not examined or processed
by any node along a packet's delivery path, until the packet reaches
the node (or each of the set of nodes, in the case of multicast)
identified in the Destination Address field of the IPv6 header.
[...]
The exception referred to in the preceding paragraph is the Hop-by-
Hop Options header, which carries information that must be examined
and processed by every node along a packet's delivery path, including
the source and destination nodes.
---cut here---
I wasn't around during the creation of RFC 2460 (so, please correct or inform
me if I'm wrong), but it seems to at least imply (if not mandate?) that
intermediate nodes (such as routers and L2 switches) shouldn't be trying to
interpret the characteristics of the upper-layer protocols being transported.
This would make sense when viewed in light of the end-to-end principle, but
perhaps I'm taking too strict an interpretation.
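
To illustrate what "going deeper into the packet" involves, here is a rough
Python sketch of the walk an intermediate node would have to do to reach a
TCP/UDP header behind IPv6 extension headers. It is not taken from either draft
or from any vendor's implementation; the function name find_transport and the
set of headers it understands are assumptions chosen for the example.

```python
IPV6_HEADER_LEN = 40
HOP_BY_HOP, ROUTING, FRAGMENT, AH, DEST_OPTS = 0, 43, 44, 51, 60
TCP, UDP = 6, 17


def find_transport(packet: bytes):
    """Walk the Next Header chain; return (protocol, offset) or None.

    Hypothetical helper: None means the transport header cannot be
    located (truncated packet, non-first fragment, ESP, a tunnel, or
    any header type this sketch does not know about).
    """
    if len(packet) < IPV6_HEADER_LEN:
        return None
    next_header = packet[6]  # Next Header field of the base IPv6 header
    offset = IPV6_HEADER_LEN
    while True:
        if next_header in (TCP, UDP):
            return next_header, offset
        if offset + 8 > len(packet):
            return None  # not enough bytes left for another header
        if next_header in (HOP_BY_HOP, ROUTING, DEST_OPTS):
            # Hdr Ext Len counts 8-octet units, not including the first 8 octets.
            length = (packet[offset + 1] + 1) * 8
        elif next_header == FRAGMENT:
            frag_offset = int.from_bytes(packet[offset + 2:offset + 4], "big") >> 3
            if frag_offset != 0:
                return None  # only the first fragment carries the ports
            length = 8
        elif next_header == AH:
            # AH Payload Len counts 4-octet units, minus 2.
            length = (packet[offset + 1] + 2) * 4
        else:
            return None  # e.g. ESP (50), IPv6-in-IPv6 (41), GRE (47), unknown
        next_header = packet[offset]  # first octet of each header is Next Header
        offset += length
```

Every new extension header or encapsulation adds another branch to a loop like
this, and anything the node does not recognize (including the tunnel cases
above) leaves it with no 5-tuple at all.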
> If you hash on just the 5 tuple (excluding the flow label), you get
> (I assume) the equivalent of what you have in IPv4 today. Why is that
> not good enough?
For one, HW can't glean a 5-tuple when it encounters tunneled packets. Second,
it inhibits/prohibits the IETF from developing new upper-layer protocols
(if/when the need should arise) and getting them deployed in a reasonable
timeframe. Lastly, I would like to get to a point where I can tell my
router/switch vendors to build simpler (and, thus, more cost-effective) HW,
because they don't have to keep piling complexity into their ASICs to
recognize the various legacy and new permutations of transport-layer protocols
for input-keys for LAG and ECMP load-balancing hashes. Instead, I would ideally
be able to tell them: just use {IP src, IP dst and IPv6 flow-label} as
input-keys -- oh, and look at that ... they're all at fixed offsets in the IPv6
header and at the very beginning of the packet so the time (and, amount of
memory) for you to copy that region of the packet to extremely expensive SRAM
(packet buffer memory) has just been reduced (reducing cycle times to process
the packet, etc.). Now, I will grant you this latter point is a bit of
wishful thinking for now, given that we'll need to be successful in getting
these drafts agreed upon and published. Then, hosts (and, 1st-hop routers)
will need to start writing useful flow-labels (and, hopefully, firewalls or
their administrators do not screw around and write zero back over the
flow-label). However, I'm of the opinion that if you don't start to make a
small change now, you will never see an improvement down the road.
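
As a minimal sketch of the {IP src, IP dst, flow-label} idea, the Python below
reads the three input-keys from their fixed offsets in the 40-byte IPv6 header
and maps them to a LAG/ECMP member. The names ecmp_keys, pick_member and
build_header are hypothetical, and SHA-256 is only a stand-in for whatever
(much cheaper) hash real hardware would use; nothing here is prescribed by the
drafts.

```python
import hashlib
import ipaddress
import struct

IPV6_HEADER_LEN = 40  # fixed-size IPv6 base header


def ecmp_keys(packet: bytes):
    """Return (src, dst, flow_label) read from fixed offsets in the header."""
    if len(packet) < IPV6_HEADER_LEN:
        raise ValueError("truncated IPv6 header")
    # First 32 bits: version (4), traffic class (8), flow label (20).
    (first_word,) = struct.unpack("!I", packet[:4])
    if first_word >> 28 != 6:
        raise ValueError("not an IPv6 packet")
    flow_label = first_word & 0xFFFFF
    src = packet[8:24]    # source address, octets 8..23
    dst = packet[24:40]   # destination address, octets 24..39
    return src, dst, flow_label


def pick_member(packet: bytes, n_links: int) -> int:
    """Map a packet to one of n_links members (assumed hash, for illustration)."""
    src, dst, flow_label = ecmp_keys(packet)
    key = src + dst + flow_label.to_bytes(3, "big")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def build_header(src: str, dst: str, flow_label: int, next_header: int = 41) -> bytes:
    """Build a bare IPv6 header for the example (payload length 0, hop limit 64).

    next_header defaults to 41 (IPv6-in-IPv6) to mimic a tunnel's outer header.
    """
    first_word = (6 << 28) | (flow_label & 0xFFFFF)
    return (
        struct.pack("!IHBB", first_word, 0, next_header, 64)
        + ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
    )


# Two inner flows between the same pair of tunnel endpoints, told apart
# only by the flow label written into the outer header.
flow_a = build_header("2001:db8::1", "2001:db8::2", 0x12345)
flow_b = build_header("2001:db8::1", "2001:db8::2", 0x54321)
link_a = pick_member(flow_a, 4)
link_b = pick_member(flow_b, 4)
```

The same header always maps to the same member, so packets of one flow stay in
order, while different flow labels between the same two tunnel endpoints can
land on different members without the node ever looking past octet 40.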
> Also, splitting flows across different links would seem to have value
> primarily if you have a single source (or rather single src/dest pair)
> generating a *lot* of traffic/flows, i.e., so that if you split
> traffic from that source/dest pair, you see measurable load-splitting.
>
> Is this happening in practice today? Can operators please speak to
> this? And if it is a problem, is it primarily with tunneled traffic
> (where the tunnel aggregates many flows), or is it really between
> individual pairs of nodes that are sending a *lot* of traffic to each
> other? Are there examples of this?
>
> (I'm not necessarily opposed to this work going forward, but I'm not
> entirely convinced we are solving a real problem. Help me please.)
I think I've answered these questions above; however, if you still aren't
satisfied with those answers please let me know.
Lastly, I would add that LAG and ECMP have been around for several [dozen]
years and will remain with us indefinitely. IOW, even if 100 GbE were
cost-effective and deployed today, I know of several operators who will still
be using LAG or ECMP over Nx 100 GbE trunks in order to continue to carry the
traffic demand on their networks. So, in summary, this is not a problem that
is going away with 100 GbE, or beyond.
-shane
--------------------------------------------------------------------
IETF IPv6 working group mailing list
[email protected]
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------