On Jan 11, 2011, at 10:14 AM, Shane Amante wrote:

> What causes pain and/or worry to us operators is when someone launches a
> large *individual* "macro-flow" [1] at the network that starts to represent
> a decent fraction of the overall capacity of a physical component link
> underlying a LAG and/or ECMP path (e.g., and the following are only
> *examples*: 200 Mbps, 500 Mbps, 1 Gbps, etc. on a 10 Gbps link).
> Unfortunately, because load-hashing algorithms are stateless (and, thus,
> non-adaptive), "well-behaved" microflows (think: casual Web surfing,
> e-mail, etc.) are still co-mingled with those large, fat flows across all
> component links in a common LAG/ECMP path, *without* taking into account
> the bandwidth utilization of individual component links. So there is a much
> higher probability (or, oftentimes, certainty) of congestion and packet
> loss, causing pain to all users on the one (or more) component links
> carrying fat macro-flows -- which I can't capacity-plan for and can't
> easily react to.
>
> -shane
>
> [1] Examples are: IPvX-in-IPvX tunneling, GRE, IPsec, "WAN acceleration"
> type products that are used for extremely fast large file xfers, etc.
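To make Shane's point concrete, here is a minimal Python sketch of stateless per-flow hashing over a LAG/ECMP group. It is not any vendor's actual algorithm, and the flow keys and link count are invented for illustration; the point is only that the hash consults nothing but the flow key, so a fat macro-flow is pinned to one component link no matter how loaded that link already is:

```python
import hashlib

def ecmp_link(flow_key: str, n_links: int) -> int:
    """Stateless LAG/ECMP selection: hash the flow key, pick a component link.

    No per-link utilization is consulted -- the same flow key always maps
    to the same link.
    """
    digest = hashlib.sha256(flow_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_links

# A hypothetical fat GRE macro-flow: one flow key, hence one link, always.
fat_flow = "203.0.113.1->198.51.100.1/gre"
print(ecmp_link(fat_flow, 4))  # same link for every packet of the flow

# Many small "well-behaved" microflows spread statistically across links...
micro = [ecmp_link(f"10.0.0.{i}:5000->192.0.2.1:80/tcp", 4)
         for i in range(1000)]
# ...but the fat flow's link still carries the entire macro-flow *plus*
# its statistical share of microflows, however congested it already is.
```

The stateless design is exactly what makes it cheap (no flow table) and exactly what makes it non-adaptive.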
Yes. And there are similar issues in data centers, where load balancing is also used, but in a different way.

There is another way to make a video flow a relatively large chunk of a link: go closer to the access. "Why should I care about a 5/10/20 Mbps flow on a 10 Gbps link?" You shouldn't. But how about a 20 Mbps data flow for each access customer when you have sized for an aggregate of 25 Mbps per customer? That can be quite a bit different. For the record, when we were developing our telepresence product, we had to look at the pacing of traffic coming out of the camera/codec, because we found that a 5 Mbps data flow with 10 Mbps peaks could momentarily swamp a 100 Mbps link. Obvious enough to me (hint: peak on a short timescale != average on a long timescale), but very counterintuitive to my colleagues.

There are ways to make stateless hashes change the way they hash. If you're using a CRC as a hash generator, for example, it starts with an initial value placed in a register. Change the initial value, and all the hashes change. If you find the distribution not to your liking, change the value and see whether you like the new one any better. The sad part is that there is no easy way to "calculate an initial value I will like"; you have to try them all, or at least occasionally try a different one.

Finding a predictable method tends to be about - as Ipsilon proposed a decade-plus ago - identifying the important data flows, doing something intelligent with them, and handling the rest statistically.

http://www.ietf.org/rfc/rfc2098.txt
2098 Toshiba's Router Architecture Extensions for ATM: Overview. Y. Katsube,
     K. Nagami, H. Esaki. February 1997. (Format: TXT=43622 bytes)
     (Status: INFORMATIONAL)

http://www.ietf.org/rfc/rfc2129.txt
2129 Toshiba's Flow Attribute Notification Protocol (FANP) Specification.
     K. Nagami, Y. Katsube, Y. Shobatake, A. Mogi, S. Matsuzawa, T. Jinmei,
     H. Esaki. April 1997. (Format: TXT=41137 bytes) (Status: INFORMATIONAL)

--------------------------------------------------------------------
IETF IPv6 working group mailing list
[email protected]
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------
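P.S. The CRC re-seeding trick described above is easy to demonstrate in software. A minimal Python sketch, using zlib.crc32's optional starting value as the "initial register value" (the flow keys, link count, and trial seeds are all invented for illustration, and this is not claimed to match any hardware implementation):

```python
import zlib
from collections import Counter

def link_for_flow(flow_key: bytes, n_links: int, seed: int = 0) -> int:
    """CRC-32-based link selection; 'seed' is the CRC's initial value.

    Changing the seed re-shuffles every flow-to-link assignment at once,
    while keeping the scheme completely stateless per flow.
    """
    return zlib.crc32(flow_key, seed) % n_links

# Hypothetical flow keys for 1000 microflows.
flows = [f"10.0.0.{i}:5000->192.0.2.1:80".encode() for i in range(1000)]

# Trying a few arbitrary seeds: each one yields a different distribution,
# and there is no way to predict which seed you will like -- you just try.
for seed in (0x0000, 0x5EED, 0xBEEF):
    dist = Counter(link_for_flow(f, 4, seed) for f in flows)
    print(hex(seed), sorted(dist.items()))
```

As the text says, this gives you a knob to turn when the distribution is unlucky, but not a way to compute a good setting in advance.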
