Hi all, I have a couple of questions concerning the scalability of OVS when used in high-density virtualisation environments. To elaborate further, a bit of background:
Most virtualisation implementations do not, in reality, go far enough in abstracting the virtual network environment from the physical one. As an example, take an environment using Xen/XenServer, with up to a few hundred physical nodes, each hosting 50 or 60 virtual machines. Assume that each VM has a single VIF, which of course uses a MAC address. Let's say, for argument's sake, that we have 200 physical nodes, each with 50 VMs. That makes a total of 10,000 MAC addresses for the VMs, plus 200 MAC addresses for the physical nodes, plus some number of MAC addresses if you are using any kind of network-attached storage.

Given that most physical network implementations connect the nodes to access switches in the datacenter, which in turn feed into larger distribution switches, there is a risk of CAM table overflow. Some hosting providers use relatively inexpensive switches at the access layer, and some models have CAM limits of anywhere between 6,000 and 16,000 MAC addresses. On CAM overflow, the switch has no choice but to flood, essentially behaving like a hub, with the inherent network performance degradation that entails.

Admittedly, one way of overcoming this is to extend the routing-protocol domain down to the individual nodes. The physical switched environment then sees only the MAC addresses of the nodes and any associated storage, not those of the individual VMs. (For example, a node would advertise its connected VMs to the distribution router via OSPF, and accept only a default route in return, so each node's kernel routing table would remain quite manageable.) However, given that OVS is not a layer-3 device, what is the scalability limit of OVS in such a high-density environment? Now imagine that some of the VMs have multiple VIFs, so the problem is exacerbated.
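The back-of-envelope arithmetic above can be sketched as follows; the node/VM counts and the 6,000-16,000 CAM range are the figures quoted here, not measured values:

```python
# MAC-table load for the scenario described above:
# 200 physical nodes, 50 VMs each, one VIF (one MAC) per VM,
# ignoring storage MACs (unknown, "X" in the text).

nodes = 200
vms_per_node = 50

vm_macs = nodes * vms_per_node      # 10,000 VM MAC addresses
total_macs = vm_macs + nodes        # plus one MAC per physical node

for cam_limit in (6000, 16000):
    headroom = cam_limit - total_macs
    status = "overflow (switch floods)" if headroom < 0 else "ok"
    print(f"CAM limit {cam_limit:6d}: {status} ({headroom:+d} entries)")
```

So even before counting storage or multi-VIF VMs, the low end of that CAM range is already exhausted.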
Another way to alleviate this could be TRILL in combination with RBridges; I'm wondering whether there are any plans to implement this in OVS at some point in the future. I am also interested in the Ethernet-over-GRE implementation in OVS, as it facilitates the creation of private networks between VMs across different nodes. Again, however, I question the scalability of this approach, especially in such a high-density environment. I understand that one could, in theory, create a few "thousand" tunnels between the various nodes, each carrying the private LANs of the different VMs, but how well does this actually scale? Imagine, for example, a customer with 40 VMs scattered across different nodes, all of which need to "talk" to each other via a "backend" private network. How would this be set up whilst avoiding potential layer-2 loops, and without any one node becoming a "bottleneck" hub for multiple tunnel endpoints? I'm particularly interested in this because 802.1Q is not really feasible, given its limit of 4,096 VLANs (and certain equipment manufacturers impose even tighter limits, such as Cisco's 1,005 VLANs on many switching platforms), and Q-in-Q isn't really a viable option as it overcomplicates matters.

So, to cut a long story short, my questions are:

* What is the scalability of a large OVS deployment, and what abstraction measures are taken to prevent the "virtual" domain posing CAM-exhaustion risks to the "physical" infrastructure?
* What is the scalability of the Ethernet-over-GRE implementation, and what measures are taken to avoid loops and/or bottlenecks?
* Are there any plans to incorporate TRILL and/or RBridge features into OVS at some point in the future?

Many thanks for any ideas, comments, suggestions, answers, and cheeseburgers :)

Leland
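For concreteness, here is a minimal sketch of the kind of Ethernet-over-GRE setup being asked about, using standard ovs-vsctl commands. The bridge name (br-priv), port names, and management IPs are hypothetical placeholders, and this is purely illustrative, not a recommendation:

```shell
# On node A (assumed management IP 10.0.0.1, peer node B at 10.0.0.2):
# create a bridge for the customer's private backend network
ovs-vsctl add-br br-priv

# add a point-to-point GRE tunnel port towards node B
ovs-vsctl add-port br-priv gre-nodeB -- \
    set interface gre-nodeB type=gre options:remote_ip=10.0.0.2

# Node B mirrors this with options:remote_ip=10.0.0.1, and so on for
# every other node hosting one of the customer's VMs. A full mesh of
# N nodes needs N*(N-1)/2 such tunnels, which is exactly the scaling
# concern raised above; and since each bridge floods unknown unicast,
# a meshed set of bridged GRE tunnels forms layer-2 loops unless
# something (e.g. STP on the bridges) breaks them.
```

With 40 VMs spread across, say, 40 distinct nodes, a full mesh is already 780 tunnels for that one customer, which is where the question of hub-and-spoke versus mesh topologies (and the bottleneck/loop trade-off) comes from.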
_______________________________________________ discuss mailing list [email protected] http://openvswitch.org/mailman/listinfo/discuss
