Hello,
which version of OVS are you using? I think the default in Havana is 1.11?!
Since we upgraded to the latest version, 2.3, we have seen huge
performance improvements, especially in new TCP connections per
second (TCP_CRR).
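If you want to check what you are running and benchmark it yourself,
something along these lines should work (just a sketch; netperf has to
be installed on both ends, and the remote host is a placeholder):

  # show the installed OVS userspace version
  ovs-vsctl --version
  ovs-vswitchd --version

  # measure new TCP connections per second against another host
  netperf -H <remote-host> -t TCP_CRR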
Anyway, why do you use the logical network router from Neutron? As
you have tested yourself, a single instance will max it out; it
doesn't scale even with a 10 Gb interface.
Have a look at "Neutron flat provider networks"; they connect the
instance traffic directly to the physical Layer 3 device.
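Roughly, with the ML2 plugin, it looks something like this (only a
sketch; the physical network label, bridge name and addresses are
placeholders, and the exact config file locations depend on your
plugin/distro):

  # /etc/neutron/plugins/ml2/ml2_conf.ini
  [ml2]
  type_drivers = flat,vlan

  [ml2_type_flat]
  flat_networks = physnet1

  # OVS agent config (may live in a separate agent ini file)
  [ovs]
  bridge_mappings = physnet1:br-ex

  # then create the provider network and a subnet on it
  neutron net-create provider-net --shared \
      --provider:network_type flat --provider:physical_network physnet1
  neutron subnet-create provider-net 203.0.113.0/24 \
      --name provider-subnet --gateway 203.0.113.1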
Regarding monitoring, have a look at:
http://openvswitch.org/support/config-cookbooks/sflow/
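The cookbook basically boils down to pointing the bridge at an sFlow
collector, something like this (agent interface, collector address,
bridge name and sampling values here are placeholders for your setup):

  ovs-vsctl -- --id=@sflow create sflow agent=eth0 \
      target=\"10.0.0.1:6343\" header=128 sampling=64 polling=10 \
      -- set bridge br-int sflow=@sflow

With that in place any sFlow collector (sflowtool, for example) will
show you per-flow traffic, which helps spot a misbehaving instance.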
Cheers
Chris
On 2014-11-19 17:14, Krist van Besien wrote:
Hello,
This is my first post here. I come here hoping that someone can give
me some help/pointers with a problem I'm having.
We are using Open vSwitch as part of an OpenStack Havana install on
Red Hat Linux.
We have a dedicated networking node, and it is quite a powerful
machine: 2 x 6 cores, 32 GB RAM.
The machine has a 1 Gb uplink to the internet. Under normal
circumstances it has no problem coping with the traffic.
I also did a test where I fired up a few instances in our OpenStack
cloud and started a BitTorrent client in them. I monitored the
network bandwidth consumption and saw it go up to about 1 Gb and stay
there, so I can saturate our link. During this test the CPU load on
the networking node was about 1, which is not an issue on a 12-core
machine.
However, one of the VMs in our cloud got compromised. This machine
then started to scan the network very aggressively, initiating lots
of connections to different hosts from different ports. This managed
to bring our networking node to its knees: not through traffic, but,
it seems, by overwhelming the userspace component. The result was
loss of connectivity for all other instances.
If I understand how Open vSwitch works correctly, packets get matched
against flows in the kernel. If no flow is matched, the packet gets
passed to userspace, and then a flow gets created. I get the
impression that the compromised host's behaviour caused a lot of
packets to miss flows.
Looking with ovs-dpctl when everything is well I see something like
this:
[root@lupin-neutron-r72012014-8ds1202 ~]# ovs-dpctl show
system@ovs-system:
lookups: hit:10256807 missed:241170 lost:0
flows: 32
This is shortly after a reboot. I see that most packets are matched
by an existing flow, and there are a few flows defined. The number of
flows sometimes increases, up to a few hundred, but never much more.
However, during the episode with the compromised host the readings (I
don't have a screenshot) were very different. Running ovs-dpctl
showed "missed" was a lot higher than "hit", and increasing rapidly.
There were thousands of flows, and they were changing all the time.
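To see the individual kernel flows rather than just the hit/miss
counters, I think something like this should also work (ovs-dpctl has
a dump-flows command):

  # list the datapath flows currently installed in the kernel
  ovs-dpctl dump-flows

  # or just count them
  ovs-dpctl dump-flows | wc -l

If this happens again, the source addresses in that output should at
least show which instance is generating all the misses.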
The questions for me now are:
- How can I better tune Open vSwitch so that a compromised host does
not bring down our network? The instances are started by customers,
and I cannot guarantee that they will all behave. I need to assume
that this will happen again.
- Is there a way to somehow contain network traffic from misbehaving
hosts? One idea I am wondering about is sketched below.
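For example, a partial mitigation might be per-port ingress policing
on the integration bridge (a rough sketch; "tap0" is a placeholder for
the compromised instance's tap/vif port, and the numbers are
arbitrary). As far as I can tell this only caps bandwidth in kbps, it
does not directly limit the rate of new flows:

  # limit traffic coming from the VM into the bridge to ~10 Mbps
  ovs-vsctl set interface tap0 ingress_policing_rate=10000
  ovs-vsctl set interface tap0 ingress_policing_burst=1000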
Thanks,
Krist
_______________________________________________
discuss mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/discuss