Hello,
which version of OVS are you using? I think the default in Havana is 1.11?!
Since we upgraded to the latest version, 2.3, we have seen huge
performance improvements, especially in new TCP connections per
second (TCP_CRR).
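If you want to check what you are running and benchmark it yourself,
something along these lines should work (just a sketch; netperf has to
be installed on both ends, and the remote host is a placeholder):

  # show the installed OVS userspace version
  ovs-vsctl --version
  ovs-vswitchd --version

  # measure new TCP connections per second against another host
  netperf -H <remote-host> -t TCP_CRR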
Anyway, why do you use the logical network router from Neutron? As
you have tested yourself, a single instance will max it out; it
doesn't scale even with a 10 Gb interface.
Have a look at "Neutron flat provider networks"; they connect the
instance traffic directly to the physical Layer 3 device.
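Roughly, with the ML2 plugin, it looks something like this (only a
sketch; the physical network label, bridge name and addresses are
placeholders, and the exact config file locations depend on your
plugin/distro):

  # /etc/neutron/plugins/ml2/ml2_conf.ini
  [ml2]
  type_drivers = flat,vlan

  [ml2_type_flat]
  flat_networks = physnet1

  # OVS agent config (may live in a separate agent ini file)
  [ovs]
  bridge_mappings = physnet1:br-ex

  # then create the provider network and a subnet on it
  neutron net-create provider-net --shared \
      --provider:network_type flat --provider:physical_network physnet1
  neutron subnet-create provider-net 203.0.113.0/24 \
      --name provider-subnet --gateway 203.0.113.1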
Regarding monitoring, have a look at:
http://openvswitch.org/support/config-cookbooks/sflow/
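The cookbook basically boils down to pointing the bridge at an sFlow
collector, something like this (agent interface, collector address,
bridge name and sampling values here are placeholders for your setup):

  ovs-vsctl -- --id=@sflow create sflow agent=eth0 \
      target=\"10.0.0.1:6343\" header=128 sampling=64 polling=10 \
      -- set bridge br-int sflow=@sflow

With that in place any sFlow collector (sflowtool, for example) will
show you per-flow traffic, which helps spot a misbehaving instance.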
Cheers
Chris
On 2014-11-19 17:14, Krist van Besien wrote:
Hello,
This is my first post here. I come here hoping that someone can give
me some help/pointers with a problem I'm having.
We are using Open vSwitch as part of an OpenStack Havana install on
Red Hat Linux.
We have a dedicated networking node, and it is quite a powerful
machine: 2 x 6 cores, 32 GB RAM.
The machine has a 1 Gb uplink to the internet. Under normal
circumstances it has no problem coping with the traffic.
I also did a test where I fired up a few instances in our OpenStack
cloud and started a BitTorrent client in them. I monitored the
network bandwidth consumption and saw it go up to about 1 Gb and stay
there, so I can saturate our link. During this test the CPU load on
the networking node was about 1, which is not an issue on a 12-core
machine.
However, one of the VMs in our cloud got compromised. This machine
then started to scan the network very aggressively, initiating lots
of connections to different hosts from different ports. This managed
to bring our networking node to its knees: not through traffic, but,
it seems, by overwhelming the userspace component. The result was
loss of connectivity for all other instances.
If I understand how Open vSwitch works correctly, packets get matched
against flows in the kernel. If no flow is matched, the packet gets
passed to userspace, and then a flow gets created. I get the
impression that the compromised host's behaviour caused a lot of
packets to miss flows.
Looking with ovs-dpctl when everything is well I see something like
this:
[root@lupin-neutron-r72012014-8ds1202 ~]# ovs-dpctl show
system@ovs-system:
lookups: hit:10256807 missed:241170 lost:0
flows: 32
This is shortly after a reboot. I see that most packets are matched
by an existing flow, and there are a few flows defined. The number of
flows sometimes increases, up to a few hundred, but never much more.
However, during the episode with the compromised host the readings (I
don't have a screenshot) were very different. Running ovs-dpctl
showed "missed" was a lot higher than "hit", and increasing rapidly.
There were thousands of flows, and they were changing all the time.
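To see the individual kernel flows rather than just the hit/miss
counters, I think something like this should also work (ovs-dpctl has
a dump-flows command):

  # list the datapath flows currently installed in the kernel
  ovs-dpctl dump-flows

  # or just count them
  ovs-dpctl dump-flows | wc -l

If this happens again, the source addresses in that output should at
least show which instance is generating all the misses.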
The questions for me now are:
- How can I better tune Open vSwitch so that a compromised host does
not bring down our network? The instances are started by customers,
and I cannot guarantee that they will all behave. I need to assume
that this will happen again.
- Is there a way to somehow contain network traffic from misbehaving
hosts? One idea I am wondering about is sketched below.
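For example, a partial mitigation might be per-port ingress policing
on the integration bridge (a rough sketch; "tap0" is a placeholder for
the compromised instance's tap/vif port, and the numbers are
arbitrary). As far as I can tell this only caps bandwidth in kbps, it
does not directly limit the rate of new flows:

  # limit traffic coming from the VM into the bridge to ~10 Mbps
  ovs-vsctl set interface tap0 ingress_policing_rate=10000
  ovs-vsctl set interface tap0 ingress_policing_burst=1000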
Thanks,
Krist
_______________________________________________
discuss mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/discuss