Hello,
This is my first post here. I come here hoping that someone can give me some
help/pointers with a problem I'm having.
We are using Open vSwitch as part of an OpenStack Havana install on Red Hat
Linux. We have a dedicated networking node, and it is quite a powerful machine:
2x6 cores, 32 GB RAM.
The machine has a 1Gb uplink to the internet. Under normal circumstances it
has no problem coping with the traffic.
I also did a test where I fired up a few instances in our OpenStack cloud and
started a BitTorrent client in each. I monitored the network bandwidth
consumption, and saw it go up to about 1Gb and stay there. So I can saturate
our link. During this test the CPU load on the networking node was about 1,
which is not an issue on a 12 core machine.
However, one of the VMs in our cloud got compromised. That machine then started
to scan the network very aggressively, initiating lots of connections to
different hosts from different ports.
And this managed to bring our networking node to its knees. Not through
traffic, but, it seems, by overwhelming the userspace component. The result was
loss of connectivity for all other instances.
If I understand correctly how Open vSwitch works, packets get matched against
flows in the kernel. If no flow matches, the packet gets passed to userspace,
and then a flow gets created. I get the impression that the compromised host's
behaviour caused a lot of packets to miss flows.
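As a rough illustration of that understanding, here is a toy Python model. It is
only a sketch and not how Open vSwitch is actually implemented: it assumes an
exact-match flow table keyed on a (source IP, source port, destination IP,
destination port) tuple, and every name and address in it (ToyDatapath,
steady_traffic, scan_traffic, the example IPs) is made up for the illustration.

```python
import random


class ToyDatapath:
    """Toy model: a kernel flow table with a userspace slow path on a miss."""

    def __init__(self):
        self.flows = set()   # flow keys currently installed in the "kernel"
        self.hit = 0         # packets that matched an installed flow
        self.missed = 0      # packets that had to go to "userspace"

    def receive(self, key):
        if key in self.flows:
            self.hit += 1
        else:
            # Miss: userspace handles the packet and installs a new flow.
            self.missed += 1
            self.flows.add(key)


def steady_traffic(n_packets, n_connections, seed=0):
    """Many packets spread over a small, fixed set of connections."""
    rng = random.Random(seed)
    conns = [("10.0.0.5", 40000 + i, "198.51.100.7", 6881)
             for i in range(n_connections)]
    for _ in range(n_packets):
        yield rng.choice(conns)


def scan_traffic(n_packets):
    """Every packet uses a new source port and a different target host."""
    for i in range(n_packets):
        yield ("10.0.0.9", 1024 + i % 60000, f"203.0.113.{i % 254 + 1}", 22)


normal = ToyDatapath()
for pkt in steady_traffic(10_000, 20):
    normal.receive(pkt)

scan = ToyDatapath()
for pkt in scan_traffic(10_000):
    scan.receive(pkt)

print("normal:", normal.hit, "hit,", normal.missed, "missed,", len(normal.flows), "flows")
print("scan:  ", scan.hit, "hit,", scan.missed, "missed,", len(scan.flows), "flows")
```

In this model the steady traffic misses at most once per connection, while the
scan misses on every packet and creates a new flow each time, so the cost is
driven by the number of new connections rather than by bandwidth.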
Looking with ovs-dpctl when everything is well I see something like this:
[root@lupin-neutron-r72012014-8ds1202 ~]# ovs-dpctl show
system@ovs-system:
lookups: hit:10256807 missed:241170 lost:0
flows: 32
This is shortly after a reboot. Most packets seem to hit an existing flow, and
only a few flows are defined. The number of flows sometimes increases, up to a
few hundred, but never much more.
However, during the episode with the compromised host the readings (I don't
have a screenshot) were very different. Running ovs-dpctl showed that "missed"
was a lot higher than "hit", and increasing rapidly. There were thousands of
flows, and they were changing all the time.
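Since I have no record of the numbers from that episode, something like the
following Python sketch could be used to log them next time. It only relies on
the "lookups:" line having the format shown in the ovs-dpctl output above. The
helper names (parse_lookups, miss_ratio, read_datapath_stats) are my own, and
what counts as an alarming miss ratio is an assumption to be tuned.

```python
import re
import subprocess

# Matches e.g. "lookups: hit:10256807 missed:241170 lost:0"
LOOKUPS_RE = re.compile(r"lookups:\s*hit:(\d+)\s+missed:(\d+)\s+lost:(\d+)")


def parse_lookups(text):
    """Extract the hit/missed/lost counters from `ovs-dpctl show` output."""
    m = LOOKUPS_RE.search(text)
    if m is None:
        raise ValueError("no 'lookups:' line found")
    hit, missed, lost = (int(g) for g in m.groups())
    return {"hit": hit, "missed": missed, "lost": lost}


def miss_ratio(prev, curr):
    """Fraction of lookups that missed between two samples of the counters."""
    d_hit = curr["hit"] - prev["hit"]
    d_missed = curr["missed"] - prev["missed"]
    total = d_hit + d_missed
    return d_missed / total if total else 0.0


def read_datapath_stats():
    """Run `ovs-dpctl show` on the local machine and parse its counters."""
    out = subprocess.run(["ovs-dpctl", "show"], capture_output=True,
                         text=True, check=True).stdout
    return parse_lookups(out)


# The healthy sample from above, measured since boot (all counters start at 0).
SAMPLE = """system@ovs-system:
lookups: hit:10256807 missed:241170 lost:0
flows: 32
"""
since_boot = {"hit": 0, "missed": 0, "lost": 0}
healthy = parse_lookups(SAMPLE)
print(f"miss ratio: {miss_ratio(since_boot, healthy):.1%}")
```

Calling read_datapath_stats() every few seconds and feeding consecutive samples
to miss_ratio() would give the miss rate per interval, which should make an
episode like the one described above stand out clearly from the healthy case.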
The questions for me now are:
- How can I better tune Open vSwitch so that a compromised host does not bring
down our network? The instances are started by customers, and I cannot
guarantee that they will all behave. I need to assume that this will happen
again.
- Is there a way to somehow contain network traffic for misbehaving hosts?
Thanks,
Krist