On Jan 24, 2013, at 19:41 , ext Jesse Gross wrote:
> On Thu, Jan 24, 2013 at 7:34 AM, Jarno Rajahalme
> <[email protected]> wrote:
>>
>> On Jan 23, 2013, at 19:30 , ext Jesse Gross wrote:
>>
>>> On Tue, Jan 22, 2013 at 9:48 PM, Jarno Rajahalme
>>> <[email protected]> wrote:
>>>> Add OVS_PACKET_ATTR_KEY_INFO to relieve userspace from re-computing
>>>> data already computed within the kernel datapath. In the typical
>>>> case of an upcall with perfect key fitness between kernel and
>>>> userspace this eliminates flow_extract() and flow_hash() calls in
>>>> handle_miss_upcalls().
>>>>
>>>> Additional bookkeeping within the kernel datapath is minimal.
>>>> Kernel flow insertion also saves one flow key hash computation.
>>>>
>>>> Removed setting the packet's l7 pointer for ICMP packets, as this was
>>>> never used.
>>>>
>>>> Signed-off-by: Jarno Rajahalme <[email protected]>
>>>> ---
>>>>
>>>> This likely requires some discussion, but it took a while for me to
>>>> understand why each packet miss upcall would require flow_extract()
>>>> right after the flow key has been obtained from odp attributes.
>>>
>>> Do you have any performance numbers to share? Since this is an
>>> optimization it's important to understand if the benefit is worth the
>>> extra complexity.
>>
>> Not yet, but I would be happy to. Any hints toward the best way of
>> obtaining meaningful numbers for something like this?
>
> This is a flow setup optimization, so usually something like netperf
> TCP_CRR would be a good way to stress that.
>
> However, Ben mentioned to me that he had tried eliminating the
> flow_extract() call from userspace in the past and the results were
> disappointing.
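
To make the proposed change concrete: the userspace side of the idea boils
down to reusing the flow key and hash that the kernel datapath already
computed whenever the key fits perfectly, and only falling back to
flow_extract()/flow_hash() otherwise. Below is a simplified sketch of that
logic; the types and helpers (struct miss_upcall, handle_miss(), key_fits,
kernel_hash, etc.) are illustrative stand-ins, not the actual patch or the
real OVS data structures.

/*
 * Sketch of the idea behind OVS_PACKET_ATTR_KEY_INFO: when the kernel
 * datapath has already extracted the flow key and computed its hash for a
 * miss upcall, and the key fits userspace perfectly, reuse that data instead
 * of calling flow_extract() and flow_hash() again in handle_miss_upcalls().
 * All types and helpers here are illustrative stand-ins, not real OVS code.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct packet { const uint8_t *data; size_t len; };
struct flow { uint32_t fields[8]; };        /* stand-in for struct flow */

struct miss_upcall {
    struct packet packet;
    struct flow key;          /* parsed from OVS_PACKET_ATTR_KEY */
    bool key_fits;            /* perfect key fit between kernel and userspace */
    bool have_key_info;       /* OVS_PACKET_ATTR_KEY_INFO was present */
    uint32_t kernel_hash;     /* hash carried in the new attribute */
};

/* Stand-ins for the existing userspace helpers. */
static void
flow_extract(const struct packet *p, struct flow *flow)
{
    memset(flow, 0, sizeof *flow);
    memcpy(flow->fields, p->data,
           p->len < sizeof flow->fields ? p->len : sizeof flow->fields);
}

static uint32_t
flow_hash(const struct flow *flow, uint32_t basis)
{
    uint32_t hash = basis;
    for (size_t i = 0; i < 8; i++) {
        hash = hash * 31 + flow->fields[i];
    }
    return hash;
}

/* Miss handling: skip the recomputation when the kernel's data can be used. */
static uint32_t
handle_miss(const struct miss_upcall *u, struct flow *flow)
{
    if (u->have_key_info && u->key_fits) {
        *flow = u->key;           /* key already parsed from odp attributes */
        return u->kernel_hash;    /* hash already computed in the datapath */
    }
    flow_extract(&u->packet, flow);   /* current behavior: recompute */
    return flow_hash(flow, 0);
}

int
main(void)
{
    uint8_t data[64] = { 1, 2, 3 };
    struct miss_upcall u = {
        .packet = { data, sizeof data },
        .have_key_info = true,
        .key_fits = true,
        .kernel_hash = 0x1234,
    };
    struct flow flow;

    printf("hash=%#x\n", (unsigned) handle_miss(&u, &flow));
    return 0;
}
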
I made a simple test where there is only one flow entry, "in_port=LOCAL
actions=drop", and only the local port is configured. One process sends UDP
packets with different source/destination port combinations in a loop (a
sketch of such a generator follows the port stats below), and OVS then tries
to cope with the load. During the test both processes run at near 100% CPU
utilization in a virtual machine on a dual-core laptop. On each round 10100000
packets were generated:
OFPST_PORT reply (xid=0x2): 1 ports
port LOCAL: rx pkts=10100006, bytes=464600468, drop=0, errs=0, frame=0,
over=0, crc=0
tx pkts=0, bytes=0, drop=0, errs=0, coll=0
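
For reference, traffic of this kind can be produced with a generator along
the following lines. This is only a simplified sketch: the addresses, port
ranges, packet sizes and lack of error handling are assumptions, not the
exact program used in the test.

/*
 * Simplified sketch of a traffic generator for the test: send small UDP
 * packets toward the bridge's local port, varying source and destination
 * ports so that every packet misses the single installed flow.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(void)
{
    const char payload[] = "x";
    struct sockaddr_in dst;

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    inet_pton(AF_INET, "10.0.0.1", &dst.sin_addr);   /* bridge local IP */

    /* 10100 source ports x 1000 destination ports = 10100000 packets. */
    for (int sport = 0; sport < 10100; sport++) {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in src;

        memset(&src, 0, sizeof src);
        src.sin_family = AF_INET;
        src.sin_port = htons(10000 + sport);
        bind(sock, (struct sockaddr *) &src, sizeof src);

        for (int dport = 0; dport < 1000; dport++) {
            dst.sin_port = htons(30000 + dport);
            sendto(sock, payload, sizeof payload, 0,
                   (struct sockaddr *) &dst, sizeof dst);
        }
        close(sock);
    }
    return 0;
}
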
With current master, 19.35% of the packets on average get processed by the flow:
Round 1:
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=29.124s, table=0, n_packets=1959794, n_bytes=90150548,
idle_age=4, in_port=LOCAL actions=drop
Round 2:
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=63.534s, table=0, n_packets=1932785, n_bytes=88908158,
idle_age=37, in_port=LOCAL actions=drop
Round 3:
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=33.394s, table=0, n_packets=1972389, n_bytes=90729894,
idle_age=8, in_port=LOCAL actions=drop
With the proposed change, 20.2% of the packets on average get processed by the flow:
Round 4:
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=31.96s, table=0, n_packets=2042759, n_bytes=93966914,
idle_age=4, in_port=LOCAL actions=drop
Round 5:
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=38.6s, table=0, n_packets=2040224, n_bytes=93850372,
idle_age=8, in_port=LOCAL actions=drop
Round 6:
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=35.661s, table=0, n_packets=2039595, n_bytes=93821418,
idle_age=3, in_port=LOCAL actions=drop
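
To spell out how the averages above are computed (flow hits divided by the
total number of packets generated over the three rounds):

(1959794 + 1932785 + 1972389) / (3 * 10100000) = 5864968 / 30300000 ≈ 19.35%
(2042759 + 2040224 + 2039595) / (3 * 10100000) = 6122578 / 30300000 ≈ 20.2%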
So there is a consistent benefit, but it is not very large. Apparently
flow_extract() and flow_hash() represent only a small portion of the CPU use
in OVS flow setup.
Jarno