And sorry for the email spam, but here's a test I did earlier with the RCU
patch when I first finished it. I sent the email to Alex, but unfortunately not
to the rest of the list. It may give some insight into the case where this
patch is most helpful.
----
So I performance tested this, and in most cases performance was about the
same. However, I saw a significant performance increase with the following
test:
- Establish 100 netperf TCP_CRR connections to the sink (as you do in your
  start script)
- Sleep for 20 seconds (to let the connections get established and OVS get
  started)
- Run the following commands in an indefinite loop (a sketch of the full
  driver script follows the commands):
  ovs-vsctl add-br br0 -- add-br br1
  ovs-vsctl set bridge br1 datapath-type=dummy \
      other-config:hwaddr=aa:55:aa:56:00:00 -- \
      add-port br1 p11 -- set Interface p11 type=patch \
          options:peer=p00 -- \
      add-port br0 p00 -- set Interface p00 type=patch \
          options:peer=p11
  ovs-vsctl set Interface p00 bfd:enable=true -- \
      set Interface p11 bfd:enable=true
  sleep 1
  ovs-vsctl del-br br0 -- del-br br1
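For reference, here's roughly what the whole driver looks like as a script.
The sink host, output files, and netperf arguments below are illustrative,
not my exact invocation:

  #!/bin/sh
  SINK=10.1.1.2    # placeholder for the netperf sink host

  # Start 100 netperf TCP_CRR streams, each printing interim results
  # every second (-D 1) over a long test length (-l 3600).
  for i in `seq 1 100`; do
      netperf -H $SINK -t TCP_CRR -l 3600 -D 1 > crr.$i.out &
  done

  sleep 20    # let the connections get established and OVS get started

  # Churn the configuration forever with the unchained ovs-vsctl commands.
  while :; do
      ovs-vsctl add-br br0 -- add-br br1
      # ... the set bridge / add-port / bfd commands shown above ...
      sleep 1
      ovs-vsctl del-br br0 -- del-br br1
  done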
Notice that I don't chain the commands together. If I did, the new config
would be batched into a single message to the ofproto-dpif-xlate layer,
meaning only one acquisition of the global xlate_rwlock in master. So when I
batch the commands together, there is no significant performance hit.
However, when I don't chain the commands together (i.e. I use 3 separate
ovs-vsctl invocations), master sends 3 separate messages between ofproto and
ofproto-dpif-xlate, meaning 3 acquisitions of the global xlate_rwlock. That
can add up to a lot of delay! This is where we see the real improvement from
RCU.
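For comparison, the fully chained form, which batches everything into a
single ovs-vsctl transaction (and hence a single xlate_rwlock acquisition in
master), would look something like this:

  ovs-vsctl add-br br0 -- add-br br1 -- \
      set bridge br1 datapath-type=dummy \
          other-config:hwaddr=aa:55:aa:56:00:00 -- \
      add-port br1 p11 -- set Interface p11 type=patch \
          options:peer=p00 -- \
      add-port br0 p00 -- set Interface p00 type=patch \
          options:peer=p11 -- \
      set Interface p00 bfd:enable=true -- \
      set Interface p11 bfd:enable=true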
Some numbers for about 10000 interim results of the netperf processes (in
trans/s):
RCU:
Mean: 84.591932
Median: 83.405000
Master:
Mean: 78.528627
Median: 70.550000
It's not huge (about an 8% improvement in the mean), but if we added more
ovs-vsctl commands per iteration, I'd imagine we'd see more improvement. Not
sure if this is a valid use case, but these are my findings so far.
Ryan Wilson
Member of Technical Staff
[email protected]
3401 Hillview Avenue, Palo Alto, CA
650.427.1511 Office
916.588.7783 Mobile
On May 19, 2014, at 9:56 PM, Ryan Wilson <[email protected]> wrote:
> Sorry Gurucharan, totally forgot to answer your question!
>
> After interspersing these tests with random calls to reload the kernel
> module, it doesn't appear to affect time in any significant way.
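>
> (For reference, by "reload the kernel module" I mean something along the
> lines of
>
>   ovs-ctl force-reload-kmod
>
> though the exact command depends on how OVS is installed.)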
>
> On May 19, 2014, at 9:53 PM, Ryan Wilson <[email protected]> wrote:
>
>> So I did an experiment where I added 500 and 1000 ports and then deleted 500
>> and 1000 ports, with and without this patch, on both an 8 GB and a 62 GB
>> memory machine. Weirdly enough, adding / deleting ports with the RCU patch
>> turned out to actually be faster than without it. My only explanation is that
>> taking the global xlate lock is expensive and / or 500 ports wasn't enough to
>> induce memory pressure.
>>
>> Here are the numbers for the 500 port case on a 8 GB memory machine:
>> With RCU patch:
>> Adding ports: real 1m15.850s
>> Deleting ports: real 1m21.830s
>>
>> Without RCU patch:
>> Adding ports: real 1m28.357s
>> Deleting ports: real 1m33.277s
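>>
>> (Roughly, the timing loop looked like the following; the bridge name, port
>> names, and port type are illustrative:
>>
>>   ovs-vsctl add-br br0
>>   time (for i in `seq 1 500`; do
>>       ovs-vsctl add-port br0 p$i -- set Interface p$i type=internal
>>   done)
>>   time (for i in `seq 1 500`; do
>>       ovs-vsctl del-port br0 p$i
>>   done)
>>
>> where "real" is the wall-clock time reported by time.)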
>>
>> On May 19, 2014, at 8:56 AM, Ben Pfaff <[email protected]> wrote:
>>
>>> On Fri, May 16, 2014 at 06:59:02AM -0700, Ryan Wilson wrote:
>>>> Before, a global read-write lock protected the ofproto-dpif /
>>>> ofproto-dpif-xlate interface. Handler and revalidator threads had to wait
>>>> while configuration was being changed. This patch implements RCU locking,
>>>> which allows handlers and revalidators to operate while configuration is
>>>> being updated.
>>>>
>>>> Signed-off-by: Ryan Wilson <[email protected]>
>>>> Acked-by: Alex Wang <[email protected]>
>>>
>>> One side effect of this change that I am a bit concerned about is
>>> performance of configuration changes. In particular, it looks like
>>> removing a port requires copying the entire configuration and that
>>> removing N ports requires copying the entire configuration N times. Can
>>> you try a few experiments with configurations that have many ports,
>>> maybe 500 or 1000, and see how long it takes to remove several of them?
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev