On 29/03/2016 06:44, "Karl Rister" <kris...@redhat.com> wrote:
>On 03/29/2016 08:08 AM, Flavio Leitner wrote:
>> On Tue, Mar 29, 2016 at 02:13:18AM +0000, Daniele Di Proietto wrote:
>>> Hi Flavio and Karl,
>>>
>>> thanks for the patch! I have a couple of comments:
>>>
>>> Can you point out a configuration where this is the bottleneck?
>>> I'm interested in reproducing this.
>>
>> Karl, since you did the tests, could you please provide more details?
>
>When performing packet forwarding latency tests, I first noticed system
>and idle time in the CPU statistics when I expected the PMD threads to
>be 100% in userspace. I used the kernel ftrace facility to track down
>what was happening and saw that the PMD thread was being context
>switched out and going idle. The PMD thread was pinned to a CPU core
>isolated with isolcpus, so there was no competing task that could be
>scheduled to cause the context switch, and I would not expect the
>polling thread ever to go idle. Further analysis with ftrace and gdb
>tracked the cause to seq_mutex blocking when another task held the
>mutex.
>
>I would estimate that this change removed packet latency spikes of
>35-45 usecs in our test scenario.
>
>The test is forwarding packets through a KVM guest using OVS+DPDK in
>the host and the DPDK testpmd application in the guest.

Thanks for the explanation.

>
>Flavio, I thought I remembered you also saying that you saw a
>throughput improvement in a test you were running?
>
>>
>>
>>> I think the implementation would look simpler if we could
>>> avoid explicitly taking the mutex in dpif-netdev and instead
>>> have an ovsrcu_try_quiesce(). What do you think?
>>
>> My concern is that it is freeing one entry from the EMC each round
>> and it should quiesce to allow the callbacks to run. If, for
>> some reason, it fails to quiesce for a long period, then it might
>> backlog a significant number of entries.
>
>My initial approach, which Flavio's code is very similar to, was simply
>trying to provide the simplest change to achieve what I was looking for.
> I could certainly see alternative solutions being more appropriate.
>
>>
>>
>>> I think we can avoid the recursive mutex as well if we introduce
>>> some explicit APIs in seq (seq_try_lock, seq_change_protected and
>>> seq_unlock), but I'd like to understand the performance implications
>>> of this commit first.
>
>One other area of the sequence code that I thought was curious was the
>single mutex that covers all sequences. If updating the API is a
>possibility, I would think going to a mutex per sequence might be an
>appropriate change as well. That said, I don't have data that
>specifically points this out as a problem.

If we find this to be a bottleneck, I think we can move to finer-grained
locking.

>
>>
>> The issue is the latency spike when the PMD thread blocks on the
>> busy mutex.
>>
>> The goal with recursive locking is to make sure we can sweep
>> the EMC cache and quiesce without blocking. Fixing the seq API
>> would help us not block, but then we would have no control over
>> whether we did both tasks in the same round.
>>
>> fbl
>>
>>
>>> On 23/03/2016 20:54, "dev on behalf of Flavio Leitner"
>>> <dev-boun...@openvswitch.org on behalf of f...@redhat.com> wrote:
>>>
>>>> The PMD thread needs to keep processing RX queues in order
>>>> to achieve maximum throughput. However, after some time spent
>>>> processing the RX queues, it also needs to run other tasks,
>>>> where a mutex can block the PMD thread. That causes latency
>>>> spikes and affects the throughput.
>>>>
>>>> Convert to a recursive mutex so that the PMD thread can test
>>>> first and, if it gets the lock, continue as before; otherwise
>>>> it tries again another time. There is additional logic to make
>>>> sure the PMD thread tries harder as attempts to get the mutex
>>>> continue to fail.
>>>>
>>>> Co-authored-by: Karl Rister <kris...@redhat.com>
>>>> Signed-off-by: Flavio Leitner <f...@redhat.com>
>>>
>>> Oh, we're going to need a signoff from Karl as well :-)
>
>Signed-off-by: Karl Rister <kris...@redhat.com>
>
>Is this good enough?

Absolutely, thanks!

>
>>>
>>> Thanks,
>>>
>>> Daniele
>>>
>>>> ---
>>>>  include/openvswitch/thread.h |  3 +++
>>>>  lib/dpif-netdev.c            | 33 ++++++++++++++++++++++-----------
>>>>  lib/seq.c                    | 15 ++++++++++++++-
>>>>  lib/seq.h                    |  3 +++
>>>>  4 files changed, 42 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/include/openvswitch/thread.h b/include/openvswitch/thread.h
>>>> index af6f2bb..6d20720 100644
>>>> --- a/include/openvswitch/thread.h
>>>> +++ b/include/openvswitch/thread.h
>>>> @@ -44,6 +44,9 @@ struct OVS_LOCKABLE ovs_mutex {
>>>>  #define OVS_ADAPTIVE_MUTEX_INITIALIZER OVS_MUTEX_INITIALIZER
>>>>  #endif
>>>>
>>>> +#define OVS_RECURSIVE_MUTEX_INITIALIZER \
>>>> +    { PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP, "<unlocked>" }
>>>> +
>>>>  /* ovs_mutex functions analogous to pthread_mutex_*() functions.
>>>>   *
>>>>   * Most of these functions abort the process with an error message on any
>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>> index 0f2385a..a10b2d1 100644
>>>> --- a/lib/dpif-netdev.c
>>>> +++ b/lib/dpif-netdev.c
>>>> @@ -2668,12 +2668,15 @@ static void *
>>>>  pmd_thread_main(void *f_)
>>>>  {
>>>>      struct dp_netdev_pmd_thread *pmd = f_;
>>>> -    unsigned int lc = 0;
>>>> +    unsigned int lc_max = 1024;
>>>> +    unsigned int lc_start;
>>>> +    unsigned int lc;
>>>>      struct rxq_poll *poll_list;
>>>>      unsigned int port_seq = PMD_INITIAL_SEQ;
>>>>      int poll_cnt;
>>>>      int i;
>>>>
>>>> +    lc_start = 0;
>>>>      poll_cnt = 0;
>>>>      poll_list = NULL;
>>>>
>>>> @@ -2698,24 +2701,32 @@ reload:
>>>>       * reloading the updated configuration. */
>>>>      dp_netdev_pmd_reload_done(pmd);
>>>>
>>>> +    lc = lc_start;
>>>>      for (;;) {
>>>>          for (i = 0; i < poll_cnt; i++) {
>>>>              dp_netdev_process_rxq_port(pmd, poll_list[i].port,
>>>>                                         poll_list[i].rx);
>>>>          }
>>>>
>>>> -        if (lc++ > 1024) {
>>>> -            unsigned int seq;
>>>> +        if (lc++ > lc_max) {
>>>> +            if (!seq_pmd_trylock()) {
>>>> +                unsigned int seq;
>>>> +                lc_start = 0;
>>>> +                lc = 0;
>>>>
>>>> -            lc = 0;
>>>> +                emc_cache_slow_sweep(&pmd->flow_cache);
>>>> +                coverage_try_clear();
>>>> +                ovsrcu_quiesce();
>>>>
>>>> -            emc_cache_slow_sweep(&pmd->flow_cache);
>>>> -            coverage_try_clear();
>>>> -            ovsrcu_quiesce();
>>>> +                seq_pmd_unlock();
>>>>
>>>> -            atomic_read_relaxed(&pmd->change_seq, &seq);
>>>> -            if (seq != port_seq) {
>>>> -                port_seq = seq;
>>>> -                break;
>>>> +                atomic_read_relaxed(&pmd->change_seq, &seq);
>>>> +                if (seq != port_seq) {
>>>> +                    port_seq = seq;
>>>> +                    break;
>>>> +                }
>>>> +            } else {
>>>> +                lc_start += (lc_start + lc_max)/2;
>>>> +                lc = lc_start;
>>>>              }
>>>>          }
>>>>      }
>>>> diff --git a/lib/seq.c b/lib/seq.c
>>>> index 9c3257c..198b2ce 100644
>>>> --- a/lib/seq.c
>>>> +++ b/lib/seq.c
>>>> @@ -55,7 +55,7 @@ struct seq_thread {
>>>>      bool waiting OVS_GUARDED;   /* True if latch_wait() already called. */
>>>>  };
>>>>
>>>> -static struct ovs_mutex seq_mutex = OVS_MUTEX_INITIALIZER;
>>>> +static struct ovs_mutex seq_mutex = OVS_RECURSIVE_MUTEX_INITIALIZER;
>>>>
>>>>  static uint64_t seq_next OVS_GUARDED_BY(seq_mutex) = 1;
>>>>
>>>> @@ -68,6 +68,19 @@ static void seq_thread_woke(struct seq_thread *)
>>>> OVS_REQUIRES(seq_mutex);
>>>>  static void seq_waiter_destroy(struct seq_waiter *) OVS_REQUIRES(seq_mutex);
>>>>  static void seq_wake_waiters(struct seq *) OVS_REQUIRES(seq_mutex);
>>>>
>>>> +
>>>> +int seq_pmd_trylock(void)
>>>> +    OVS_TRY_LOCK(0, seq_mutex)
>>>> +{
>>>> +    return ovs_mutex_trylock(&seq_mutex);
>>>> +}
>>>> +
>>>> +void seq_pmd_unlock(void)
>>>> +    OVS_RELEASES(seq_mutex)
>>>> +{
>>>> +    ovs_mutex_unlock(&seq_mutex);
>>>> +}
>>>> +
>>>>  /* Creates and returns a new 'seq' object. */
>>>>  struct seq * OVS_EXCLUDED(seq_mutex)
>>>>  seq_create(void)
>>>> diff --git a/lib/seq.h b/lib/seq.h
>>>> index b0ec6bf..f6b6e1a 100644
>>>> --- a/lib/seq.h
>>>> +++ b/lib/seq.h
>>>> @@ -117,6 +117,9 @@
>>>>  #include <stdint.h>
>>>>  #include "util.h"
>>>>
>>>> +int seq_pmd_trylock(void);
>>>> +void seq_pmd_unlock(void);
>>>> +
>>>>  /* For implementation of an object with a sequence number attached. */
>>>>  struct seq *seq_create(void);
>>>>  void seq_destroy(struct seq *);
>>>> --
>>>> 2.5.0
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> http://openvswitch.org/mailman/listinfo/dev
>>>
>>
>
>
>--
>Karl Rister <kris...@redhat.com>