On Wed, Jul 11, 2018 at 11:57:43AM +0100, David Woodhouse wrote:
> On Mon, 2018-07-09 at 15:08 -0700, Paul E. McKenney wrote:
> 
> > 
> > And the earlier patch was against my -rcu tree, which won't be all that
> > helpful for v4.15.  Please see below for a lightly tested backport to v4.15.
> > 
> > It should apply to all the releases of interest.  If other backports
> > are needed, please remind me of my woodhouse.v4.15.2018.07.09a tag.
> > 
> >                                                         Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 6361b81827a8f93f582124da385258fc04a38a7f
> > Author: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> > Date:   Mon Jul 9 13:47:30 2018 -0700
> > 
> >     rcu: Make need_resched() respond to urgent RCU-QS needs
> >     
> >     The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent
> >     need for an RCU quiescent state from the force-quiescent-state 
> > processing
> >     within the grace-period kthread to context switches and to 
> > cond_resched().
> >     Unfortunately, such urgent needs are not communicated to need_resched(),
> >     which is sometimes used to decide when to invoke cond_resched(), for
> >     but one example, within the KVM vcpu_run() function.  As of v4.15, this
> >     can result in synchronize_sched() being delayed by up to ten seconds,
> >     which can be problematic, to say nothing of annoying.
> >     
> >     This commit therefore checks rcu_dynticks.rcu_urgent_qs from within
> >     rcu_check_callbacks(), which is invoked from the scheduling-clock
> >     interrupt handler.  If the current task is not an idle task and is
> >     not executing in usermode, a context switch is forced, and either way,
> >     the rcu_dynticks.rcu_urgent_qs variable is set to false.  If the current
> >     task is an idle task, then RCU's dyntick-idle code will detect the
> >     quiescent state, so no further action is required.  Similarly, if the
> >     task is executing in usermode, other code in rcu_check_callbacks() and
> >     its called functions will report the corresponding quiescent state.
> >     
> >     Reported-by: David Woodhouse <dw...@infradead.org>
> >     Suggested-by: Peter Zijlstra <pet...@infradead.org>
> >     Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> >     [ paulmck: Backported to v4.15.  Probably applies elsewhere. ]
> 
> Hm, this doesn't appear to work. I'm still seeing latencies of 4-5
> seconds in my testing. In fact, even our old workaround of adding
> rcu_all_qs() into vcpu_enter_guest() didn't properly fix it AFAICT.
> 
> I'm just creating a VM with lots of CPUs, then attaching new devices to
> it to cause the VMM to open more file descriptors, until it hits a
> power of two and invokes expand_fdtable().
> 
> expand_fdtable (512) sync took 10472394964 cycles (3500000 µs).
> expand_fdtable (512) sync took 15298908072 cycles (5100000 µs).

Interesting.  (I am assuming that the guest is printing these messages,
not the host, but please let me know if my assumption is incorrect.)

Are the CPUs saturated?  If so, could you please try booting with
rcutree.kthread_prio=2?  If that prevents the messages from happening,
then I need to put some work into guaranteeing forward progress.
Otherwise, I need to figure out why the setting of rcu_urgent_qs is
being ignored.

I will assume the latter for the moment and see if I can spot the
problem.

                                                        Thanx, Paul


> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -162,8 +162,16 @@ static int expand_fdtable(struct files_struct *files, 
> unsigned int nr)
>         /* make sure all __fd_install() have seen resize_in_progress
>          * or have finished their rcu_read_lock_sched() section.
>          */
> -       if (atomic_read(&files->count) > 1)
> +       if (atomic_read(&files->count) > 1) {
> +               unsigned long sync_start, sync_end;
> +               unsigned long j_start, j_end;
> +               j_start = jiffies;
> +               sync_start = get_cycles();
>                 synchronize_sched();
> +               sync_end = get_cycles();
> +               j_end = jiffies;
> +               printk("expand_fdtable (%d) sync took %ld cycles (%ld 
> µs).\n", nr, sync_end - sync_start, jiffies_to_usecs(j_end - j_start));
> +       }
>  
>         spin_lock(&files->file_lock);
>         if (!new_fdt)


Reply via email to