On Wed, Dec 27, 2017 at 09:58:08PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Dec 2017, Thomas Gleixner wrote:
> > Bah, no. We need to move that into the nohz logic somehow to prevent that
> > repetitive expiry yesterday reprogramming. Lemme think about it some more.
> 
> The patch below should be the proper cure.
> 
> Thanks,
> 
>       tglx
> 
> 8<-------------------
> Subject: nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
> From: Thomas Gleixner <[email protected]>
> Date: Fri, 22 Dec 2017 15:51:13 +0100
> 
> From: Thomas Gleixner <[email protected]>
> 
> The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
> subsequently invokes tick_nohz_stop_sched_tick() are:
> 
>   if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
> 
> If need_resched() is not set, but a timer softirq is pending, then this is
> an indication that the softirq code punted and delegated the execution to
> ksoftirqd. need_resched() is not true because the current interrupted task
> takes precedence over ksoftirqd.
> 
> Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
> timer interrupts because the timer wheel contains an expired timer, but
> softirqs are not yet executed. So get_next_timer_interrupt() returns an
> immediate expiry request, which causes the timer to fire immediately again.
> Lather, rinse and repeat...
> 
> Prevent that by adding a check for a pending timer soft interrupt to the
> conditions in tick_nohz_stop_sched_tick() which avoid calling
> get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
> prevents repeated programming of an already expired timer.
> 
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Sebastian Siewior <[email protected]>
> Cc: [email protected]
> Cc: Paul McKenney <[email protected]>
> Cc: Anna-Maria Gleixner <[email protected]>
> 
> ---
>  kernel/time/tick-sched.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -650,6 +650,11 @@ static void tick_nohz_restart(struct tic
>       ts->next_tick = 0;
>  }
>  
> +static inline bool local_timer_softirq_pending(void)
> +{
> +     return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
> +}
> +
>  static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>                                        ktime_t now, int cpu)
>  {
> @@ -666,8 +671,8 @@ static ktime_t tick_nohz_stop_sched_tick
>       } while (read_seqretry(&jiffies_lock, seq));
>       ts->last_jiffies = basejiff;
>  
> -     if (rcu_needs_cpu(basemono, &next_rcu) ||
> -         arch_needs_cpu() || irq_work_needs_cpu()) {
> +     if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
> +         irq_work_needs_cpu() || local_timer_softirq_pending()) {

Much better. This may need a comment, though, because it's not immediately
obvious why we need this check when softirqs are processed just before
tick_irq_exit().

Thanks.

Acked-by: Frederic Weisbecker <[email protected]>
