core: Add debug code to catch missing update_rq_clock()

Petr Mladek Wed, 21 Sep 2016 08:59:32 -0700

On Wed 2016-09-21 14:38:13, Matt Fleming wrote:
> There's no diagnostic checks for figuring out when we've accidentally
> missed update_rq_clock() calls. Let's add some by piggybacking on the
> rq_*pin_lock() wrappers.
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index bf48e7975c23..91f4b3d58d56 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> +/*
> + * rq::clock_update_flags bits
> + *
> + * %RQCF_REQ_SKIP - will request skipping of clock update on the next
> + *  call to __schedule(). This is an optimisation to avoid
> + *  neighbouring rq clock updates.
> + *
> + * %RQCF_ACT_SKIP - is set from inside of __schedule() when skipping is
> + *  in effect and calls to update_rq_clock() are being ignored.
> + *
> + * %RQCF_UPDATED - is a debug flag that indicates whether a call has been
> + *  made to update_rq_clock() since the last time rq::lock was pinned.
> + *
> + * If inside of __schedule(), clock_update_flags will have been
> + * shifted left (a left shift is a cheap operation for the fast path
> + * to promote %RQCF_REQ_SKIP to %RQCF_ACT_SKIP), so you must use,
> + *
> + *   if (rq-clock_update_flags >= RQCF_UPDATED)
> + *
> + * to check if %RQCF_UPADTED is set. It'll never be shifted more than
> + * one position though, because the next rq_unpin_lock() will shift it
> + * back.
> + */
> +#define RQCF_REQ_SKIP        0x01
> +#define RQCF_ACT_SKIP        0x02
> +#define RQCF_UPDATED 0x04
> +
> +static inline void assert_clock_updated(struct rq *rq)
> +{
> +#ifdef CONFIG_SCHED_DEBUG
> +     /*
> +      * The only reason for not seeing a clock update since the
> +      * last rq_pin_lock() is if we're currently skipping updates.
> +      */
> +     WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
> +#endif
> +}


I am afraid that it might eventually create a deadlock.
For example, there is the following call chain:

+ printk()
  + vprintk_func -> vprintk_default()
    + vprinkt_emit()
      + console_unlock()
        + up_console_sem()
          + up()                # takes &sem->lock
            + __up()
              + wake_up_process()
                + try_to_wake_up()
                  + ttwu_queue()
                    + ttwu_do_activate()
                      + ttwu_do_wakeup()
                        + rq_clock()
                          + lockdep_assert_held()
                            + WARN_ON_ONCE()
                              + printk()
                                + vprintk_func -> vprintk_default()
                                  + vprintk_emit()
                                    + console_try_lock()
                                      + down_trylock_console_sem()
                                        + __down_trylock_console_sem()
                                          + down_trylock()

   DEADLOCK: Unable to take &sem->lock


We have recently discussed similar deadlock, see the thread
around https://lkml.kernel.org/r/20160714221251.GE3057@ubuntu

A temporary solution would be to replace the WARN_ON_ONCE()
by printk_deferred(). Of course, this is far from ideal because
you do not get the stack, ...

Sergey is working on WARN_ON_ONCE_DEFERRED() but it is not
an easy task.


>  static inline u64 rq_clock(struct rq *rq)
>  {
>       lockdep_assert_held(&rq->lock);
> +     assert_clock_updated(rq);
> +
>       return rq->clock;
>  }
>  

I am not sure how the above call chain is realistic. But adding
WARN_ON() into the scheduler paths is risky in general.

Best Regards,
Petr

Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock()

Reply via email to