On 07/06/20 16:59, Peter Zijlstra wrote:

[...]

> @@ -4104,12 +4108,19 @@ static void __sched notrace __schedule(bool preempt)
>       local_irq_disable();
>       rcu_note_context_switch(preempt);
>  
> +     prev_state = prev->state;
> +
>       /*
> -      * Make sure that signal_pending_state()->signal_pending() below
> -      * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> -      * done by the caller to avoid the race with signal_wake_up().
> +      * __set_current_state(@state)
> +      * schedule()                           signal_wake_up()
> +      *   prev_state = p->state                set_tsk_thread_flag(p, TIF_SIGPENDING)
> +      *                                        wake_up_state()
> +      *   LOCK rq->lock                          LOCK p->pi_state
> +      *   smp_mb__after_spinlock()               smp_mb__after_spinlock()
> +      *     if (signal_pending_state()           if (p->state & @state)
> +      *
>        *
> -      * The membarrier system call requires a full memory barrier
> +      * Also, the membarrier system call requires a full memory barrier
>        * after coming from user-space, before storing to rq->curr.
>        */
>       rq_lock(rq, &rf);
> @@ -4120,10 +4131,30 @@ static void __sched notrace __schedule(bool preempt)
>       update_rq_clock(rq);
>  
>       switch_count = &prev->nivcsw;
> -     if (!preempt && prev->state) {
> -             if (signal_pending_state(prev->state, prev)) {
> +     /*
> +      * We must re-load p->state in case ttwu_runnable() changed it
> +      * before we acquired rq->lock.
> +      */
> +     if (!preempt && prev_state && prev_state == prev->state) {

I think the compiler won't optimize the `prev_state == prev->state` comparison
away, because the smp_mb__after_spinlock() implies a compiler barrier. Still,
I'm not sure whether it's worth making the prev->state accesses READ_ONCE()?

Thanks

--
Qais Yousef
