On Tue, 01 Sep 2020, Paul E. McKenney wrote:
And it appears that a default-niced CPU-bound SCHED_OTHER process is not preempted by a newly awakened MAX_NICE SCHED_OTHER process. OK, OK, I never waited for more than 10 minutes, but on my 2.2GHz that is close enough to a hang for most people.

Which means that the patch below prevents the hangs. And maybe does other things as well, firing rcutorture up on it to check.

But is this indefinite delay expected behavior?

This reproduces for me on current mainline as follows:

	tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture lock --duration 3 --configs LOCK05

This hangs within a minute of boot on my setup. Here "hangs" is defined as stopping the per-15-second console output of:

	Writes: Total: 569906696 Max/Min: 81495031/63736508 Fail: 0

Ok, this doesn't seem to be related to lockless wake_qs then.

FYI, there have been missed wakeups in the past where wake_q_add() fails the cmpxchg because the task is already pending a wakeup, leading to the actual wakeup occurring before its corresponding wake_up_q(). This is why we have wake_q_add_safe(). But for rtmutexes, because there is no lock stealing, only the top waiter is awoken, and try_to_take_rt_mutex() is done under the lock->wait_lock, I was not seeing an actual race here.

Thanks,
Davidlohr
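
For illustration only, here is a userspace sketch of the wake_q_add() claim step described above. It is not the kernel code: the names wq_node, wq_head, wq_init(), wq_add() and wq_empty() are hypothetical, C11 atomics stand in for the kernel's cmpxchg, and task reference counting is not modeled. It only shows why a second add of an already-queued task fails, so that the task's wakeup is tied to the first queue rather than to the second caller's wake_up_q().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-task wake_q node. */
struct wq_node {
	_Atomic(struct wq_node *) next;	/* NULL means "not on any queue" */
};

/* Non-NULL sentinel that terminates a queue. */
#define WQ_TAIL ((struct wq_node *)0x1)

/* Hypothetical stand-in for an on-stack wake queue head. */
struct wq_head {
	_Atomic(struct wq_node *) first;
	_Atomic(struct wq_node *) *lastp;
};

static void wq_init(struct wq_head *h)
{
	atomic_init(&h->first, WQ_TAIL);
	h->lastp = &h->first;
}

static bool wq_empty(struct wq_head *h)
{
	return atomic_load(&h->first) == WQ_TAIL;
}

/*
 * Claim the node by swinging next from NULL to WQ_TAIL. If the
 * compare-and-swap fails, the task is already pending a wakeup on
 * some other queue, so nothing is added here and false is returned.
 */
static bool wq_add(struct wq_head *h, struct wq_node *n)
{
	struct wq_node *expected = NULL;

	if (!atomic_compare_exchange_strong(&n->next, &expected, WQ_TAIL))
		return false;

	/* The queue head itself is private to its owner; just link the node. */
	atomic_store(h->lastp, n);
	h->lastp = &n->next;
	return true;
}
```

With two queues a and b and one task node t, wq_add(&a, &t) succeeds and a following wq_add(&b, &t) fails, leaving b empty. Whoever owns b then has nothing to wake, and t is woken whenever the owner of a gets around to flushing it, which can be before the owner of b has reached its own wake_up_q().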

