On Wed, May 11, 2016 at 08:03:46PM +0200, Michal Hocko wrote: > On Wed 11-05-16 15:59:38, Michal Hocko wrote: > > On Wed 11-05-16 11:41:28, Peter Zijlstra wrote: > > > On Wed, May 11, 2016 at 11:31:27AM +0200, Michal Hocko wrote: > > > > > > > Care to cook up a full patch? > > > > > > compile tested only, if someone could please test it? > > > > I have tried to run the test case from Tetsuo[1] with a small printk to > > show the interrupted writer case: > > [ 2753.596678] XXX: Writer interrupted. Woken waiters:0 > > [ 2998.266978] XXX: Writer interrupted. Woken waiters:0 > > > > which means rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem) path which is > > the problematic case. oom_reaper was always able to succeed so I guess > > the patch works as expected. I will leave the test run for longer to be > > sure. > > And just for the reference I am able to reproduce the lockup without the > patch applied and the same test case and a debugging patch > > [ 1522.036379] XXX interrupted. list_is_singular:1 > [ 1523.040462] oom_reaper: unable to reap pid:3736 (tgid=3736)
Here the complete patch. Ingo could you make it appear in the right tip branch? --- Subject: locking, rwsem: Fix down_write_killable() From: Peter Zijlstra <[email protected]> Date: Wed, 11 May 2016 11:41:28 +0200 The new signal_pending exit path in __rwsem_down_write_failed_common() was fingered as breaking his kernel by Tetsuo Handa. Upon inspection it was found that there are two things wrong with it; - it forgets to remove WAITING_BIAS if it leaves the list empty, or - it forgets to wake further waiters that were blocked on the now removed waiter. Especially the first issue causes new lock attempts to block and stall indefinitely, as the code assumes that pending waiters mean there is an owner that will wake when it releases the lock. Cc: "David S. Miller" <[email protected]> Cc: Waiman Long <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Tony Luck <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Max Filippov <[email protected]> Reported-by: Tetsuo Handa <[email protected]> Tested-by: Tetsuo Handa <[email protected]> Tested-by: Michal Hocko <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> --- kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -487,23 +487,32 @@ __rwsem_down_write_failed_common(struct /* Block until there are no active lockers. */ do { - if (signal_pending_state(state, current)) { - raw_spin_lock_irq(&sem->wait_lock); - ret = ERR_PTR(-EINTR); - goto out; - } + if (signal_pending_state(state, current)) + goto out_nolock; + schedule(); set_current_state(state); } while ((count = sem->count) & RWSEM_ACTIVE_MASK); raw_spin_lock_irq(&sem->wait_lock); } -out: __set_current_state(TASK_RUNNING); list_del(&waiter.list); raw_spin_unlock_irq(&sem->wait_lock); return ret; + +out_nolock: + __set_current_state(TASK_RUNNING); + raw_spin_lock_irq(&sem->wait_lock); + list_del(&waiter.list); + if (list_empty(&sem->wait_list)) + rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem); + else + __rwsem_do_wake(sem, RWSEM_WAKE_ANY); + raw_spin_unlock_irq(&sem->wait_lock); + + return ERR_PTR(-EINTR); } __visible struct rw_semaphore * __sched

