Adding the scheduler people to the participants list, and re-attaching the patch, because while this patch is internal to the VM code, the issue itself is not.
There might well be other cases where somebody goes "wake_up_all() will
wake everybody up, so I can put the wait queue head on the stack, and
then after I've woken people up I can return".

Ingo/PeterZ: the reason that does *not* work is that wake_up_all() does
make sure that everybody is woken up, but the usual autoremove wake
function only removes the wakeup entry if the process was woken up by
that particular wakeup. If something else had woken it up, the entry
remains on the list, and the waker in this case returned with the wait
queue head still containing entries.

Which is deadly when the wait queue head is on the stack.

So I'm wondering if we should make that synchronous_wake_function()
available generally, and maybe introduce a DEFINE_WAIT_SYNC() helper
that uses it.

Of course, I'm really hoping that this shmem.c use is the _only_ such
case. But I doubt it.

Comments?

Note for Ingo and Peter: this patch has not been tested at all. But
Vegard did test an earlier patch of mine that just verified that yes,
the issue really was that wait queue entries remained on the wait queue
head just as we were about to return and free it.

            Linus

On Mon, Dec 5, 2016 at 12:10 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> Anyway, can you try this patch instead? It should actually cause the
> wake_up_all() to always remove all entries, and thus the WARN_ON()
> should no longer happen (and I removed the "list_del()" hackery).
>
>            Linus
diff --git a/mm/shmem.c b/mm/shmem.c
index 166ebf5d2bce..17beb44e9f4f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1848,6 +1848,19 @@ alloc_nohuge:		page = shmem_alloc_and_acct_page(gfp, info, sbinfo,
 	return error;
 }
 
+/*
+ * This is like autoremove_wake_function, but it removes the wait queue
+ * entry unconditionally - even if something else had already woken the
+ * target.
+ */
+static int synchronous_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+	int ret = default_wake_function(wait, mode, sync, key);
+
+	list_del_init(&wait->task_list);
+	return ret;
+}
+
 static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct inode *inode = file_inode(vma->vm_file);
@@ -1883,7 +1896,7 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		    vmf->pgoff >= shmem_falloc->start &&
 		    vmf->pgoff < shmem_falloc->next) {
 			wait_queue_head_t *shmem_falloc_waitq;
-			DEFINE_WAIT(shmem_fault_wait);
+			DEFINE_WAIT_FUNC(shmem_fault_wait, synchronous_wake_function);
 
 			ret = VM_FAULT_NOPAGE;
 			if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) &&
@@ -2665,6 +2678,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		spin_lock(&inode->i_lock);
 		inode->i_private = NULL;
 		wake_up_all(&shmem_falloc_waitq);
+		WARN_ON_ONCE(!list_empty(&shmem_falloc_waitq.task_list));
 		spin_unlock(&inode->i_lock);
 		error = 0;
 		goto out;