On Mon, Nov 26, 2018 at 9:11 AM Jan Kara <[email protected]> wrote:
>
> On Mon 26-11-18 08:12:40, Matthew Wilcox wrote:
> >
> > I noticed this path while I was doing the 4.19 backport of
> > dax: Avoid losing wakeup in dax_lock_mapping_entry
> >
> > xa_unlock_irq(&mapping->i_pages);
> > revalidate = wait_fn();
> > finish_wait(wq, &ewait.wait);
> > xa_lock_irq(&mapping->i_pages);
>
> I guess this is a snippet from get_unlocked_entry(), isn't it?
>
> > It's not safe to call xa_lock_irq() if mapping can have been freed while
> > we slept. We'll probably get away with it; most filesystems use a unique
> > slab for their inodes, so you'll likely get either a freed inode or an
> > inode which is now the wrong inode. But if that page has been freed back
> > to the page allocator, that pointer could now be pointing at anything.
>
> Correct. Thanks for catching this bug!
Yes, nice catch!
>
> > Fixing this in the current codebase is no easier than fixing it in the
> > 4.19 codebase. This is the best I've come up with. Could we do better
> > by not using the _exclusive form of prepare_to_wait()? I'm not familiar
> > with all the things that need to be considered when using this family
> > of interfaces.
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 9bcce89ea18e..154b592b18eb 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -232,6 +232,24 @@ static void *get_unlocked_entry(struct xa_state *xas)
> > }
> > }
> >
> > +static void wait_unlocked_entry(struct xa_state *xas, void *entry)
> > +{
> > + struct wait_exceptional_entry_queue ewait;
> > + wait_queue_head_t *wq;
> > +
> > + init_wait(&ewait.wait);
> > + ewait.wait.func = wake_exceptional_entry_func;
> > +
> > + wq = dax_entry_waitqueue(xas, entry, &ewait.key);
> > + prepare_to_wait_exclusive(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);
> > + xas_unlock_irq(xas);
> > + /* We can no longer look at xas */
> > + schedule();
> > + finish_wait(wq, &ewait.wait);
> > + if (waitqueue_active(wq))
> > + __wake_up(wq, TASK_NORMAL, 1, &ewait.key);
> > +}
> > +
>
> The code looks good. Maybe can we call this wait_entry_unlocked() to stress
> that entry is not really usable after this function returns? And comment
> before the function that this is safe to call even if we don't have a
> reference keeping mapping alive?
Yes, maybe even something more ambiguous like "wait_entry_event()",
because there's no guarantee the entry is unlocked just that now is a
good time to try to interrogate the entry again.
>
> > static void put_unlocked_entry(struct xa_state *xas, void *entry)
> > {
> > /* If we were the only waiter woken, wake the next one */
> > @@ -389,9 +407,7 @@ bool dax_lock_mapping_entry(struct page *page)
> > entry = xas_load(&xas);
> > if (dax_is_locked(entry)) {
> > rcu_read_unlock();
> > - entry = get_unlocked_entry(&xas);
> > - xas_unlock_irq(&xas);
> > - put_unlocked_entry(&xas, entry);
> > + wait_unlocked_entry(&xas, entry);
> > rcu_read_lock();
> > continue;
>
> The continue here actually is not safe either because if the mapping got
> freed, page->mapping will be NULL and we oops at the beginning of the loop.
> So that !dax_mapping() check should also check for mapping != NULL.
Yes.
_______________________________________________
Linux-nvdimm mailing list
[email protected]
https://lists.01.org/mailman/listinfo/linux-nvdimm