On Fri 08-06-18 16:51:14, Dan Williams wrote:
> In preparation for implementing support for memory poison (media error)
> handling via dax mappings, implement a lock_page() equivalent. Poison
> error handling requires rmap and needs guarantees that the page->mapping
> association is maintained / valid (inode not freed) for the duration of
> the lookup.
> 
> In the device-dax case it is sufficient to simply hold a dev_pagemap
> reference. In the filesystem-dax case we need to use the entry lock.
> 
> Export the entry lock via dax_lock_page() that uses rcu_read_lock() to
> protect against the inode being freed, and revalidates the page->mapping
> association under xa_lock().
> 
> Signed-off-by: Dan Williams <[email protected]>

Some comments below...

> diff --git a/fs/dax.c b/fs/dax.c
> index cccf6cad1a7a..b7e71b108fcf 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -361,6 +361,82 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
>       }
>  }
>  
> +struct page *dax_lock_page(unsigned long pfn)
> +{

Why do you return struct page here? Any reason behind that? The struct
page exists and can be accessed through pfn_to_page() regardless of the
result of this function, so returning it looks a bit confusing. Also the
dax_lock_page() name seems misleading, since what we lock is the mapping
entry, not the page. Maybe dax_lock_pfn_mapping_entry()?
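
E.g. something like this (hypothetical signature, the name is just to
illustrate the suggestion):

	/* Return success/failure; the caller can derive the page itself. */
	bool dax_lock_pfn_mapping_entry(unsigned long pfn);

	/* Caller side: */
	if (!dax_lock_pfn_mapping_entry(pfn))
		return -EBUSY;
	page = pfn_to_page(pfn);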

> +     pgoff_t index;
> +     struct inode *inode;
> +     wait_queue_head_t *wq;
> +     void *entry = NULL, **slot;
> +     struct address_space *mapping;
> +     struct wait_exceptional_entry_queue ewait;
> +     struct page *ret = NULL, *page = pfn_to_page(pfn);
> +
> +     rcu_read_lock();
> +     for (;;) {
> +             mapping = READ_ONCE(page->mapping);
> +
> +             if (!mapping || !IS_DAX(mapping->host))
> +                     break;
> +
> +             /*
> +              * In the device-dax case there's no need to lock, a
> +              * struct dev_pagemap pin is sufficient to keep the
> +              * inode alive.
> +              */
> +             inode = mapping->host;
> +             if (S_ISCHR(inode->i_mode)) {
> +                     ret = page;
> +                     break;
> +             }
> +
> +             xa_lock_irq(&mapping->i_pages);
> +             if (mapping != page->mapping) {
> +                     xa_unlock_irq(&mapping->i_pages);
> +                     continue;
> +             }
> +             index = page->index;
> +
> +             init_wait(&ewait.wait);
> +             ewait.wait.func = wake_exceptional_entry_func;

This initialization could be before the loop.
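I.e. something like (rough sketch against your patch):

	init_wait(&ewait.wait);
	ewait.wait.func = wake_exceptional_entry_func;

	rcu_read_lock();
	for (;;) {
		mapping = READ_ONCE(page->mapping);
		...
	}

The ewait fields don't change between iterations (finish_wait() removes
the entry from the waitqueue again), so there's no need to redo this
under xa_lock on every pass.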

> +
> +             entry = __radix_tree_lookup(&mapping->i_pages, index, NULL,
> +                             &slot);
> +             if (!entry ||
> +                 WARN_ON_ONCE(!radix_tree_exceptional_entry(entry))) {
> +                     xa_unlock_irq(&mapping->i_pages);
> +                     break;
> +             } else if (!slot_locked(mapping, slot)) {
> +                     lock_slot(mapping, slot);
> +                     ret = page;
> +                     xa_unlock_irq(&mapping->i_pages);
> +                     break;
> +             }
> +
> +             wq = dax_entry_waitqueue(mapping, index, entry, &ewait.key);
> +             prepare_to_wait_exclusive(wq, &ewait.wait,
> +                             TASK_UNINTERRUPTIBLE);
> +             xa_unlock_irq(&mapping->i_pages);
> +             rcu_read_unlock();
> +             schedule();
> +             finish_wait(wq, &ewait.wait);
> +             rcu_read_lock();
> +     }
> +     rcu_read_unlock();

I don't like how this duplicates a lot of get_unlocked_mapping_entry().
Can we possibly factor this out, similarly to what was done for
wait_event()?
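
E.g. a helper along these lines (rough sketch, name made up), which both
get_unlocked_mapping_entry() and this function could call:

	static void dax_wait_entry(struct address_space *mapping, pgoff_t index,
			struct wait_exceptional_entry_queue *ewait, void *entry)
	{
		wait_queue_head_t *wq;

		wq = dax_entry_waitqueue(mapping, index, entry, &ewait->key);
		prepare_to_wait_exclusive(wq, &ewait->wait, TASK_UNINTERRUPTIBLE);
		xa_unlock_irq(&mapping->i_pages);
		schedule();
		finish_wait(wq, &ewait->wait);
	}

with the caller reacquiring xa_lock / rcu_read_lock() as needed; or a
wait_event()-style macro if the locking has to differ between callers.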

                                                                Honza
-- 
Jan Kara <[email protected]>
SUSE Labs, CR