On Mon 04-06-18 16:12:37, Dan Williams wrote:
>     mce: Uncorrected hardware memory error in user-access at af34214200
>     {1}[Hardware Error]: It has been corrected by h/w and requires no further 
> action
>     mce: [Hardware Error]: Machine check events logged
>     {1}[Hardware Error]: event severity: corrected
>     Memory failure: 0xaf34214: reserved kernel page still referenced by 1 
> users
>     [..]
>     Memory failure: 0xaf34214: recovery action for reserved kernel page: 
> Failed
>     mce: Memory error not recovered
> 
> In contrast to typical memory, dev_pagemap pages may be dax mapped. With
> dax there is no possibility to map in another page dynamically since dax
> establishes 1:1 physical address to file offset associations. Also
> dev_pagemap pages associated with NVDIMM / persistent memory devices can
> internal remap/repair addresses with poison. While memory_failure()
> assumes that it can discard typical poisoned pages and keep them
> unmapped indefinitely, dev_pagemap pages may be returned to service
> after the error is cleared.
> 
> Teach memory_failure() to detect and handle MEMORY_DEVICE_HOST
> dev_pagemap pages that have poison consumed by userspace. Mark the
> memory as UC instead of unmapping it completely to allow ongoing access
> via the device driver (nd_pmem). Later, nd_pmem will grow support for
> marking the page back to WB when the error is cleared.

...

> +static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> +             struct dev_pagemap *pgmap)
> +{
> +     struct page *page = pfn_to_page(pfn);
> +     const bool unmap_success = true;
> +     struct address_space *mapping;
> +     unsigned long size;
> +     LIST_HEAD(tokill);
> +     int rc = -EBUSY;
> +     loff_t start;
> +
> +     /*
> +      * Prevent the inode from being freed while we are interrogating
> +      * the address_space, typically this would be handled by
> +      * lock_page(), but dax pages do not use the page lock.
> +      */
> +     rcu_read_lock();
> +     mapping = page->mapping;
> +     if (!mapping) {
> +             rcu_read_unlock();
> +             goto out;
> +     }
> +     if (!igrab(mapping->host)) {
> +             mapping = NULL;
> +             rcu_read_unlock();
> +             goto out;
> +     }
> +     rcu_read_unlock();

Why don't you use radix tree entry lock here instead? That is a direct
replacement of the page lock and you don't have to play games with pinning
the inode and verifying the mapping afterwards.

> +out:
> +     if (mapping)
> +             iput(mapping->host);

BTW, this would have to be prepared to do full inode deletion which I'm not
quite sure is safe from this context.

                                                                Honza
-- 
Jan Kara <[email protected]>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
[email protected]
https://lists.01.org/mailman/listinfo/linux-nvdimm

Reply via email to