On Mon 04-06-18 16:12:37, Dan Williams wrote:
> mce: Uncorrected hardware memory error in user-access at af34214200
> {1}[Hardware Error]: It has been corrected by h/w and requires no further
> action
> mce: [Hardware Error]: Machine check events logged
> {1}[Hardware Error]: event severity: corrected
> Memory failure: 0xaf34214: reserved kernel page still referenced by 1
> users
> [..]
> Memory failure: 0xaf34214: recovery action for reserved kernel page:
> Failed
> mce: Memory error not recovered
>
> In contrast to typical memory, dev_pagemap pages may be dax mapped. With
> dax there is no possibility to map in another page dynamically since dax
> establishes 1:1 physical address to file offset associations. Also
> dev_pagemap pages associated with NVDIMM / persistent memory devices can
> internal remap/repair addresses with poison. While memory_failure()
> assumes that it can discard typical poisoned pages and keep them
> unmapped indefinitely, dev_pagemap pages may be returned to service
> after the error is cleared.
>
> Teach memory_failure() to detect and handle MEMORY_DEVICE_HOST
> dev_pagemap pages that have poison consumed by userspace. Mark the
> memory as UC instead of unmapping it completely to allow ongoing access
> via the device driver (nd_pmem). Later, nd_pmem will grow support for
> marking the page back to WB when the error is cleared.
...
> +static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> + struct dev_pagemap *pgmap)
> +{
> + struct page *page = pfn_to_page(pfn);
> + const bool unmap_success = true;
> + struct address_space *mapping;
> + unsigned long size;
> + LIST_HEAD(tokill);
> + int rc = -EBUSY;
> + loff_t start;
> +
> + /*
> + * Prevent the inode from being freed while we are interrogating
> + * the address_space, typically this would be handled by
> + * lock_page(), but dax pages do not use the page lock.
> + */
> + rcu_read_lock();
> + mapping = page->mapping;
> + if (!mapping) {
> + rcu_read_unlock();
> + goto out;
> + }
> + if (!igrab(mapping->host)) {
> + mapping = NULL;
> + rcu_read_unlock();
> + goto out;
> + }
> + rcu_read_unlock();
Why don't you use radix tree entry lock here instead? That is a direct
replacement of the page lock and you don't have to play games with pinning
the inode and verifying the mapping afterwards.
> +out:
> + if (mapping)
> + iput(mapping->host);
BTW, this would have to be prepared to do full inode deletion which I'm not
quite sure is safe from this context.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
[email protected]
https://lists.01.org/mailman/listinfo/linux-nvdimm