Dan Williams <dan.j.willi...@intel.com> writes: > On Tue, May 19, 2020 at 6:53 AM Aneesh Kumar K.V > <aneesh.ku...@linux.ibm.com> wrote: >> >> Dan Williams <dan.j.willi...@intel.com> writes: >> >> > On Mon, May 18, 2020 at 10:30 PM Aneesh Kumar K.V >> > <aneesh.ku...@linux.ibm.com> wrote: >> >> ... >> >> >> Applications using new instructions will behave as expected when running >> >> on P8 and P9. Only future hardware will differentiate between 'dcbf' and >> >> 'dcbfps' >> > >> > Right, this is the problem. Applications using new instructions behave >> > as expected, the kernel has been shipping of_pmem and papr_scm for >> > several cycles now, you're saying that the DAX applications written >> > against those platforms are going to be broken on P8 and P9? >> >> The expecation is that both kernel and userspace would get upgraded to >> use the new instruction before actual persistent memory devices are >> made available. >> >> > >> >> > I'm thinking the kernel >> >> > should go as far as to disable DAX operation by default on new >> >> > hardware until userspace asserts that it is prepared to switch to the >> >> > new implementation. Is there any other way to ensure the forward >> >> > compatibility of deployed ppc64 DAX applications? >> >> >> >> AFAIU there is no released persistent memory hardware on ppc64 platform >> >> and we need to make sure before applications get enabled to use these >> >> persistent memory devices, they should switch to use the new >> >> instruction? >> > >> > Right, I want the kernel to offer some level of safety here because >> > everything you are describing sounds like a flag day conversion. Am I >> > misreading? Is there some other gate that prevents existing users of >> > of_pmem and papr_scm from having their expectations violated when >> > running on P8 / P9 hardware? Maybe there's tighter ecosystem control >> > that I'm just not familiar with, I'm only going off the fact that the >> > kernel has shipped a non-zero number of NVDIMM drivers that build with >> > ARCH=ppc64 for several cycles. >> >> If we are looking at adding changes to kernel that will prevent a kernel >> from running on newer hardware in a specific case, we could as well take >> the changes to get the kernel use the newer instructions right? > > Oh, no, I'm not talking about stopping the kernel from running. I'm > simply recommending that support for MAP_SYNC mappings (userspace > managed flushing) be disabled by default on PPC with either a > compile-time or run-time default to assert that userspace has been > audited for legacy applications or that the platform owner is > otherwise willing to take the risk. > >> But I agree with your concern that if we have older kernel/applications >> that continue to use `dcbf` on future hardware we will end up >> having issues w.r.t powerfail consistency. The plan is what you outlined >> above as tighter ecosystem control. Considering we don't have a pmem >> device generally available, we get both kernel and userspace upgraded >> to use these new instructions before such a device is made available. > > Ok, I think a compile time kernel option with a runtime override > satisfies my concern. Does that work for you?
something like below? But this still won't handle devdax mmap right? diff --git a/arch/arm64/include/asm/libnvdimm.h b/arch/arm64/include/asm/libnvdimm.h new file mode 100644 index 000000000000..aee697a72537 --- /dev/null +++ b/arch/arm64/include/asm/libnvdimm.h @@ -0,0 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __ARCH_LIBNVDIMM_H__ +#define __ARCH_LIBNVDIMM_H__ + +#endif /* __ARCH_LIBNVDIMM_H__ */ diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h new file mode 100644 index 000000000000..da479200bfb8 --- /dev/null +++ b/arch/powerpc/include/asm/libnvdimm.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __ARCH_LIBNVDIMM_H__ +#define __ARCH_LIBNVDIMM_H__ + +#define arch_disable_sync_nvdimm arch_disable_sync_nvdimm +extern bool arch_disable_sync_nvdimm(void); + +#endif /* __ARCH_LIBNVDIMM_H__ */ diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c index 0666a8d29596..3ce4fb4f167b 100644 --- a/arch/powerpc/lib/pmem.c +++ b/arch/powerpc/lib/pmem.c @@ -9,6 +9,8 @@ #include <asm/cacheflush.h> + +static bool sync_fault = IS_ENABLED(CONFIG_PPC_NVDIMM_SYNC_FAULT); /* * CONFIG_ARCH_HAS_PMEM_API symbols */ @@ -57,3 +59,16 @@ void memcpy_page_flushcache(char *to, struct page *page, size_t offset, memcpy_flushcache(to, page_to_virt(page) + offset, len); } EXPORT_SYMBOL(memcpy_page_flushcache); + +bool arch_disable_sync_nvdimm(void) +{ + return !sync_fault; +} + +static int __init parse_sync_fault(char *p) +{ + sync_fault = true; + return 0; +} +early_param("enable_sync_fault", parse_sync_fault); + diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 27a81c291be8..dde11d75a746 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -383,6 +383,15 @@ config PPC_KUEP If you're unsure, say Y. +config PPC_NVDIMM_SYNC_FAULT + bool "Synchronous fault support (MAP_SYNC)" + default n + help + Enable support for synchronous fault with nvdimm namespaces. + + If you're unsure, say N. + + config PPC_HAVE_KUAP bool diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h new file mode 100644 index 000000000000..aee697a72537 --- /dev/null +++ b/arch/x86/include/asm/libnvdimm.h @@ -0,0 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __ARCH_LIBNVDIMM_H__ +#define __ARCH_LIBNVDIMM_H__ + +#endif /* __ARCH_LIBNVDIMM_H__ */ diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index ccbb5b43b8b2..74a0809491af 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -1278,6 +1278,13 @@ bool is_nvdimm_sync(struct nd_region *nd_region) if (is_nd_volatile(&nd_region->dev)) return true; + /* + * If arch is forcing a synchronous fault + * disable. + */ + if (arch_disable_sync_nvdimm()) + return false; + return is_nd_pmem(&nd_region->dev) && !test_bit(ND_REGION_ASYNC, &nd_region->flags); } diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h index 18da4059be09..891449aebe91 100644 --- a/include/linux/libnvdimm.h +++ b/include/linux/libnvdimm.h @@ -13,6 +13,8 @@ #include <linux/spinlock.h> #include <linux/bio.h> +#include <asm/libnvdimm.h> + struct badrange_entry { u64 start; u64 length; @@ -286,4 +288,12 @@ static inline void arch_invalidate_pmem(void *addr, size_t size) } #endif +#ifndef arch_disable_sync_nvdimm +#define arch_disable_sync_nvdimm arch_disable_sync_nvdimm +static inline bool arch_disable_sync_nvdimm() +{ + return false; +} +#endif + #endif /* __LIBNVDIMM_H__ */