Dan Williams <dan.j.willi...@intel.com> writes:

> On Thu, Feb 13, 2020 at 8:58 AM Jeff Moyer <jmo...@redhat.com> wrote:
>>
>> Dan Williams <dan.j.willi...@intel.com> writes:
>>
>> > The "sub-section memory hotplug" facility allows memremap_pages() users
>> > like libnvdimm to compensate for hardware platforms like x86 that have a
>> > section size larger than their hardware memory mapping granularity. The
>> > compensation that sub-section support affords is being tolerant of
>> > physical memory resources shifting by units smaller (64MiB on x86) than
>> > the memory-hotplug section size (128 MiB). Where the platform
>> > physical-memory mapping granularity is limited by the number and
>> > capability of address-decode-registers in the memory controller.
>> >
>> > While the sub-section support allows memremap_pages() to operate on
>> > sub-section (2MiB) granularity, the Power architecture may still
>> > require 16MiB alignment on "!radix_enabled()" platforms.
>> >
>> > In order for libnvdimm to be able to detect and manage this per-arch
>> > limitation, introduce memremap_compat_align() as a common minimum
>> > alignment across all driver-facing memory-mapping interfaces, and let
>> > Power override it to 16MiB in the "!radix_enabled()" case.
>> >
>> > The assumption / requirement for 16MiB to be a viable
>> > memremap_compat_align() value is that Power does not have platforms
>> > where its equivalent of address-decode-registers never hardware remaps a
>> > persistent memory resource on smaller than 16MiB boundaries. Note that I
>> > tried my best to not add a new Kconfig symbol, but header include
>> > entanglements defeated the #ifndef memremap_compat_align design pattern
>> > and the need to export it defeats the __weak design pattern for arch
>> > overrides.
>> >
>> > Based on an initial patch by Aneesh.
>>
>> I have just a couple of questions.
>>
>> First, can you please add a comment above the generic implementation of
>> memremap_compat_align describing its purpose, and why a platform might
>> want to override it?
>
> Sure, how about:
>
> /*
>  * The memremap() and memremap_pages() interfaces are alternately used
>  * to map persistent memory namespaces. These interfaces place different
>  * constraints on the alignment and size of the mapping (namespace).
>  * memremap() can map individual PAGE_SIZE pages. memremap_pages() can
>  * only map subsections (2MB), and at least one architecture (PowerPC)
>  * the minimum mapping granularity of memremap_pages() is 16MB.
>  *
>  * The role of memremap_compat_align() is to communicate the minimum
>  * arch supported alignment of a namespace such that it can freely
>  * switch modes without violating the arch constraint. Namely, do not
>  * allow a namespace to be PAGE_SIZE aligned since that namespace may be
>  * reconfigured into a mode that requires SUBSECTION_SIZE alignment.
>  */
>
>> Second, I will take it at face value that the power architecture
>> requires a 16MB alignment, but it's not clear to me why mmu_linear_psize
>> was chosen to represent that. What's the relationship, there, and can
>> we please have a comment explaining it?
>
> Aneesh, can you help here?

With hash translation, we map the direct-map range with just one page
size. Based on the restrictions described in htab_init_page_sizes(), we
can end up choosing 16M, 64K or even 4K. We use the variable
mmu_linear_psize to indicate which page size was used for the direct-map
range. I.e., we should do:

+unsigned long arch_namespace_align_size(void)
+{
+	unsigned long sub_section_size = (1UL << SUBSECTION_SHIFT);
+
+	if (radix_enabled())
+		return sub_section_size;
+	return max(sub_section_size, (1UL << mmu_psize_defs[mmu_linear_psize].shift));
+
+}
+EXPORT_SYMBOL_GPL(arch_namespace_align_size);

as done here:
https://lore.kernel.org/linux-nvdimm/20200120140749.69549-4-aneesh.ku...@linux.ibm.com/

Dan, can you update the powerpc definition?

-aneesh
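
For readers following along outside a kernel tree, the selection logic above can be modelled in user space. This is only a rough sketch under stated assumptions: model_namespace_align_size() and its two parameters are hypothetical stand-ins for radix_enabled() and mmu_psize_defs[mmu_linear_psize].shift, and SUBSECTION_SHIFT is hard-coded to 21 to match the 2MiB sub-section size mentioned in the thread. It is not the kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: 2MiB sub-sections, as stated in the thread. */
#define SUBSECTION_SHIFT 21

/*
 * Hypothetical user-space model of the proposed helper.
 *   radix        - stand-in for radix_enabled()
 *   linear_shift - stand-in for mmu_psize_defs[mmu_linear_psize].shift,
 *                  e.g. 12 (4K), 16 (64K) or 24 (16M)
 */
static unsigned long model_namespace_align_size(bool radix,
						unsigned int linear_shift)
{
	unsigned long sub_section_size = 1UL << SUBSECTION_SHIFT;
	unsigned long linear_size;

	/* Guard the shift so the model cannot overflow unsigned long. */
	assert(linear_shift < sizeof(unsigned long) * CHAR_BIT);

	if (radix)
		return sub_section_size;

	/* Hash translation: never go below the sub-section size. */
	linear_size = 1UL << linear_shift;
	return linear_size > sub_section_size ? linear_size : sub_section_size;
}
```

In this model, only a 16M linear mapping raises the alignment above the sub-section size. A 64K or 4K linear mapping, or radix, leaves it at 2MiB.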