326e1b8f83a4 46d945aeab4d 49ba3c6b37b3 7cc7867fb061 7e3e888dfc13
7ea6216049ff 96da43500009 9a845030427c a0653406a3a6 a3619190d62e
ba72b4c8cf60 e9c0a3f05477 f1eca35a0dc7 f46edbd1b151

v5.3

** Changed in: intel
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1829689

Title:
  [AEP] Sub-section memory hotplug support, fix namespace padding

Status in intel:
  Fix Released
Status in linux package in Ubuntu:
  Incomplete

Bug description:
  The memory hotplug section is an arbitrary / convenient unit for memory
  hotplug. 'Section-size' units have bled into the user interface
  ('memblock' sysfs) and can not be changed without breaking existing
  userspace. The section-size constraint, while mostly benign for typical
  memory hotplug, has wreaked and continues to wreak havoc with
  'device-memory' use cases, persistent memory (pmem) in particular.
  Recall that pmem uses
  devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
  'struct page' memmap for pmem. However, it does not use the 'bottom
  half' of memory hotplug, i.e. never marks pmem pages online and never
  exposes the userspace memblock interface for pmem. This leaves an
  opening to redress the section-size constraint.

  To date, the libnvdimm subsystem has attempted to inject padding to
  satisfy the internal constraints of arch_add_memory(). Beyond
  complicating the code, leading to bugs [2], wasting memory, and limiting
  configuration flexibility, the padding hack is broken when the platform
  changes the physical memory alignment of pmem from one boot to the
  next. Device failure (intermittent or permanent) and physical
  reconfiguration are events that can cause the platform firmware to
  change the physical placement of pmem on a subsequent boot, and device
  failure is an everyday event in a data-center.
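
  As a rough illustration of what section alignment can cost, the sketch
  below computes how many bytes of a physical range fall outside whole
  128MB sections. It is not the libnvdimm padding logic; the names
  (TOY_SECTION_SIZE, section_padding_loss) are hypothetical, and the
  128MB figure is the x86 section size mentioned in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed section size: 128MB, as on x86 (illustrative constant). */
#define TOY_SECTION_SIZE (128ULL << 20)

/* Round an address up to the next section boundary. */
static inline uint64_t section_align_up(uint64_t addr)
{
	return (addr + TOY_SECTION_SIZE - 1) & ~(TOY_SECTION_SIZE - 1);
}

/* Round an address down to the previous section boundary. */
static inline uint64_t section_align_down(uint64_t addr)
{
	return addr & ~(TOY_SECTION_SIZE - 1);
}

/*
 * Bytes of [start, end) that would be unusable if only whole,
 * section-aligned pieces of the range could be added.
 */
static inline uint64_t section_padding_loss(uint64_t start, uint64_t end)
{
	assert(start <= end);

	uint64_t s = section_align_up(start);
	uint64_t e = section_align_down(end);

	if (e <= s)		/* no whole section fits in the range */
		return end - start;
	return (s - start) + (end - e);
}
```

  For a 1GB range that starts 64MB past a section boundary, this reports
  128MB lost (64MB at each end). If firmware moves the range on a later
  boot, the amount and position of that loss change with it.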

  It turns out that sections are only a hard requirement of the
  user-facing interface for memory hotplug and with a bit more
  infrastructure sub-section arch_add_memory() support can be added for
  kernel internal usages like devm_memremap_pages(). Here is an analysis
  of the design assumptions in the current code and how they are
  addressed in the new implementation:

  Current design assumptions:

  - Sections that describe boot memory (early sections) are never
    unplugged / removed.
  - pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y case, devolves to a
    valid_section() check.
  - __add_pages() and helper routines assume all operations occur in
    PAGES_PER_SECTION units.
  - The memblock sysfs interface only comprehends full sections.

  New design assumptions:

  - Sections are instrumented with a sub-section bitmask to track (on x86)
    individual 2MB sub-divisions of a 128MB section.
  - Partially populated early sections can be extended with additional
    sub-sections, and those sub-sections can be removed with
    arch_remove_memory(). With this in place we no longer lose usable
    memory capacity to padding.
  - pfn_valid() is updated to look deeper than valid_section() to also
    check the active-sub-section mask. This indication is in the same
    cacheline as the valid_section() so the performance impact is expected
    to be negligible. So far the lkp robot has not reported any
    regressions.
  - Outside of the core vmemmap population routines which are replaced,
    other helper routines like shrink_{zone,pgdat}_span() are updated to
    handle the smaller granularity. Core memory hotplug routines that deal
    with online memory are not touched.
  - The existing memblock sysfs user api guarantees / assumptions are
    not touched since this capability is limited to !online
    !memblock-sysfs-accessible sections.
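
  The sub-section bitmask and the deeper pfn_valid() check described
  above can be sketched in userspace C as follows. This is a toy model
  under stated assumptions (4KB pages, 128MB sections, 2MB sub-sections,
  so 64 bits per section); the toy_* and TOY_* names are hypothetical
  and are not the kernel's data structures or helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 geometry: 4KB pages, 128MB sections, 2MB sub-sections. */
#define TOY_PAGE_SHIFT			12
#define TOY_SECTION_SHIFT		27
#define TOY_SUBSECTION_SHIFT		21
#define TOY_PFN_SECTION_SHIFT		(TOY_SECTION_SHIFT - TOY_PAGE_SHIFT)
#define TOY_PFN_SUBSECTION_SHIFT	(TOY_SUBSECTION_SHIFT - TOY_PAGE_SHIFT)
#define TOY_PAGES_PER_SECTION		(1ULL << TOY_PFN_SECTION_SHIFT)
#define TOY_PAGES_PER_SUBSECTION	(1ULL << TOY_PFN_SUBSECTION_SHIFT)
#define TOY_SUBSECTIONS_PER_SECTION \
	(1ULL << (TOY_SECTION_SHIFT - TOY_SUBSECTION_SHIFT))	/* 64 */

/* One section: a valid flag plus one bit per 2MB sub-section. */
struct toy_section {
	bool valid;
	uint64_t subsection_map;
};

/* Index (0..63) of the sub-section that contains pfn. */
static inline unsigned int toy_subsection_index(uint64_t pfn)
{
	return (unsigned int)((pfn & (TOY_PAGES_PER_SECTION - 1))
			      >> TOY_PFN_SUBSECTION_SHIFT);
}

/* Mark a sub-section-aligned pfn range inside one section as active. */
static inline void toy_activate(struct toy_section *ms, uint64_t pfn,
				uint64_t nr_pages)
{
	assert(nr_pages > 0);
	assert(pfn % TOY_PAGES_PER_SUBSECTION == 0);
	assert(nr_pages % TOY_PAGES_PER_SUBSECTION == 0);
	/* The whole range must live in a single section. */
	assert((pfn >> TOY_PFN_SECTION_SHIFT) ==
	       ((pfn + nr_pages - 1) >> TOY_PFN_SECTION_SHIFT));

	unsigned int first = toy_subsection_index(pfn);
	unsigned int last = toy_subsection_index(pfn + nr_pages - 1);

	for (unsigned int i = first; i <= last; i++)
		ms->subsection_map |= 1ULL << i;
	ms->valid = true;
}

/* Section-level check first, then the sub-section bit. */
static inline bool toy_pfn_valid(const struct toy_section *ms, uint64_t pfn)
{
	if (!ms->valid)
		return false;
	return (ms->subsection_map >> toy_subsection_index(pfn)) & 1;
}
```

  Activating 1024 pages starting at pfn 512 sets bits 1 and 2 of the
  map, so pfns 512 through 1535 test as valid while pfn 0 and pfn 1536
  in the same section do not. Because the flag and the map sit next to
  each other in the toy struct, the second check touches no additional
  memory, which mirrors the same-cacheline argument made above.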

  Meanwhile the issue reports continue to roll in from users that do not
  understand when and how the 128MB constraint will bite them. The current
  implementation relied on being able to support at least one misaligned
  namespace, but that immediately falls over on any moderately complex
  namespace creation attempt. Beyond the initial problem of 'System RAM'
  colliding with pmem, and the unsolvable problem of physical alignment
  changes, Linux is now being exposed to platforms that collide pmem
  ranges with other pmem ranges by default [3]. In short,
  devm_memremap_pages() has pushed the venerable section-size constraint
  past the breaking point, and the simplicity of section-aligned
  arch_add_memory() is no longer tenable.
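
  To make the collision case concrete, the following sketch tests
  whether two disjoint physical ranges touch the same 128MB section,
  which is the situation a section-granular arch_add_memory() cannot
  handle. The helper name and the 128MB shift are assumptions for
  illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SECTION_SHIFT 27	/* assumed 128MB sections, as on x86 */

/*
 * True when the half-open byte ranges [a_start, a_end) and
 * [b_start, b_end) each cover part of at least one common section.
 */
static inline bool ranges_share_section(uint64_t a_start, uint64_t a_end,
					uint64_t b_start, uint64_t b_end)
{
	assert(a_start < a_end && b_start < b_end);

	uint64_t a_first = a_start >> TOY_SECTION_SHIFT;
	uint64_t a_last = (a_end - 1) >> TOY_SECTION_SHIFT;
	uint64_t b_first = b_start >> TOY_SECTION_SHIFT;
	uint64_t b_last = (b_end - 1) >> TOY_SECTION_SHIFT;

	/* Section index intervals overlap. */
	return a_first <= b_last && b_first <= a_last;
}
```

  Two namespaces that meet at 4GB + 64MB share a section, while two
  that meet at 4GB + 128MB do not.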

  Target Kernel: 5.3
  Target Release: 19.10

To manage notifications about this bug go to:
https://bugs.launchpad.net/intel/+bug/1829689/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
