On Mon, 2019-02-25 at 10:57 -0800, Dave Hansen wrote:
> From: Dave Hansen <[email protected]>
>
> This is intended for use with NVDIMMs that are physically persistent
> (physically like flash) so that they can be used as a cost-effective
> RAM replacement. Intel Optane DC persistent memory is one
> implementation of this kind of NVDIMM.
>
> Currently, a persistent memory region is "owned" by a device driver,
> either the "Device DAX" or "Filesystem DAX" drivers. These drivers
> allow applications to explicitly use persistent memory, generally
> by being modified to use special, new libraries. (DIMM-based
> persistent memory hardware/software is described in great detail
> here: Documentation/nvdimm/nvdimm.txt).
>
> However, this limits persistent memory use to applications which
> *have* been modified. To make it more broadly usable, this driver
> "hotplugs" memory into the kernel, to be managed and used just like
> normal RAM would be.
>
> To make this work, management software must remove the device from
> being controlled by the "Device DAX" infrastructure:
>
>     echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
>
> and then tell the new driver that it can bind to the device:
>
>     echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
>
> After this, there will be a number of new memory sections visible
> in sysfs that can be onlined, or that may get onlined by existing
> udev-initiated memory hotplug rules.
>
> This rebinding procedure is currently a one-way trip. Once memory
> is bound to "kmem", it's there permanently and can not be
> unbound and assigned back to device_dax.
>
> The kmem driver will never bind to a dax device unless the device
> is *explicitly* bound to the driver. There are two reasons for
> this: One, since it is a one-way trip, it can not be undone if
> bound incorrectly. Two, the kmem driver destroys data on the
> device. Think of if you had good data on a pmem device. It
> would be catastrophic if you compile-in "kmem", but leave out
> the "device_dax" driver. kmem would take over the device and
> write volatile data all over your good data.
>
> This inherits any existing NUMA information for the newly-added
> memory from the persistent memory device that came from the
> firmware. On Intel platforms, the firmware has guarantees that
> require each socket's persistent memory to be in a separate
> memory-only NUMA node. That means that this patch is not expected
> to create NUMA nodes, but will simply hotplug memory into existing
> nodes.
>
> Because NUMA nodes are created, the existing NUMA APIs and tools
> are sufficient to create policies for applications or memory areas
> to have affinity for or an aversion to using this memory.
>
> There is currently some metadata at the beginning of pmem regions.
> The section-size memory hotplug restrictions, plus this small
> reserved area can cause the "loss" of a section or two of capacity.
> This should be fixable in follow-on patches. But, as a first step,
> losing 256MB of memory (worst case) out of hundreds of gigabytes
> is a good tradeoff vs. the required code to fix this up precisely.
> This calculation is also the reason we export
> memory_block_size_bytes().
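To make the capacity "loss" from the last paragraph concrete, here is a rough sketch of the alignment arithmetic: round the region start up to a block boundary, then keep only whole blocks. It is an illustration, not the code in kmem.c. The function name usable_range, the 128 MiB block size and the 2 MiB metadata offset are all assumptions chosen for the example; the real block size is whatever memory_block_size_bytes() reports on the running system.

```shell
#!/bin/sh
# Sketch only: usable_range is a hypothetical helper, not part of the patch.
# usable_range START SIZE BLOCK
#   START  physical start address of the region (bytes)
#   SIZE   region size (bytes)
#   BLOCK  memory block size (bytes), assumed to be a power of two
# Prints: "<aligned start> <usable bytes> <lost bytes>"
usable_range() {
    ur_start=$1
    ur_size=$2
    ur_block=$3

    # Hotplug can only begin at a block boundary, so round the start up.
    ur_aligned=$(( (ur_start + ur_block - 1) / ur_block * ur_block ))

    # Shrink the size by the bytes skipped while rounding up.
    ur_remaining=$(( ur_size - (ur_aligned - ur_start) ))

    if [ "$ur_remaining" -lt 0 ]; then
        # Region is too small to reach the next block boundary.
        ur_usable=0
    else
        # Only complete blocks can be hotplugged, so round the size down.
        ur_usable=$(( ur_remaining / ur_block * ur_block ))
    fi

    echo "$ur_aligned $ur_usable $(( ur_size - ur_usable ))"
}

# Assumed example: a region ending at 20 GiB whose first 2 MiB (starting at
# 4 GiB) is metadata, with an assumed 128 MiB block size.
usable_range 4297064448 17177772032 134217728
```

With those assumed numbers the 2 MiB of metadata costs the rest of its 128 MiB block, so 126 MiB of the region goes unused. A misaligned end as well would cost up to one more block, which lines up with the "section or two" estimate in the changelog.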
>
> Signed-off-by: Dave Hansen <[email protected]>
> Reviewed-by: Dan Williams <[email protected]>
> Reviewed-by: Keith Busch <[email protected]>
> Cc: Dave Jiang <[email protected]>
> Cc: Ross Zwisler <[email protected]>
> Cc: Vishal Verma <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Huang Ying <[email protected]>
> Cc: Fengguang Wu <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Bjorn Helgaas <[email protected]>
> Cc: Yaowei Bai <[email protected]>
> Cc: Takashi Iwai <[email protected]>
> Cc: Jerome Glisse <[email protected]>
> ---
>
>  b/drivers/base/memory.c |    1
>  b/drivers/dax/Kconfig   |   16 +++++++
>  b/drivers/dax/Makefile  |    1
>  b/drivers/dax/kmem.c    |  108 ++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 126 insertions(+)
Looks good,
Reviewed-by: Vishal Verma <[email protected]>
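
For anyone wanting to script the rebinding steps from the changelog, something along these lines might do as a starting point. It is a minimal sketch: the function name rebind_to_kmem and the DAX_SYSFS override (handy for trying it against a scratch directory) are made up for illustration, and only the two sysfs paths come from the patch description. Since the changelog says the rebind is one-way and destroys data on the device, the sketch refuses to write anything unless both control files are present.

```shell
#!/bin/sh
# Sketch only: rebind_to_kmem and DAX_SYSFS are hypothetical names.
# DAX_SYSFS defaults to the driver directory named in the changelog.
DAX_SYSFS=${DAX_SYSFS:-/sys/bus/dax/drivers}

# rebind_to_kmem DEVICE   (for example: rebind_to_kmem dax0.0)
# WARNING: per the changelog this is a one-way trip and any data on the
# device is destroyed once kmem takes it over.
rebind_to_kmem() {
    rk_dev=$1
    rk_unbind="$DAX_SYSFS/device_dax/unbind"
    rk_new_id="$DAX_SYSFS/kmem/new_id"

    if [ -z "$rk_dev" ]; then
        echo "usage: rebind_to_kmem <dax device>" >&2
        return 1
    fi

    # Check both control files first, so the device is not left unbound
    # from device_dax when the kmem driver is not available.
    if [ ! -w "$rk_unbind" ] || [ ! -w "$rk_new_id" ]; then
        echo "error: need writable $rk_unbind and $rk_new_id" >&2
        return 1
    fi

    # Step 1: take the device away from the Device DAX driver.
    echo "$rk_dev" > "$rk_unbind" || return 1

    # Step 2: tell the kmem driver that it may bind to the device.
    echo "$rk_dev" > "$rk_new_id"
}
```

On a real system this would be run as root, e.g. "rebind_to_kmem dax0.0". The new memory sections would then still need to be onlined, either by hand or by existing udev memory hotplug rules, as the changelog describes.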
