17/04/2020 01:46, Dmitry Kozlyuk:
> > * [AI Dmitry K, Harini] Dmitry K to send summary of conversation for
> >   feedback, Harini to follow-up for resolution.
>
> On Windows community calls we've been discussing memory management
> implementation approaches and plans. This summary aims to bring everyone
> interested to the same page and to record information in one public place.
>
> [Dmitry M] is Dmitry Malloy from Microsoft, [Dmitry K] is me.
> Cc'ing Anatoly Burakov as DPDK memory subsystem maintainer.
>
>
> Current State
> -------------
>
> Patches are sent for basic memory management that should be suitable for most
> simple cases. Relevant implementation traits are as follows:
>
> * IOVA as PA only, PA is obtained via a kernel-mode driver.
> * Hugepages are allocated dynamically in user-mode (2MB only),
>   IOVA-contiguity is provided by allocator to the extent possible.
> * No multi-process support.
>
>
> Background and Findings
> -----------------------
>
> Physical addresses are fundamentally limited and insecure because of the
> following (this list is not specific to Windows, but provides context):
>
> 1. A user-mode application with access to DMA and PA can convince the
>    device to overwrite arbitrary RAM content, bypassing OS security.
>
> 2. IOMMU might be engaged rendering PA invalid for a particular device.
>    This mode is mandatory for PCI passthrough into VM.
>
> 3. IOMMU may be used even on a bare-metal system to protect against #1 by
>    limiting DMA for a device to IOMMU mappings. Zero-copy forwarding using
>    DMA from different RX and TX devices must take care of this. On Windows,
>    such mechanism is called Kernel DMA Protection [1].
>
> 4. Device can be VA-only with an onboard IOMMU (e.g. Mellanox NICs).

Mellanox NICs also work with PA memory.

> 5. In complex PCI topologies logical bus addresses may differ from PA,
>    although a concrete example is missing for modern systems (IoT SoC?).
>
>
> Within Windows kernel there are two facilities to deal with the above:
>
> 1. DMA_ADAPTER interface and its AllocateDomainCommonBuffer() method [2].
>    "DMA adapter" is an abstraction of bus-master mode or an allocated channel
>    of a DMA controller. Also, each device belongs to a DMA domain, initially
>    its so-called default domain. Only devices of the same domain can have a
>    buffer suitable for DMA by all devices. In that, DMA domains are similar
>    to IOMMU groups in Linux.
>
>    Besides domain management, this interface allows allocation of such a
>    common buffer, that is, a contiguous range of IOVA (logical addresses) and
>    kernel VA (which can be mapped to user-space). Advantages of this
>    interface: 1) it is universal w.r.t. PCI topology, IOMMU, etc; 2) it
>    supports hugepages. One disadvantage is that kernel controls IOVA and VA.
>
> 2. DMA_IOMMU interface which is functionally similar to Linux VFIO driver,
>    that is, it allows management of IOMMU mappings within a domain [3].
>
> [Dmitry M] Microsoft considers creating a generic memory-management driver
> exposing (some of) these interfaces which will be shipped with Windows. This
> is an idea at an early stage, not a commitment.

So DMA_ADAPTER and DMA_IOMMU are kernel interfaces, without any userspace API?

> Notable DPDK memory management traits:
>
> 1. When memory is requested from EAL, it is unknown whether it will be used
>    for DMA and with which device. The hint is when rte_virt2iova() is called,
>    but this is not the case for VA-only devices.
>
> 2. Memory is reserved and then committed in segments (basically, hugepages).
>
> 3. There is a callback for segment list allocation and deallocation. For
>    example, Linux EAL uses it to create IOMMU mappings when VFIO is engaged.
>
> 4. There are drivers that explicitly request PA via rte_virt2phys().
>
>
> Last but not least, user-mode memory management notes:
>
> 1. Windows doesn't report limits on the number of hugepages.
>
> 2. By official documentation, only 2MB hugepages are supported.
>
>    [Dmitry M] There are new, still undocumented Win32 API flags for 1GB [5].
>    [Dmitry K] Found a novel allocator library using these new features [6].
>    Failed to make use of [5] with AWE, unclear how to integrate into MM.
>
> 3. Address Windowing Extensions [4] allow allocating physical page
>    frames (PFN) and then mapping them to VA, all in user-mode.
>
>    [Dmitry K] Experiments show AWE cannot allocate hugepages (in a documented
>    way at least) and cannot reliably provide contiguous ranges (and does not
>    guarantee it). IMO, this interface is useless for common MM. Some drivers
>    that do not need hugepages but require PA may benefit from it.
>
>
> Opens
> -----
>
> IMO, "Advanced memory management" milestone from roadmap should be split.

Yes for splitting. Feel free to send a patch for the roadmap.
And we should plan these tasks for later in the year.
Basic memory management should be enough for first steps with PMDs.

> There are three major points of MM improvement, each requiring research and a
> complex patch:
>
> 1. Proper DMA buffers via AllocateDomainCommonBuffer (DPDK part is unclear).
> 2. VFIO-like code in Windows EAL using DMA_IOMMU.
> 3. Support for 1GB hugepages and related changes.
>
> Windows kernel interfaces described above have poor documentation. On Windows
> community call 2020-04-01 Dmitry Malloy agreed to help with this (concrete
> questions were raised and noted).
>
> Hugepages of 1GB are desirable, but allocating them relies on undocumented
> features. Also, because Windows does not provide hugepage limits, it may
> require more work to manage multiple sizes in DPDK.
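
To make the "IOVA-contiguity is provided by allocator to the extent possible"
trait a bit more concrete, here is a minimal sketch in C of how an allocator
might count the 2MB segments needed for a request and find the longest
IOVA-contiguous run among committed segments. This is only an illustration,
not the code from the patches: HUGEPAGE_SZ, segments_needed() and
longest_contig_run() are hypothetical names, and the IOVA values are assumed
to have been obtained elsewhere (e.g. from the kernel-mode driver).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 2MB is the only hugepage size in use (the documented one). */
#define HUGEPAGE_SZ (2u * 1024 * 1024)

/* Number of 2MB segments needed to back `len` bytes, rounded up.
 * Overflow for `len` close to SIZE_MAX is not handled in this sketch. */
static size_t
segments_needed(size_t len)
{
	return (len + HUGEPAGE_SZ - 1) / HUGEPAGE_SZ;
}

/*
 * Given the IOVA of each committed segment, in VA order, return the length
 * (in segments) of the longest IOVA-contiguous run. If `start` is not NULL,
 * the index of the first segment of that run is stored there.
 */
static size_t
longest_contig_run(const uint64_t *iova, size_t n, size_t *start)
{
	size_t best = 0, best_start = 0, run_start = 0;
	size_t i;

	assert(iova != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* A gap in IOVA starts a new run at this segment. */
		if (i > 0 && iova[i] != iova[i - 1] + HUGEPAGE_SZ)
			run_start = i;
		if (i - run_start + 1 > best) {
			best = i - run_start + 1;
			best_start = run_start;
		}
	}
	if (start != NULL)
		*start = best_start;
	return best;
}
```

A caller would compare the result of longest_contig_run() against
segments_needed(len) to decide whether an IOVA-contiguous request can be
satisfied from the segments already committed, or whether more hugepages
have to be allocated first.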
>
>
> References
> ----------
>
> [1]: Kernel DMA Protection for Thunderbolt™ 3
> <https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt>
> [2]: DMA_ADAPTER.AllocateDomainCommonBuffer
> <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-pallocate_domain_common_buffer>
> [3]: DMA_IOMMU interface
> <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-iommu_map_identity_range>
> [4]: Address Windowing Extensions (AWE)
> <https://docs.microsoft.com/en-us/windows/win32/memory/address-windowing-extensions>
> [5]: GitHub issue <https://github.com/dotnet/runtime/issues/12779>
> [6]: mimalloc <https://github.com/microsoft/mimalloc>

Thanks for the great summary.