On Wed, Apr 13, 2016 at 5:03 PM, Thomas Monjalon <thomas.monjalon at 6wind.com> wrote:
> After looking at the patches for container support, it appears that
> some changes are needed in the memory management:
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788
>
> I think it is time to collect what are the needs and expectations of
> the DPDK memory allocator. The goal is to satisfy every needs while
> cleaning the API.
> Here is a first try to start the discussion.
>
> The memory allocator has 2 classes of API in DPDK.
> First the user/application allows or requires DPDK to take over some
> memory resources of the system. The characteristics can be:
> - numa node
> - page size
> - swappable or not
> - contiguous (cannot be guaranteed) or not
> - physical address (as root only)
> Then the drivers or other libraries use the memory through
> - rte_malloc
> - rte_memzone
> - rte_mempool
> I think we can integrate the characteristics of the requested memory
> in rte_malloc. Then rte_memzone would be only a named rte_malloc.
> The rte_mempool still focus on collection of objects with cache.
>
> If a rework happens, maybe that the build options CONFIG_RTE_LIBRTE_IVSHMEM
> and CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS can be removed.
> The Xen support should also be better integrated.
>
> Currently, the first class of API is directly implemented as command line
> parameters. Please let's think of C functions first.
> The EAL parameters should simply wrap some API functions and let the
> applications tune the memory initialization with a well documented API.
>
> Probably that I forget some needs, e.g. for the secondary processes.
> Please comment.

Just to mention that the VFIO IOMMU mapping should be adjusted to cover only those physically contiguous memsegs which rte_pktmbuf_pool_create will allocate from, along with the hugepages backing the driver/device descriptor rings. Mapping all the memsegs is not a performance issue, but mapping only what is needed seems to me the right thing to do. Maybe a memseg flag like "DMA_CAPABLE" or similar should be used to select segments for IOMMU mapping.
Another question is how to avoid using mbufs from non-"DMA_CAPABLE" segments with a device. I'm thinking about a DPDK app using a virtio network driver and a device-backed PMD at the same time, which could be a way of having the best of both worlds (intra-host and inter-host VM communication).