On 21-Feb-19 2:50 PM, Shahaf Shuler wrote:
The DPDK APIs expose 3 different modes to work with memory used for DMA:

1. Use the DPDK-owned memory (backed by the DPDK-provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.

2. Use memory allocated by the user and registered with the DPDK memory
system. Upon registration, the DPDK layers will DMA map the memory
to all needed devices. After registration, this memory is allocated
through the rte_*malloc APIs.

3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want tight control over this
memory (e.g. to avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call the DMA map function in order to map
it to the different devices.

The scope of this patch is limited to #3 above.

Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).

The work in this patch moves the DMA mapping to vendor-agnostic APIs.
Device-level DMA map and unmap APIs were added. These APIs are
currently implemented only for PCI devices.

For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.

Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
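
The steps above can be sketched roughly as follows. This is only an
illustration, not code from the patch: it is written to compile without
DPDK, so stub_extmem_register and stub_dev_dma_map are hypothetical
stand-ins for rte_extmem_register and the proposed rte_dev_dma_map, and
struct stub_device stands in for struct rte_device. It also assumes the
virtual address can be used as the IOVA, which does not hold in every
setup.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_device. */
struct stub_device {
	int mapped_regions; /* number of regions mapped to this device */
};

/* Stand-in for rte_extmem_register(); it only validates its arguments. */
int
stub_extmem_register(void *addr, size_t len)
{
	return (addr == NULL || len == 0) ? -EINVAL : 0;
}

/* Stand-in for the rte_dev_dma_map() proposed in the patch. */
int
stub_dev_dma_map(struct stub_device *dev, void *addr, uint64_t iova,
		 size_t len)
{
	(void)iova;
	if (dev == NULL || addr == NULL || len == 0)
		return -EINVAL;
	dev->mapped_regions++;
	return 0;
}

/* Allocate a buffer, register it, then map it to the given device. */
void *
alloc_and_map(struct stub_device *dev, size_t len)
{
	void *buf;

	if (len == 0)
		return NULL;
	/* A real application would likely use page-aligned memory here. */
	buf = malloc(len);
	if (buf == NULL)
		return NULL;
	/* Assumption: the VA is passed as the IOVA. */
	if (stub_extmem_register(buf, len) != 0 ||
	    stub_dev_dma_map(dev, buf, (uint64_t)(uintptr_t)buf, len) != 0) {
		free(buf);
		return NULL;
	}
	return buf;
}
```

On teardown the same calls would be undone in reverse order: unmap
from the device first, then unregister, then free.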

Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.

Signed-off-by: Shahaf Shuler <shah...@mellanox.com>
---

<snip>

+
+       if (!pdev || !pdev->driver) {
+               rte_errno = EINVAL;
+               return -rte_errno;
+       }

We could put a check in here to see if the memory has been registered with DPDK. Just call rte_mem_virt2memseg_list(addr) - if it returns NULL, the memory wasn't registered, so you can throw an error. Not sure of appropriate errno in that case - ENODEV? EINVAL?
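
Something along these lines, as a rough sketch only. It is written to build without DPDK, so stub_virt2region is a hypothetical stand-in for rte_mem_virt2memseg_list, the region table stands in for the memseg lists, and -EINVAL is just one of the candidate error codes:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a registered external memory area. */
struct stub_region {
	uintptr_t start;
	size_t len;
};

#define STUB_MAX_REGIONS 8
struct stub_region stub_regions[STUB_MAX_REGIONS];
unsigned int stub_nb_regions;

/* Stand-in for registration: remember the area in a small table. */
int
stub_register(void *addr, size_t len)
{
	if (addr == NULL || len == 0 || stub_nb_regions == STUB_MAX_REGIONS)
		return -EINVAL;
	stub_regions[stub_nb_regions].start = (uintptr_t)addr;
	stub_regions[stub_nb_regions].len = len;
	stub_nb_regions++;
	return 0;
}

/* Stand-in for rte_mem_virt2memseg_list(): NULL if addr is unknown. */
const struct stub_region *
stub_virt2region(const void *addr)
{
	uintptr_t va = (uintptr_t)addr;
	unsigned int i;

	for (i = 0; i < stub_nb_regions; i++) {
		if (va >= stub_regions[i].start &&
		    va - stub_regions[i].start < stub_regions[i].len)
			return &stub_regions[i];
	}
	return NULL;
}

/* Check to run before handing the address to dma_map or dma_unmap. */
int
stub_check_registered(const void *addr)
{
	if (stub_virt2region(addr) == NULL)
		return -EINVAL; /* assumption: could equally be -ENODEV */
	return 0;
}
```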

+       if (pdev->driver->dma_map)
+               return pdev->driver->dma_map(pdev, addr, iova, len);
+       /**
+        *  In case the driver doesn't provide any specific mapping,
+        *  try falling back to VFIO.
+        */
+       if (pdev->kdrv == RTE_KDRV_VFIO)
+               return rte_vfio_container_dma_map
+                               (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
+                                iova, len);

<snip>

+rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
+               size_t len)
+{
+       if (dev->bus->dma_map == NULL || len == 0) {
+               rte_errno = EINVAL;
+               return -rte_errno;
+       }
+       /* Memory must be registered through rte_extmem_* APIs */
+       if (rte_mem_virt2memseg(addr, NULL) == NULL) {

No need to call rte_mem_virt2memseg - rte_mem_virt2memseg_list will do.

+               rte_errno = EINVAL;
+               return -rte_errno;
+       }
+
+       return dev->bus->dma_map(dev, addr, iova, len);
+}
+
+int
+rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+                 size_t len)
+{
+       if (dev->bus->dma_unmap == NULL || len == 0) {
+               rte_errno = EINVAL;
+               return -rte_errno;
+       }

I think attempting to unmap a memory region that isn't registered should be an error, so rte_mem_virt2memseg_list call should be here too.

+
+       return dev->bus->dma_unmap(dev, addr, iova, len);
+}
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6be4b5cabe..4faf2d20a0 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
  typedef int (*rte_bus_parse_t)(const char *name, void *addr);

<snip>

--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -515,4 +515,47 @@ rte_dev_hotplug_handle_enable(void);
  int __rte_experimental
  rte_dev_hotplug_handle_disable(void);
+/**
+ * Device level DMA map function.
+ * After a successful call, the memory segment will be mapped to the
+ * given device.

Here and in unmap, perhaps add a note:

@note please register memory first

?

+ *
+ * @param dev
+ *     Device pointer.
+ * @param addr
+ *     Virtual address to map.
+ * @param iova
+ *     IOVA address to map.
+ * @param len
+ *     Length of the memory segment being mapped.
+ *
+ * @return
+ *     0 if mapping was successful.
+ *     Negative value and rte_errno is set otherwise.

Here and in other similar places: why are we setting rte_errno *and* returning -rte_errno? Wouldn't returning -1 be enough?
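
To illustrate the two conventions side by side, a minimal sketch. It does not use DPDK: stub_errno is a hypothetical plain-int stand-in for rte_errno (which is per-lcore in DPDK), and both function names are made up for the example:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for rte_errno. */
int stub_errno;

/* Convention used in the patch: set the errno variable and return
 * its negated value. */
int
map_returning_neg_errno(size_t len)
{
	if (len == 0) {
		stub_errno = EINVAL;
		return -stub_errno;
	}
	return 0;
}

/* Alternative: return -1 and leave the detail in the errno variable. */
int
map_returning_minus_one(size_t len)
{
	if (len == 0) {
		stub_errno = EINVAL;
		return -1;
	}
	return 0;
}
```

Either way the caller only needs to test for a negative return value; with the second form the error code lives in one place only.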

--
Thanks,
Anatoly
