Add a new dmadev poll-mode driver for the AMD AE4DMA hardware DMA
engine.  An AE4DMA engine exposes 16 hardware command queues, each
with a 32-entry descriptor ring; the PMD maps each hardware channel
to its own dmadev with a single virtual channel, so a PCI function
appears as 16 dmadevs named "<pci-bdf>-ch0" .. "<pci-bdf>-ch15".
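
A standalone sketch of the naming and register-offset arithmetic, for
reviewers. It is not code from the driver. It assumes only what the
patch itself states: the "%s-ch%u" name format, a 32-byte per-queue
register block, and queue N's block sitting at index N + 1, right after
the engine-common config register at BAR0+0. The sketch_* names are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values stated in the patch (AE4DMA_MAX_HW_QUEUES, static_assert stride). */
#define SKETCH_MAX_HW_QUEUES   16
#define SKETCH_HWQ_REGS_STRIDE 32

/* Hypothetical helper: same "%s-ch%u" format as ae4dma_channel_dev_name(). */
static int
sketch_channel_name(char *out, size_t outlen, const char *pci_bdf, unsigned int ch)
{
	return snprintf(out, outlen, "%s-ch%u", pci_bdf, ch);
}

/*
 * Hypothetical helper: BAR0 byte offset of queue qn's register block,
 * i.e. where "(struct ae4dma_hwq_regs *)io_regs + (qn + 1)" points.
 * Block 0 holds the engine-common config register.
 */
static size_t
sketch_hwq_regs_offset(unsigned int qn)
{
	assert(qn < SKETCH_MAX_HW_QUEUES);
	return (size_t)(qn + 1) * SKETCH_HWQ_REGS_STRIDE;
}
```

With a hypothetical BDF of 0000:c1:00.2, channel 15 would be named
"0000:c1:00.2-ch15" and its register block would start at BAR0 + 512.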

Driver characteristics:

 - Memory-to-memory copy operations only (RTE_DMA_CAPA_MEM_TO_MEM).
 - Completion is detected via the hardware's per-queue read_idx
   register, which the engine advances as it processes descriptors.
   The descriptor status / err_code bytes are read only to classify
   each drained slot as success or failure.
 - vchan_status reports IDLE/ACTIVE based on HW read_idx vs write_idx
   and HALTED_ERROR when the queue is not enabled.
 - Depends on bus_pci and dmadev.
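
The completion and channel-state rules above can be summarised in a
standalone sketch. It restates the arithmetic in ae4dma_scan_hwq()
(before its max_ops / ring_buff_count clamps) and ae4dma_vchan_status().
It uses hypothetical local names and a local enum in place of
enum rte_dma_vchan_status. The value 4 for the error status comes from
the AE4DMA_DMA_DESC_ERROR comment in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AE4DMA_DMA_DESC_ERROR per the comment in ae4dma_scan_hwq(). */
#define SKETCH_DESC_ERROR 4

/* Local stand-in for enum rte_dma_vchan_status. */
enum sketch_vchan_state {
	SKETCH_VCHAN_IDLE,
	SKETCH_VCHAN_ACTIVE,
	SKETCH_VCHAN_HALTED_ERROR,
};

/*
 * Slots finished since the last poll: the half-open range
 * [next_read, hw_read_idx) on a power-of-two ring of nb_desc slots.
 */
static uint16_t
sketch_completed_since(uint16_t hw_read_idx, uint16_t next_read, uint16_t nb_desc)
{
	assert(nb_desc != 0 && (nb_desc & (nb_desc - 1)) == 0);
	uint16_t mask = (uint16_t)(nb_desc - 1);
	uint16_t diff = (uint16_t)((hw_read_idx & mask) - next_read);

	return (uint16_t)(diff & mask);
}

/* A drained slot is a failure if status is ERROR or err_code is non-zero. */
static bool
sketch_slot_failed(uint8_t status, uint8_t err_code)
{
	return status == SKETCH_DESC_ERROR || err_code != 0;
}

/* Disabled queue -> halted; otherwise idle iff HW read_idx == write_idx. */
static enum sketch_vchan_state
sketch_vchan_state(bool queue_enabled, uint32_t hw_read, uint32_t hw_write)
{
	if (!queue_enabled)
		return SKETCH_VCHAN_HALTED_ERROR;
	return hw_read == hw_write ? SKETCH_VCHAN_IDLE : SKETCH_VCHAN_ACTIVE;
}
```

For example, on a 32-slot ring with next_read = 30 and a HW read_idx
of 3, five slots (30, 31, 0, 1, 2) are ready to drain.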

Signed-off-by: Raghavendra Ningoji <[email protected]>
---
 MAINTAINERS                            |   5 +
 doc/guides/dmadevs/ae4dma.rst          |  75 +++
 doc/guides/dmadevs/index.rst           |   1 +
 doc/guides/rel_notes/release_26_07.rst |   7 +
 drivers/dma/ae4dma/ae4dma_dmadev.c     | 742 +++++++++++++++++++++++++
 drivers/dma/ae4dma/ae4dma_hw_defs.h    | 164 ++++++
 drivers/dma/ae4dma/ae4dma_internal.h   | 117 ++++
 drivers/dma/ae4dma/meson.build         |   7 +
 drivers/dma/meson.build                |   1 +
 usertools/dpdk-devbind.py              |   5 +-
 10 files changed, 1123 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/dmadevs/ae4dma.rst
 create mode 100644 drivers/dma/ae4dma/ae4dma_dmadev.c
 create mode 100644 drivers/dma/ae4dma/ae4dma_hw_defs.h
 create mode 100644 drivers/dma/ae4dma/ae4dma_internal.h
 create mode 100644 drivers/dma/ae4dma/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index 9143d028bc..0b5a6e08d8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1361,6 +1361,11 @@ F: doc/guides/compressdevs/features/zsda.ini
 DMAdev Drivers
 --------------
 
+AMD AE4DMA
+M: Bhagyada Modali <[email protected]>
+F: drivers/dma/ae4dma/
+F: doc/guides/dmadevs/ae4dma.rst
+
 Intel IDXD - EXPERIMENTAL
 M: Bruce Richardson <[email protected]>
 M: Kevin Laatz <[email protected]>
diff --git a/doc/guides/dmadevs/ae4dma.rst b/doc/guides/dmadevs/ae4dma.rst
new file mode 100644
index 0000000000..37a2096ccf
--- /dev/null
+++ b/doc/guides/dmadevs/ae4dma.rst
@@ -0,0 +1,75 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2025 Advanced Micro Devices, Inc.
+
+.. include:: <isonum.txt>
+
+AMD AE4DMA DMA Device Driver
+============================
+
+The ``ae4dma`` dmadev driver is a poll-mode driver (PMD) for the
+AMD AE4DMA hardware DMA engine. The engine exposes 16 independent
+hardware command queues, each with a ring of 32 descriptors. The PMD
+maps each hardware command queue to a separate DPDK dmadev with a
+single virtual channel, so a single PCI function appears as 16 dmadevs
+named ``<pci-bdf>-ch0`` through ``<pci-bdf>-ch15``.
+
+The driver supports memory-to-memory copy operations only.
+
+Hardware Requirements
+---------------------
+
+The ``dpdk-devbind.py`` script can be used to list AE4DMA devices on
+the system::
+
+   dpdk-devbind.py --status-dev dma
+
+AE4DMA devices appear with vendor ID ``0x1022`` and device ID
+``0x149b``.
+
+Compilation
+-----------
+
+The driver is built as part of the standard DPDK build on x86 platforms
+using ``meson`` and ``ninja``; no extra configuration is required.
+
+Device Setup
+------------
+
+The AE4DMA device must be bound to a DPDK-compatible kernel module such
+as ``vfio-pci`` before it can be used::
+
+   dpdk-devbind.py -b vfio-pci <pci-bdf>
+
+Initialization
+~~~~~~~~~~~~~~
+
+On probe the PMD performs the following steps for each PCI function:
+
+* Reads BAR0 and programs the common configuration register with the
+  number of hardware queues to enable (16).
+* For each hardware queue it allocates a 32-entry descriptor ring in
+  IOVA-contiguous memory, programs the queue base address and ring
+  depth into the per-queue registers, and enables the queue.
+* Interrupts are masked; completion is polled by the application.
+
+Usage
+-----
+
+Once a dmadev has been started, copies are submitted with
+``rte_dma_copy()`` and completions are reaped with ``rte_dma_completed()``
+or ``rte_dma_completed_status()``. See the
+:ref:`Enqueue / Dequeue API <dmadev_enqueue_dequeue>` section of the
+dmadev library documentation for details.
+
+Limitations
+-----------
+
+* Only memory-to-memory copies are supported. Fill, scatter-gather and
+  any other operation types are not advertised in
+  ``rte_dma_info::dev_capa``.
+* The maximum number of descriptors per virtual channel is fixed by
+  hardware at 32. The PMD rounds the requested ring size up to a
+  power of two and clamps it to 32.
+* Only a single virtual channel per dmadev is supported; use the 16
+  per-PCI-function dmadevs to obtain channel-level parallelism.
+* Interrupt-driven completion is not supported.
diff --git a/doc/guides/dmadevs/index.rst b/doc/guides/dmadevs/index.rst
index 56beb1733f..97399590f6 100644
--- a/doc/guides/dmadevs/index.rst
+++ b/doc/guides/dmadevs/index.rst
@@ -11,6 +11,7 @@ an application through DMA API.
    :maxdepth: 1
    :numbered:
 
+   ae4dma
    cnxk
    dpaa
    dpaa2
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index f012d47a4b..9a78a7ef62 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -63,6 +63,13 @@ New Features
    ``rte_eal_init`` and the application is responsible for probing each device,
  * ``--auto-probing`` enables the initial bus probing, which is the current default behavior.
 
+* **Added AMD AE4DMA DMA PMD.**
+
+  Added a new ``dma/ae4dma`` driver for the AMD AE4DMA hardware DMA engine.
+  Each PCI function exposes 16 hardware command queues; the PMD registers one
+  dmadev per channel with a single virtual channel and supports
+  memory-to-memory copy operations.
+
 
 Removed Items
 -------------
diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
new file mode 100644
index 0000000000..eb6ea88f55
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
@@ -0,0 +1,742 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021-2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <errno.h>
+#include <inttypes.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <rte_bus_pci.h>
+#include <bus_pci_driver.h>
+#include <rte_dmadev_pmd.h>
+#include <rte_malloc.h>
+
+#include "ae4dma_internal.h"
+
+/*
+ * One dmadev per AE4DMA hardware channel; each dmadev has exactly one
+ * virtual channel. The HW's per-queue register block must be densely
+ * packed right after the engine-common config register at BAR0+0; the
+ * build-time check below catches an accidental layout change.
+ */
+static_assert(sizeof(struct ae4dma_hwq_regs) == 32,
+               "ae4dma_hwq_regs stride changed; per-queue offset math will break");
+
+RTE_LOG_REGISTER_DEFAULT(ae4dma_pmd_logtype, INFO);
+
+#define AE4DMA_PMD_NAME dmadev_ae4dma
+#define AE4DMA_PMD_NAME_STR RTE_STR(AE4DMA_PMD_NAME)
+
+static const struct rte_memzone *
+ae4dma_queue_dma_zone_reserve(const char *queue_name,
+               uint32_t queue_size, int socket_id)
+{
+       const struct rte_memzone *mz;
+
+       mz = rte_memzone_lookup(queue_name);
+       if (mz != NULL) {
+               if (((size_t)queue_size <= mz->len) &&
+                               ((socket_id == SOCKET_ID_ANY) ||
+                                (socket_id == mz->socket_id))) {
+                       AE4DMA_PMD_INFO("re-use memzone already "
+                                       "allocated for %s", queue_name);
+                       return mz;
+               }
+               AE4DMA_PMD_ERR("Incompatible memzone already "
+                               "allocated %s, size %u, socket %d. "
+                               "Requested size %u, socket %d",
+                               queue_name, (uint32_t)mz->len,
+                               mz->socket_id, queue_size, socket_id);
+               return NULL;
+       }
+       return rte_memzone_reserve_aligned(queue_name, queue_size,
+                       socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
+}
+
+/* Configure a device. */
+static int
+ae4dma_dev_configure(struct rte_dma_dev *dev __rte_unused,
+               const struct rte_dma_conf *dev_conf,
+               uint32_t conf_sz)
+{
+       if (sizeof(struct rte_dma_conf) != conf_sz)
+               return -EINVAL;
+
+       if (dev_conf->nb_vchans != 1)
+               return -EINVAL;
+
+       return 0;
+}
+
+/* Setup a virtual channel for AE4DMA, only 1 vchan is supported per dmadev. */
+static int
+ae4dma_vchan_setup(struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+               const struct rte_dma_vchan_conf *qconf, uint32_t qconf_sz)
+{
+       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+       uint16_t max_desc = qconf->nb_desc;
+
+       if (sizeof(struct rte_dma_vchan_conf) != qconf_sz)
+               return -EINVAL;
+
+       if (max_desc < 2)
+               return -EINVAL;
+
+       if (!rte_is_power_of_2(max_desc))
+               max_desc = rte_align32pow2(max_desc);
+
+       if (max_desc > AE4DMA_DESCRIPTORS_PER_CMDQ) {
+               AE4DMA_PMD_DEBUG("DMA dev %u nb_desc clamped to %u",
+                               dev->data->dev_id, AE4DMA_DESCRIPTORS_PER_CMDQ);
+               max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
+       }
+
+       cmd_q->qcfg = *qconf;
+       cmd_q->qcfg.nb_desc = max_desc;
+
+       /* Ensure all counters are reset, if reconfiguring/restarting device. */
+       memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
+       return 0;
+}
+
+/* Start a configured device. */
+static int
+ae4dma_dev_start(struct rte_dma_dev *dev)
+{
+       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+       uint16_t nb = cmd_q->qcfg.nb_desc;
+
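+       if (nb == 0)
+               return -EBUSY;
+       /* Program ring depth and (re-)enable the queue disabled by dev_stop(). */
+       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, nb);
+       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw, AE4DMA_CMD_QUEUE_ENABLE);
+       return 0;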
+}
+
+/* Stop a configured device. */
+static int
+ae4dma_dev_stop(struct rte_dma_dev *dev)
+{
+       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+       if (cmd_q->hwq_regs != NULL)
+               AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+                               AE4DMA_CMD_QUEUE_DISABLE);
+       return 0;
+}
+
+/* Get device information of a device. */
+static int
+ae4dma_dev_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *info,
+               uint32_t size)
+{
+       if (size < sizeof(*info))
+               return -EINVAL;
+       info->dev_name = dev->device->name;
+       info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM | RTE_DMA_CAPA_OPS_COPY;
+       info->max_vchans = 1;
+       info->min_desc = 2;
+       info->max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
+       info->nb_vchans = 1;
+       return 0;
+}
+
+/* Close a configured device. */
+static int
+ae4dma_dev_close(struct rte_dma_dev *dev)
+{
+       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+       if (cmd_q->hwq_regs != NULL)
+               AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+                               AE4DMA_CMD_QUEUE_DISABLE);
+
+       if (cmd_q->memz_name[0] != '\0') {
+               const struct rte_memzone *mz = rte_memzone_lookup(cmd_q->memz_name);
+
+               if (mz != NULL)
+                       rte_memzone_free(mz);
+       }
+       cmd_q->qbase_desc = NULL;
+       cmd_q->qbase_addr = NULL;
+       cmd_q->qbase_phys_addr = 0;
+       return 0;
+}
+
+/* Ring the doorbell: write next_write to the HW write_idx register. */
+static inline void
+__submit(struct ae4dma_dmadev *ae4dma)
+{
+       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+       uint16_t write_idx = cmd_q->next_write;
+       uint16_t nb = cmd_q->qcfg.nb_desc;
+
+       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->write_idx, write_idx);
+       if (nb != 0)
+               cmd_q->stats.submitted += (uint16_t)((cmd_q->next_write - cmd_q->last_write +
+                               nb) % nb);
+       cmd_q->last_write = cmd_q->next_write;
+}
+
+static int
+ae4dma_submit(void *dev_private, uint16_t vchan __rte_unused)
+{
+       struct ae4dma_dmadev *ae4dma = dev_private;
+
+       __submit(ae4dma);
+       return 0;
+}
+
+/* Write descriptor for enqueue (copy only). */
+static inline int
+__write_desc_copy(void *dev_private, rte_iova_t src, rte_iova_t dst,
+               uint32_t len, uint64_t flags)
+{
+       struct ae4dma_dmadev *ae4dma = dev_private;
+       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+       struct ae4dma_desc *dma_desc;
+       uint16_t ret;
+       uint16_t nb = cmd_q->qcfg.nb_desc;
+       uint16_t write = cmd_q->next_write;
+
+       if (nb == 0)
+               return -EINVAL;
+
+       /* Reserve one slot to distinguish full from empty (power-of-two ring). */
+       if ((uint32_t)cmd_q->ring_buff_count >= (uint32_t)(nb - 1))
+               return -ENOSPC;
+
+       dma_desc = &cmd_q->qbase_desc[write];
+       memset(dma_desc, 0, sizeof(*dma_desc));
+       dma_desc->length = len;
+       dma_desc->src_hi = upper_32_bits(src);
+       dma_desc->src_lo = lower_32_bits(src);
+       dma_desc->dst_hi = upper_32_bits(dst);
+       dma_desc->dst_lo = lower_32_bits(dst);
+       cmd_q->ring_buff_count++;
+       cmd_q->next_write = (uint16_t)((write + 1) % nb);
+       ret = write;
+       if (flags & RTE_DMA_OP_FLAG_SUBMIT)
+               __submit(ae4dma);
+       return ret;
+}
+
+/* Enqueue a copy operation onto the ae4dma device. */
+static int
+ae4dma_enqueue_copy(void *dev_private, uint16_t vchan __rte_unused,
+               rte_iova_t src, rte_iova_t dst, uint32_t length, uint64_t flags)
+{
+       return __write_desc_copy(dev_private, src, dst, length, flags);
+}
+
+/* Dump DMA device info. */
+static int
+ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
+{
+       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+       struct ae4dma_cmd_queue *cmd_q;
+       void *ae4dma_mmio_base_addr = (uint8_t *)ae4dma->io_regs;
+
+       cmd_q = &ae4dma->cmd_q;
+       fprintf(f, "cmd_q->id              = %" PRIx64 "\n", cmd_q->id);
+       fprintf(f, "cmd_q->qidx            = %" PRIx64 "\n", cmd_q->qidx);
+       fprintf(f, "cmd_q->qsize           = %" PRIx64 "\n", cmd_q->qsize);
+       fprintf(f, "mmio_base_addr      = %p\n", ae4dma_mmio_base_addr);
+       fprintf(f, "queues per ae4dma engine     = %u\n", AE4DMA_READ_REG_OFFSET(
+                               ae4dma_mmio_base_addr, AE4DMA_COMMON_CONFIG_OFFSET));
+       fprintf(f, "== Private Data ==\n");
+       fprintf(f, "  Config: { ring_size: %u }\n", cmd_q->qcfg.nb_desc);
+       fprintf(f, "  Ring virt: %p\tphys: %#" PRIx64 "\n",
+                       (void *)cmd_q->qbase_desc,
+                       (uint64_t)cmd_q->qbase_phys_addr);
+       fprintf(f, "  Next write: %u\n", cmd_q->next_write);
+       fprintf(f, "  Next read: %u\n", cmd_q->next_read);
+       fprintf(f, "  current queue depth: %u\n", cmd_q->ring_buff_count);
+       fprintf(f, "  }\n");
+       fprintf(f, "  Key Stats { submitted: %" PRIu64 ", comp: %" PRIu64 ", failed: %" PRIu64 " }\n",
+               cmd_q->stats.submitted,
+               cmd_q->stats.completed,
+               cmd_q->stats.errors);
+       return 0;
+}
+
+/* Translate an AE4DMA descriptor err_code to a dmadev status code. */
+static inline enum rte_dma_status_code
+__translate_status_ae4dma_to_dma(enum ae4dma_dma_err status)
+{
+       AE4DMA_PMD_DEBUG("ae4dma desc status = %d", status);
+
+       switch (status) {
+       case AE4DMA_DMA_ERR_NO_ERR:
+               return RTE_DMA_STATUS_SUCCESSFUL;
+       case AE4DMA_DMA_ERR_INV_LEN:
+               return RTE_DMA_STATUS_INVALID_LENGTH;
+       case AE4DMA_DMA_ERR_INV_SRC:
+               return RTE_DMA_STATUS_INVALID_SRC_ADDR;
+       case AE4DMA_DMA_ERR_INV_DST:
+               return RTE_DMA_STATUS_INVALID_DST_ADDR;
+       case AE4DMA_DMA_ERR_INV_ALIGN:
+               /* "POISION" is the spelling used by the public DPDK enum. */
+               return RTE_DMA_STATUS_DATA_POISION;
+       case AE4DMA_DMA_ERR_INV_HEADER:
+       case AE4DMA_DMA_ERR_INV_STATUS:
+               return RTE_DMA_STATUS_ERROR_UNKNOWN;
+       default:
+               return RTE_DMA_STATUS_ERROR_UNKNOWN;
+       }
+}
+
+/*
+ * Scan HW queue for completed descriptors (non-blocking).
+ *
+ * The AE4DMA engine signals completion by advancing the per-queue
+ * `read_idx` register; it does not (reliably) write a status value
+ * back into the descriptor. We therefore use the HW `read_idx`
+ * register as the source of truth and only inspect the descriptor's
+ * `dw1.err_code` byte to classify each completion as success or
+ * failure.
+ *
+ * @param cmd_q
+ *   The AE4DMA command queue.
+ * @param max_ops
+ *   Maximum descriptors to process this call.
+ * @param[out] failed_count
+ *   Number of completed descriptors that did not report success.
+ * @return
+ *   Number of descriptors completed (success + failure), <= max_ops.
+ */
+static inline uint16_t
+ae4dma_scan_hwq(struct ae4dma_cmd_queue *cmd_q, uint16_t max_ops,
+               uint16_t *failed_count)
+{
+       volatile struct ae4dma_desc *hw_desc;
+       uint16_t events_count = 0, fails = 0;
+       uint16_t tail;
+       uint16_t nb = cmd_q->qcfg.nb_desc;
+       uint16_t mask;
+       uint16_t hw_read_idx;
+       uint16_t in_flight;
+       uint16_t scan_cap;
+
+       if (nb == 0 || cmd_q->ring_buff_count == 0) {
+               *failed_count = 0;
+               return 0;
+       }
+       mask = nb - 1;
+
+       hw_read_idx = (uint16_t)(AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx) & mask);
+       tail = cmd_q->next_read;
+
+       /*
+        * Descriptors completed since our last visit live in the
+        * half-open ring range [tail, hw_read_idx). If HW hasn't
+        * moved we have nothing to do.
+        */
+       in_flight = (uint16_t)((hw_read_idx - tail) & mask);
+       if (in_flight == 0) {
+               *failed_count = 0;
+               return 0;
+       }
+
+       scan_cap = max_ops;
+       if (scan_cap > AE4DMA_DESCRIPTORS_PER_CMDQ)
+               scan_cap = AE4DMA_DESCRIPTORS_PER_CMDQ;
+       if (scan_cap > in_flight)
+               scan_cap = in_flight;
+       if (scan_cap > cmd_q->ring_buff_count)
+               scan_cap = (uint16_t)cmd_q->ring_buff_count;
+
+       while (events_count < scan_cap) {
+               uint8_t hw_status;
+               uint8_t hw_err;
+
+               hw_desc = &cmd_q->qbase_desc[tail];
+               hw_status = hw_desc->dw1.status;
+               hw_err = hw_desc->dw1.err_code;
+
+               /*
+                * read_idx advancing is the definitive completion
+                * signal. The per-descriptor status byte is informational
+                * and may not yet be written when we observe it:
+                *
+                *   AE4DMA_DMA_DESC_ERROR (4)
+                *     Hard failure - err_code names the precise cause.
+                *   AE4DMA_DMA_DESC_COMPLETED (3) or 0
+                *     Success.
+                *   AE4DMA_DMA_DESC_VALIDATED (1) / _PROCESSED (2)
+                *     Benign race: HW had not finished updating the
+                *     status byte at the instant we read it. Since
+                *     read_idx has moved past this slot, treat it as
+                *     success unless err_code says otherwise.
+                *
+                * A non-zero err_code is treated as a failure regardless
+                * of the observed status value.
+                */
+               if (hw_status == AE4DMA_DMA_DESC_ERROR ||
+                               hw_err != AE4DMA_DMA_ERR_NO_ERR) {
+                       fails++;
+                       AE4DMA_PMD_WARN("Desc failed: status=%u err=%u",
+                                       hw_status, hw_err);
+               }
+               cmd_q->status[events_count] = (enum ae4dma_dma_err)hw_err;
+               cmd_q->ring_buff_count--;
+               events_count++;
+               tail = (tail + 1) & mask;
+       }
+
+       cmd_q->stats.completed += events_count;
+       cmd_q->stats.errors += fails;
+       cmd_q->next_read = tail;
+       *failed_count = fails;
+       return events_count;
+}
+
+/* Returns successful operations count and sets error flag if any errors. */
+static uint16_t
+ae4dma_completed(void *dev_private, uint16_t vchan __rte_unused,
+               const uint16_t max_ops, uint16_t *last_idx, bool *has_error)
+{
+       struct ae4dma_dmadev *ae4dma = dev_private;
+       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+       uint16_t cpl_count, sl_count;
+       uint16_t err_count = 0;
+       uint16_t nb = cmd_q->qcfg.nb_desc;
+
+       *has_error = false;
+
+       cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
+
+       if (cpl_count > max_ops)
+               cpl_count = max_ops;
+
+       if (cpl_count > 0 && last_idx != NULL)
+               *last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
+
+       sl_count = cpl_count - err_count;
+       if (err_count)
+               *has_error = true;
+
+       return sl_count;
+}
+
+static uint16_t
+ae4dma_completed_status(void *dev_private, uint16_t vchan __rte_unused,
+               uint16_t max_ops, uint16_t *last_idx,
+               enum rte_dma_status_code *status)
+{
+       struct ae4dma_dmadev *ae4dma = dev_private;
+       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+       uint16_t cpl_count;
+       uint16_t i;
+       uint16_t err_count = 0;
+       uint16_t nb = cmd_q->qcfg.nb_desc;
+
+       cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
+
+       if (cpl_count > max_ops)
+               cpl_count = max_ops;
+
+       if (cpl_count > 0 && last_idx != NULL)
+               *last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
+
+       if (likely(err_count == 0)) {
+               for (i = 0; i < cpl_count; i++)
+                       status[i] = RTE_DMA_STATUS_SUCCESSFUL;
+       } else {
+               for (i = 0; i < cpl_count; i++)
+                       status[i] = __translate_status_ae4dma_to_dma(cmd_q->status[i]);
+       }
+
+       return cpl_count;
+}
+
+/* Get the remaining capacity of the ring. */
+static uint16_t
+ae4dma_burst_capacity(const void *dev_private, uint16_t vchan __rte_unused)
+{
+       const struct ae4dma_dmadev *ae4dma = dev_private;
+       const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+       uint16_t nb = cmd_q->qcfg.nb_desc;
+       uint16_t mask;
+       uint16_t read_idx = cmd_q->next_read;
+       uint16_t write_idx = cmd_q->next_write;
+       uint16_t used;
+
+       if (nb < 2 || !rte_is_power_of_2(nb))
+               return 0;
+
+       mask = nb - 1;
+       used = (uint16_t)((write_idx - read_idx) & mask);
+       /* One slot reserved (same rule as enqueue). */
+       if (used >= nb - 1)
+               return 0;
+       return (uint16_t)(nb - 1 - used);
+}
+
+/* Retrieve the generic stats of a DMA device. */
+static int
+ae4dma_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+               struct rte_dma_stats *rte_stats, uint32_t size)
+{
+       const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+       const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+       const struct rte_dma_stats *stats = &cmd_q->stats;
+
+       if (size < sizeof(*rte_stats))
+               return -EINVAL;
+       if (rte_stats == NULL)
+               return -EINVAL;
+
+       *rte_stats = *stats;
+       return 0;
+}
+
+/* Reset the generic stat counters for the DMA device. */
+static int
+ae4dma_stats_reset(struct rte_dma_dev *dev, uint16_t vchan __rte_unused)
+{
+       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+       memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
+       return 0;
+}
+
+/*
+ * Report channel state to the dmadev framework.
+ *
+ *   RTE_DMA_VCHAN_HALTED_ERROR - HW queue is disabled (never started, or
+ *                                stopped via dev_stop()).
+ *   RTE_DMA_VCHAN_IDLE         - HW has caught up: read_idx == write_idx,
+ *                                no descriptors in flight.
+ *   RTE_DMA_VCHAN_ACTIVE       - HW still has descriptors to process.
+ */
+static int
+ae4dma_vchan_status(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+               enum rte_dma_vchan_status *status)
+{
+       const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+       const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+       uint32_t ctrl, hw_read, hw_write;
+
+       if (cmd_q->hwq_regs == NULL) {
+               *status = RTE_DMA_VCHAN_HALTED_ERROR;
+               return 0;
+       }
+
+       ctrl = AE4DMA_READ_REG(&cmd_q->hwq_regs->control_reg.control_raw);
+       if ((ctrl & AE4DMA_CMD_QUEUE_ENABLE) == 0) {
+               *status = RTE_DMA_VCHAN_HALTED_ERROR;
+               return 0;
+       }
+
+       hw_read  = AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
+       hw_write = AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
+
+       *status = (hw_read == hw_write) ? RTE_DMA_VCHAN_IDLE
+                                       : RTE_DMA_VCHAN_ACTIVE;
+       return 0;
+}
+
+static int
+ae4dma_add_queue(struct ae4dma_dmadev *dev, uint8_t qn, const char *pci_name)
+{
+       uint32_t dma_addr_lo, dma_addr_hi;
+       struct ae4dma_cmd_queue *cmd_q;
+       const struct rte_memzone *q_mz;
+
+       if (dev == NULL)
+               return -EINVAL;
+
+       dev->io_regs = dev->pci->mem_resource[AE4DMA_PCIE_BAR].addr;
+
+       cmd_q = &dev->cmd_q;
+       cmd_q->id = qn;
+       cmd_q->qidx = 0;
+       cmd_q->qsize = AE4DMA_QUEUE_SIZE(AE4DMA_QUEUE_DESC_SIZE);
+       cmd_q->hwq_regs = (volatile struct ae4dma_hwq_regs *)dev->io_regs + (qn + 1);
+
+       /*
+        * Memzone name must be globally unique. Embed PCI BDF so multiple
+        * PCI functions probed concurrently don't collide.
+        */
+       snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
+                       "ae4dma_%s_q%u", pci_name, (unsigned int)qn);
+
+       q_mz = ae4dma_queue_dma_zone_reserve(cmd_q->memz_name,
+                       cmd_q->qsize, rte_socket_id());
+       if (q_mz == NULL) {
+               AE4DMA_PMD_ERR("memzone reserve failed for %s", cmd_q->memz_name);
+               return -ENOMEM;
+       }
+
+       cmd_q->qbase_addr = (void *)q_mz->addr;
+       cmd_q->qbase_desc = (struct ae4dma_desc *)q_mz->addr;
+       cmd_q->qbase_phys_addr = q_mz->iova;
+
+       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, AE4DMA_DESCRIPTORS_PER_CMDQ);
+       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+                       AE4DMA_CMD_QUEUE_ENABLE);
+       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->intr_status_reg.intr_status_raw,
+                       AE4DMA_DISABLE_INTR);
+       cmd_q->next_write = (uint16_t)AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
+       cmd_q->next_read = (uint16_t)AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
+       cmd_q->ring_buff_count = 0;
+
+       dma_addr_lo = low32_value(cmd_q->qbase_phys_addr);
+       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_lo, dma_addr_lo);
+       dma_addr_hi = high32_value(cmd_q->qbase_phys_addr);
+       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_hi, dma_addr_hi);
+
+       return 0;
+}
+
+static void
+ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
+               unsigned int ch)
+{
+       snprintf(out, outlen, "%s-ch%u", pci_name, ch);
+}
+
+/* Create a dmadev(dpdk DMA device) */
+static int
+ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
+{
+       static const struct rte_dma_dev_ops ae4dma_dmadev_ops = {
+               .dev_close = ae4dma_dev_close,
+               .dev_configure = ae4dma_dev_configure,
+               .dev_dump = ae4dma_dev_dump,
+               .dev_info_get = ae4dma_dev_info_get,
+               .dev_start = ae4dma_dev_start,
+               .dev_stop = ae4dma_dev_stop,
+               .stats_get = ae4dma_stats_get,
+               .stats_reset = ae4dma_stats_reset,
+               .vchan_status = ae4dma_vchan_status,
+               .vchan_setup = ae4dma_vchan_setup,
+       };
+
+       struct rte_dma_dev *dmadev = NULL;
+       struct ae4dma_dmadev *ae4dma = NULL;
+       char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];
+
+       if (!name) {
+               AE4DMA_PMD_ERR("Invalid name of the device!");
+               return -EINVAL;
+       }
+       memset(hwq_dev_name, 0, sizeof(hwq_dev_name));
+       ae4dma_channel_dev_name(hwq_dev_name, sizeof(hwq_dev_name), name, qn);
+
+       dmadev = rte_dma_pmd_allocate(hwq_dev_name, dev->device.numa_node,
+                       sizeof(struct ae4dma_dmadev));
+       if (dmadev == NULL) {
+               AE4DMA_PMD_ERR("Unable to allocate dma device");
+               return -ENOMEM;
+       }
+       dmadev->device = &dev->device;
+       dmadev->fp_obj->dev_private = dmadev->data->dev_private;
+       dmadev->dev_ops = &ae4dma_dmadev_ops;
+
+       dmadev->fp_obj->burst_capacity = ae4dma_burst_capacity;
+       dmadev->fp_obj->completed = ae4dma_completed;
+       dmadev->fp_obj->completed_status = ae4dma_completed_status;
+       dmadev->fp_obj->copy = ae4dma_enqueue_copy;
+       dmadev->fp_obj->submit = ae4dma_submit;
+       /* fill capability not advertised: leave fp_obj->fill as zero-initialised. */
+
+       ae4dma = dmadev->data->dev_private;
+       ae4dma->dmadev = dmadev;
+       ae4dma->pci = dev;
+
+       if (ae4dma_add_queue(ae4dma, qn, name) != 0)
+               goto init_error;
+       dmadev->state = RTE_DMA_DEV_READY;
+       return 0;
+init_error:
+       AE4DMA_PMD_ERR("driver %s(): failed", __func__);
+       rte_dma_pmd_release(hwq_dev_name);
+       return -EFAULT;
+}
+
+/* Probe DMA device. */
+static int
+ae4dma_dmadev_probe(struct rte_pci_driver *drv, struct rte_pci_device *dev)
+{
+       char name[32];
+       char chname[RTE_DEV_NAME_MAX_LEN];
+       void *mmio_base;
+       uint32_t q_per_eng;
+       int ret = 0;
+       uint8_t i;
+
+       rte_pci_device_name(&dev->addr, name, sizeof(name));
+       AE4DMA_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
+       dev->device.driver = &drv->driver;
+
+       mmio_base = dev->mem_resource[AE4DMA_PCIE_BAR].addr;
+       if (mmio_base == NULL) {
+               AE4DMA_PMD_ERR("%s: BAR%d not mapped", name, AE4DMA_PCIE_BAR);
+               return -ENODEV;
+       }
+
+       /* Program the per-engine HW queue count once. */
+       AE4DMA_WRITE_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET,
+                       AE4DMA_MAX_HW_QUEUES);
+       q_per_eng = AE4DMA_READ_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET);
+       AE4DMA_PMD_INFO("%s: AE4DMA queues per engine = %u", name, q_per_eng);
+
+       for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
+               ret = ae4dma_dmadev_create(name, dev, i);
+               if (ret != 0) {
+                       AE4DMA_PMD_ERR("%s create dmadev %u failed!", name, i);
+                       while (i > 0) {
+                               i--;
+                               ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
+                               rte_dma_pmd_release(chname);
+                       }
+                       break;
+               }
+       }
+       return ret;
+}
+
+/* Remove DMA device. */
+static int
+ae4dma_dmadev_remove(struct rte_pci_device *dev)
+{
+       char name[32];
+       char chname[RTE_DEV_NAME_MAX_LEN];
+       unsigned int i;
+
+       rte_pci_device_name(&dev->addr, name, sizeof(name));
+
+       AE4DMA_PMD_INFO("Closing %s on NUMA node %d",
+                       name, dev->device.numa_node);
+
+       for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
+               ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
+               rte_dma_pmd_release(chname);
+       }
+       return 0;
+}
+
+static const struct rte_pci_id pci_id_ae4dma_map[] = {
+       { RTE_PCI_DEVICE(AMD_VENDOR_ID, AE4DMA_DEVICE_ID) },
+       { .vendor_id = 0, /* sentinel */ },
+};
+
+static struct rte_pci_driver ae4dma_pmd_drv = {
+       .id_table = pci_id_ae4dma_map,
+       .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+       .probe = ae4dma_dmadev_probe,
+       .remove = ae4dma_dmadev_remove,
+};
+
+RTE_PMD_REGISTER_PCI(AE4DMA_PMD_NAME, ae4dma_pmd_drv);
+RTE_PMD_REGISTER_PCI_TABLE(AE4DMA_PMD_NAME, pci_id_ae4dma_map);
+RTE_PMD_REGISTER_KMOD_DEP(AE4DMA_PMD_NAME, "* igb_uio | uio_pci_generic | vfio-pci");
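The probe and remove paths above derive one dmadev name per hardware queue and, on a partial failure, release the names already created in reverse order. A minimal standalone sketch of the naming step follows, assuming the `"<pci-bdf>-ch<N>"` format given in the commit message. `example_channel_dev_name()` is a hypothetical stand-in, because the body of `ae4dma_channel_dev_name()` is not part of this hunk, and `EXAMPLE_MAX_HW_QUEUES` is a hypothetical copy of `AE4DMA_MAX_HW_QUEUES`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical copy of AE4DMA_MAX_HW_QUEUES for this sketch. */
#define EXAMPLE_MAX_HW_QUEUES 16u

/*
 * Hypothetical stand-in for ae4dma_channel_dev_name(): writes
 * "<pci_name>-ch<ch>" into buf. Returns 0 on success, -1 on bad
 * arguments or if the formatted name would be truncated.
 */
static int
example_channel_dev_name(char *buf, size_t len, const char *pci_name,
		unsigned int ch)
{
	int n;

	assert(buf != NULL && len > 0);
	if (pci_name == NULL || strlen(pci_name) == 0 ||
			ch >= EXAMPLE_MAX_HW_QUEUES)
		return -1;

	n = snprintf(buf, len, "%s-ch%u", pci_name, ch);
	/* snprintf returns the untruncated length; >= len means truncation. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Checking for truncation matters here because the same string is later passed to `rte_dma_pmd_release()` on unwind and remove, so a silently shortened name would not match the device that was created.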
diff --git a/drivers/dma/ae4dma/ae4dma_hw_defs.h b/drivers/dma/ae4dma/ae4dma_hw_defs.h
new file mode 100644
index 0000000000..235819778e
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_hw_defs.h
@@ -0,0 +1,164 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef __AE4DMA_HW_DEFS_H__
+#define __AE4DMA_HW_DEFS_H__
+
+#include <rte_bus_pci.h>
+#include <rte_byteorder.h>
+#include <rte_io.h>
+#include <rte_pci.h>
+#include <rte_memzone.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define AE4DMA_BIT(nr)                 (1UL << (nr))
+
+#define AE4DMA_BITS_PER_LONG   (__SIZEOF_LONG__ * 8)
+#define AE4DMA_GENMASK(h, l) \
+       (((~0UL) << (l)) & (~0UL >> (AE4DMA_BITS_PER_LONG - 1 - (h))))
+
+/* ae4dma device details */
+#define AMD_VENDOR_ID  0x1022
+#define AE4DMA_DEVICE_ID       0x149b
+#define AE4DMA_PCIE_BAR 0
+
+/*
+ * An AE4DMA engine has 16 DMA queues. Each queue supports 32 descriptors.
+ */
+#define AE4DMA_MAX_HW_QUEUES        16
+#define AE4DMA_QUEUE_START_INDEX    0
+#define AE4DMA_CMD_QUEUE_ENABLE                0x1
+#define AE4DMA_CMD_QUEUE_DISABLE       0x0
+
+/* Common to all queues */
+#define AE4DMA_COMMON_CONFIG_OFFSET 0x00
+
+#define AE4DMA_DISABLE_INTR 0x01
+
+/* Descriptor status */
+enum ae4dma_dma_status {
+       AE4DMA_DMA_DESC_SUBMITTED = 0,
+       AE4DMA_DMA_DESC_VALIDATED = 1,
+       AE4DMA_DMA_DESC_PROCESSED = 2,
+       AE4DMA_DMA_DESC_COMPLETED = 3,
+       AE4DMA_DMA_DESC_ERROR = 4,
+};
+
+/* Descriptor error-code */
+enum ae4dma_dma_err {
+       AE4DMA_DMA_ERR_NO_ERR = 0,
+       AE4DMA_DMA_ERR_INV_HEADER = 1,
+       AE4DMA_DMA_ERR_INV_STATUS = 2,
+       AE4DMA_DMA_ERR_INV_LEN = 3,
+       AE4DMA_DMA_ERR_INV_SRC = 4,
+       AE4DMA_DMA_ERR_INV_DST = 5,
+       AE4DMA_DMA_ERR_INV_ALIGN = 6,
+       AE4DMA_DMA_ERR_UNKNOWN = 7,
+};
+
+/* HW Queue status */
+enum ae4dma_hwqueue_status {
+       AE4DMA_HWQUEUE_EMPTY = 0,
+       AE4DMA_HWQUEUE_FULL = 1,
+       AE4DMA_HWQUEUE_NOT_EMPTY = 4
+};
+/*
+ * descriptor for AE4DMA commands
+ * 8 32-bit words:
+ * word 0: source memory type; destination memory type; control bits
+ * word 1: desc_id; error code; status
+ * word 2: length
+ * word 3: reserved
+ * word 4: upper 32 bits of source pointer
+ * word 5: low 32 bits of source pointer
+ * word 6: upper 32 bits of destination pointer
+ * word 7: low 32 bits of destination pointer
+ */
+
+/* AE4DMA descriptor DWORD0 control bits: reserved for future use */
+#define AE4DMA_DWORD0_STOP_ON_COMPLETION       AE4DMA_BIT(0)
+#define AE4DMA_DWORD0_INTERRUPT_ON_COMPLETION  AE4DMA_BIT(1)
+#define AE4DMA_DWORD0_START_OF_MESSAGE         AE4DMA_BIT(3)
+#define AE4DMA_DWORD0_END_OF_MESSAGE           AE4DMA_BIT(4)
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE  AE4DMA_GENMASK(5, 4)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE      AE4DMA_GENMASK(7, 6)
+
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_MEMORY    (0x0)
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_IOMEMORY  (1 << 4)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_MEMORY    (0x0)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_IOMEMORY  (1 << 6)
+
+struct ae4dma_desc_dword0 {
+       uint8_t byte0;
+       uint8_t byte1;
+       uint16_t timestamp;
+};
+
+struct ae4dma_desc_dword1 {
+       uint8_t status;
+       uint8_t err_code;
+       uint16_t desc_id;
+};
+
+struct ae4dma_desc {
+       struct ae4dma_desc_dword0 dw0;
+       struct ae4dma_desc_dword1 dw1;
+       uint32_t length;
+       uint32_t reserved;
+       uint32_t src_lo;
+       uint32_t src_hi;
+       uint32_t dst_lo;
+       uint32_t dst_hi;
+};
+
+/*
+ * Per-queue register block; each register is 4 bytes wide.
+ * Effective address: per-queue block offset + register offset.
+ */
+struct ae4dma_hwq_regs {
+       union {
+               uint32_t control_raw;
+               struct {
+                       uint32_t queue_enable: 1;
+                       uint32_t reserved_internal: 31;
+               } control;
+       } control_reg;
+
+       union {
+               uint32_t status_raw;
+               struct {
+                       uint32_t reserved0: 1;
+                       /* 0 = empty, 1 = full, 2 = stopped, 3 = error, 4 = not empty */
+                       uint32_t queue_status: 2;
+                       uint32_t reserved1: 21;
+                       uint32_t interrupt_type: 4;
+                       uint32_t reserved2: 4;
+               } status;
+       } status_reg;
+
+       uint32_t max_idx;
+       uint32_t read_idx;
+       uint32_t write_idx;
+
+       union {
+               uint32_t intr_status_raw;
+               struct {
+                       uint32_t intr_status: 1;
+                       uint32_t reserved: 31;
+               } intr_status;
+       } intr_status_reg;
+
+       uint32_t qbase_lo;
+       uint32_t qbase_hi;
+
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __AE4DMA_HW_DEFS_H__ */
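The bit helpers and the two memory layouts defined in this header can be checked at compile time. The sketch below copies `AE4DMA_BIT`/`AE4DMA_GENMASK` and the field order of `struct ae4dma_desc` and `struct ae4dma_hwq_regs` under hypothetical `EX_`/`ex_` names, with the register bitfield unions flattened to plain `uint32_t`. Like the header, it relies on the GCC/Clang predefined `__SIZEOF_LONG__`. With these definitions `EX_GENMASK(7, 6)` evaluates to `0xC0` and `EX_GENMASK(5, 4)` to `0x30`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copies of the bit helpers from ae4dma_hw_defs.h. */
#define EX_BIT(nr)		(1UL << (nr))
#define EX_BITS_PER_LONG	(__SIZEOF_LONG__ * 8)
#define EX_GENMASK(h, l) \
	(((~0UL) << (l)) & (~0UL >> (EX_BITS_PER_LONG - 1 - (h))))

/* Same field order as struct ae4dma_desc: eight 32-bit words. */
struct ex_desc {
	uint8_t byte0;		/* word 0 */
	uint8_t byte1;
	uint16_t timestamp;
	uint8_t status;		/* word 1 */
	uint8_t err_code;
	uint16_t desc_id;
	uint32_t length;	/* word 2 */
	uint32_t reserved;	/* word 3 */
	uint32_t src_lo;
	uint32_t src_hi;
	uint32_t dst_lo;
	uint32_t dst_hi;
};

/* Same order as struct ae4dma_hwq_regs, bitfield unions flattened. */
struct ex_hwq_regs {
	uint32_t control_raw;
	uint32_t status_raw;
	uint32_t max_idx;
	uint32_t read_idx;
	uint32_t write_idx;
	uint32_t intr_status_raw;
	uint32_t qbase_lo;
	uint32_t qbase_hi;
};

/* Compile-time layout checks (C11 _Static_assert). */
_Static_assert(sizeof(struct ex_desc) == 32, "descriptor is 8 dwords");
_Static_assert(offsetof(struct ex_desc, length) == 8, "length is word 2");
_Static_assert(offsetof(struct ex_hwq_regs, read_idx) == 0x0c, "read_idx");
_Static_assert(offsetof(struct ex_hwq_regs, write_idx) == 0x10, "write_idx");
_Static_assert(sizeof(struct ex_hwq_regs) == 0x20, "8 queue registers");
```

Placing equivalent `_Static_assert` lines next to the real structures is one way to catch accidental padding or reordering, since the hardware reads these layouts directly.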
diff --git a/drivers/dma/ae4dma/ae4dma_internal.h b/drivers/dma/ae4dma/ae4dma_internal.h
new file mode 100644
index 0000000000..d55cfbe3b8
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_internal.h
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _AE4DMA_INTERNAL_H_
+#define _AE4DMA_INTERNAL_H_
+
+#include <stdint.h>
+
+#include "ae4dma_hw_defs.h"
+
+/**
+ * upper_32_bits - return bits 32-63 of a number
+ * @n: the number we're accessing
+ */
+#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
+
+/**
+ * lower_32_bits - return bits 0-31 of a number
+ * @n: the number we're accessing
+ */
+#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))
+
+/** Hardware ring depth (slots per queue); must be power of two. */
+#define AE4DMA_DESCRIPTORS_PER_CMDQ    32
+#define AE4DMA_QUEUE_DESC_SIZE         sizeof(struct ae4dma_desc)
+#define AE4DMA_QUEUE_SIZE(n)           (AE4DMA_DESCRIPTORS_PER_CMDQ * (n))
+
+
+/** AE4DMA register write/read helpers */
+static inline void ae4dma_pci_reg_write(void *base, int offset,
+               uint32_t value)
+{
+       volatile void *reg_addr = ((uint8_t *)base + offset);
+
+       rte_write32((rte_cpu_to_le_32(value)), reg_addr);
+}
+
+static inline uint32_t ae4dma_pci_reg_read(void *base, int offset)
+{
+       volatile void *reg_addr = ((uint8_t *)base + offset);
+
+       return rte_le_to_cpu_32(rte_read32(reg_addr));
+}
+
+#define AE4DMA_READ_REG_OFFSET(hw_addr, reg_offset) \
+       ae4dma_pci_reg_read(hw_addr, reg_offset)
+
+#define AE4DMA_WRITE_REG_OFFSET(hw_addr, reg_offset, value) \
+       ae4dma_pci_reg_write(hw_addr, reg_offset, value)
+
+
+#define AE4DMA_READ_REG(hw_addr) \
+       ae4dma_pci_reg_read((void *)(uintptr_t)(hw_addr), 0)
+
+#define AE4DMA_WRITE_REG(hw_addr, value) \
+       ae4dma_pci_reg_write((void *)(uintptr_t)(hw_addr), 0, value)
+
+static inline uint32_t
+low32_value(uint64_t addr)
+{
+       return (uint32_t)(addr & 0xffffffffUL);
+}
+
+static inline uint32_t
+high32_value(uint64_t addr)
+{
+       return (uint32_t)(addr >> 32);
+}
+
+/**
+ * A structure describing an AE4DMA command queue.
+ */
+struct ae4dma_cmd_queue {
+       char memz_name[RTE_MEMZONE_NAMESIZE];
+       volatile struct ae4dma_hwq_regs *hwq_regs;
+
+       struct rte_dma_vchan_conf qcfg;
+       struct rte_dma_stats stats;
+       /* Queue address */
+       struct ae4dma_desc *qbase_desc;
+       void *qbase_addr;
+       phys_addr_t qbase_phys_addr;
+       enum ae4dma_dma_err status[AE4DMA_DESCRIPTORS_PER_CMDQ];
+       /* Queue identifier */
+       uint64_t id;    /**< queue id */
+       uint64_t qidx;  /**< queue index */
+       uint64_t qsize; /**< queue size */
+       uint32_t ring_buff_count;
+       unsigned short next_read;
+       unsigned short next_write;
+       unsigned short last_write; /* Used to compute submitted count. */
+} __rte_cache_aligned;
+
+/*
+ * One dmadev per AE4DMA hardware channel: probe creates AE4DMA_MAX_HW_QUEUES
+ * dmadevs per PCI function, each owning a single HW command queue.
+ */
+struct ae4dma_dmadev {
+       struct rte_dma_dev *dmadev;
+       void *io_regs;
+       struct ae4dma_cmd_queue cmd_q; /**< single HW queue owned by this dmadev */
+       struct rte_pci_device *pci;    /**< owning PCI device (not owned) */
+};
+
+
+extern int ae4dma_pmd_logtype;
+
+#define AE4DMA_PMD_LOG(level, fmt, args...) rte_log(RTE_LOG_ ## level, \
+               ae4dma_pmd_logtype, "AE4DMA: %s(): " fmt "\n", __func__, ##args)
+
+#define AE4DMA_PMD_DEBUG(fmt, args...)  AE4DMA_PMD_LOG(DEBUG, fmt, ## args)
+#define AE4DMA_PMD_INFO(fmt, args...)   AE4DMA_PMD_LOG(INFO, fmt, ## args)
+#define AE4DMA_PMD_ERR(fmt, args...)    AE4DMA_PMD_LOG(ERR, fmt, ## args)
+#define AE4DMA_PMD_WARN(fmt, args...)   AE4DMA_PMD_LOG(WARNING, fmt, ## args)
+
+#endif /* _AE4DMA_INTERNAL_H_ */
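The commit message states that `vchan_status` compares the hardware `read_idx` against `write_idx` to report IDLE or ACTIVE. Below is a minimal sketch of that comparison together with the 64-bit address split done by `upper_32_bits()`/`lower_32_bits()`. It assumes both indices are slot numbers in `[0, AE4DMA_DESCRIPTORS_PER_CMDQ)` and that the ring never holds more than 31 outstanding entries; the `ex_` helpers are hypothetical and are not taken from `ae4dma_dmadev.c`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of AE4DMA_DESCRIPTORS_PER_CMDQ (a power of two). */
#define EX_RING_SLOTS 32u

/* Same arithmetic as upper_32_bits()/lower_32_bits(). */
static inline uint32_t
ex_hi32(uint64_t v)
{
	/* Two 16-bit shifts, as in the header, stay valid for 32-bit inputs. */
	return (uint32_t)((v >> 16) >> 16);
}

static inline uint32_t
ex_lo32(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}

/*
 * Hypothetical occupancy helper: descriptors written but not yet consumed
 * by the engine. Unsigned subtraction wraps, and masking with
 * EX_RING_SLOTS - 1 reduces it modulo the ring size.
 */
static inline uint32_t
ex_ring_pending(uint32_t read_idx, uint32_t write_idx)
{
	return (write_idx - read_idx) & (EX_RING_SLOTS - 1);
}

/* Hypothetical IDLE test: the engine has caught up with software. */
static inline int
ex_ring_is_idle(uint32_t read_idx, uint32_t write_idx)
{
	return ex_ring_pending(read_idx, write_idx) == 0;
}
```

The mask only works because the ring depth is a power of two, which the comment on `AE4DMA_DESCRIPTORS_PER_CMDQ` already requires; a full 32-entry ring would be indistinguishable from an empty one under this scheme, hence the 31-entry assumption.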
diff --git a/drivers/dma/ae4dma/meson.build b/drivers/dma/ae4dma/meson.build
new file mode 100644
index 0000000000..e48ab0d561
--- /dev/null
+++ b/drivers/dma/ae4dma/meson.build
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Advanced Micro Devices, Inc. All rights reserved.
+
+build = dpdk_conf.has('RTE_ARCH_X86')
+reason = 'only supported on x86'
+sources = files('ae4dma_dmadev.c')
+deps += ['bus_pci', 'dmadev']
diff --git a/drivers/dma/meson.build b/drivers/dma/meson.build
index e0d94db967..c230ac5a06 100644
--- a/drivers/dma/meson.build
+++ b/drivers/dma/meson.build
@@ -2,6 +2,7 @@
 # Copyright 2021 HiSilicon Limited
 
 drivers = [
+        'ae4dma',
         'cnxk',
         'dpaa',
         'dpaa2',
diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 93f2383dff..ec6d6713b4 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -86,6 +86,9 @@
 cn9k_ree = {'Class': '08', 'Vendor': '177d', 'Device': 'a0f4',
             'SVendor': None, 'SDevice': None}
 
+amd_ae4dma = {'Class': '08', 'Vendor': '1022', 'Device': '149b',
+              'SVendor': None, 'SDevice': None}
+
 virtio_blk = {'Class': '01', 'Vendor': "1af4", 'Device': '1001,1042',
               'SVendor': None, 'SDevice': None}
 
@@ -95,7 +98,7 @@
 network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
 baseband_devices = [acceleration_class]
 crypto_devices = [encryption_class, intel_processor_class]
-dma_devices = [cnxk_dma, hisilicon_dma,
+dma_devices = [amd_ae4dma, cnxk_dma, hisilicon_dma,
                intel_idxd_gnrd, intel_idxd_dmr, intel_idxd_spr,
                intel_ioat_bdw, intel_ioat_icx, intel_ioat_skx,
                odm_dma]
-- 
2.34.1
