On Wed, May 20, 2026 at 4:12 PM Dmitry Baryshkov
<[email protected]> wrote:
>
> On Tue, May 19, 2026 at 11:45:52AM +0530, Ekansh Gupta via B4 Relay wrote:
> > From: Ekansh Gupta <[email protected]>
> >
> > Add documentation for the Qualcomm DSP Accelerator (QDA) driver under
> > Documentation/accel/qda/. The documentation covers the driver
> > architecture, GEM-based buffer management, IOMMU context bank
> > isolation, and the RPMsg transport layer.
> >
> > The user-space API section describes the DRM IOCTLs for session
> > management, GEM buffer allocation, and remote procedure invocation via
> > the FastRPC protocol, along with a typical application lifecycle
> > example. Sections for dynamic debug and basic testing are also
> > included.
> >
> > Wire the new documentation into the Compute Accelerators index at
> > Documentation/accel/index.rst.
> >
> > Assisted-by: Claude:claude-4-6-sonnet
> > Signed-off-by: Ekansh Gupta <[email protected]>
> > ---
> >  Documentation/accel/index.rst     |   1 +
> >  Documentation/accel/qda/index.rst |  13 ++++
> >  Documentation/accel/qda/qda.rst   | 146 ++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 160 insertions(+)
> >
> > diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
> > index cbc7d4c3876a..5901ea7f784c 100644
> > --- a/Documentation/accel/index.rst
> > +++ b/Documentation/accel/index.rst
> > @@ -10,4 +10,5 @@ Compute Accelerators
> >     introduction
> >     amdxdna/index
> >     qaic/index
> > +   qda/index
> >     rocket/index
> > diff --git a/Documentation/accel/qda/index.rst b/Documentation/accel/qda/index.rst
> > new file mode 100644
> > index 000000000000..013400cf9c25
> > --- /dev/null
> > +++ b/Documentation/accel/qda/index.rst
> > @@ -0,0 +1,13 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> > +
> > +==================================
> > +accel/qda Qualcomm DSP Accelerator
> > +==================================
> > +
> > +The QDA driver provides a DRM accel based interface for Qualcomm DSP offload.
> > +It uses the FastRPC protocol and integrates with DRM and GEM infrastructure
> > +for device and buffer management.
> > +
> > +.. toctree::
> > +
> > +   qda
> > diff --git a/Documentation/accel/qda/qda.rst b/Documentation/accel/qda/qda.rst
> > new file mode 100644
> > index 000000000000..9f49af6e6acc
> > --- /dev/null
> > +++ b/Documentation/accel/qda/qda.rst
> > @@ -0,0 +1,146 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> > +
> > +=====================================
> > +Qualcomm DSP Accelerator (QDA) Driver
> > +=====================================
> > +
> > +Introduction
> > +============
> > +
> > +The QDA driver is a DRM accel driver for Qualcomm's DSPs. It provides a
> > +DRM accel based interface for Qualcomm DSP offload, supporting workloads
> > +such as AI inference, computer vision, audio processing, and sensor offload
> > +on Qualcomm SoCs. It uses the FastRPC protocol and integrates with DRM and
> > +GEM infrastructure for device and buffer management.
> > +
> > +Key Features
> > +============
> > +
> > +*   **DRM accel Interface**: Exposes a standard character device node
> > +    (e.g., ``/dev/accel/accel0``) via the DRM accel subsystem.
> > +*   **FastRPC Protocol**: Implements the FastRPC protocol for communication
> > +    between the application processor and the DSP.
> > +*   **GEM Buffer Management**: Uses the DRM GEM interface for buffer
> > +    allocation, lifecycle management, and DMA-BUF import/export.
> > +*   **IOMMU Isolation**: Uses IOMMU context banks to enforce memory isolation
> > +    between different DSP user sessions.
> > +*   **Modular Design**: Clean separation between the core DRM logic, the
> > +    memory manager, and the RPMsg-based transport layer.
> > +
> > +Architecture
> > +============
> > +
> > +The QDA driver consists of several functional blocks:
> > +
> > +1.  **Core Driver (``qda_drv``)**: Manages device registration, file operations,
> > +    and DRM accel integration.
> > +2.  **Memory Manager (``qda_memory_manager``)**: A flexible memory management
> > +    layer that handles IOMMU context banks. It supports pluggable backends
> > +    (such as DMA-coherent) to adapt to different SoC memory architectures.
> > +3.  **GEM Subsystem**: Implements the DRM GEM interface for buffer management:
> > +
> > +    * **``qda_gem``**: Core GEM object management, including allocation, mmap
> > +      operations, and buffer lifecycle management.
> > +    * **``qda_prime``**: PRIME import functionality for DMA-BUF interoperability
> > +      with other kernel subsystems.
> > +
> > +4.  **Transport Layer (``qda_rpmsg``)**: Abstraction over the RPMsg framework
> > +    to handle low-level message passing with the DSP firmware.
> > +5.  **Compute Bus (``qda_compute_bus``)**: A custom virtual bus used to
> > +    enumerate and manage the specific compute context banks defined in the
> > +    device tree. The bus was introduced because IOMMU context banks (CBs) are
> > +    synthetic constructs, not real platform devices, making a platform driver
> > +    an incorrect abstraction for them. The earlier platform-driver approach also
> > +    had a race condition: device nodes were created before the RPMsg channel
> > +    resources were fully initialized, and because ``probe`` runs asynchronously,
> > +    applications could open a CB device and attempt to start a session before
> > +    the underlying transport was ready. The compute bus makes CB lifetime
> > +    explicitly subordinate to the parent QDA device, closing that window.
> > +6.  **FastRPC Core (``qda_fastrpc``)**: Implements the protocol logic for
> > +    marshalling arguments and handling remote invocations.
> > +
> > +User-Space API
> > +==============
> > +
> > +The driver exposes a set of DRM-compliant IOCTLs:
> > +
> > +*   ``DRM_IOCTL_QDA_QUERY``: Query DSP type (e.g., "cdsp", "adsp")
> > +    and capabilities.
> > +*   ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE``: Initialize a new process context
> > +    on the DSP.
> > +*   ``DRM_IOCTL_QDA_REMOTE_INVOKE``: Submit a remote method invocation (the
> > +    primary execution unit).
> > +*   ``DRM_IOCTL_QDA_GEM_CREATE``: Allocate a GEM buffer object for DSP usage.
> > +*   ``DRM_IOCTL_QDA_GEM_MMAP_OFFSET``: Retrieve mmap offsets for memory mapping.
> > +*   ``DRM_IOCTL_QDA_REMOTE_MAP`` / ``DRM_IOCTL_QDA_REMOTE_MUNMAP``: Map or unmap
> > +    buffers into the DSP's virtual address space. Each accepts a ``request``
> > +    field selecting between a legacy operation (``QDA_MAP_REQUEST_LEGACY`` /
> > +    ``QDA_MUNMAP_REQUEST_LEGACY``) and an attribute-based operation
> > +    (``QDA_MAP_REQUEST_ATTR`` / ``QDA_MUNMAP_REQUEST_ATTR``).
>
> Explain what happens if users don't map the buffers into the DSP
> space. Will DRM_IOCTL_QDA_REMOTE_INVOKE handle the mapping or not? What
> is the difference between those two modes?
>
> Would the driver benefit from using GPUVM?
>
> > +
> > +Usage Example
> > +=============
> > +
> > +A typical lifecycle for a user-space application:
> > +
> > +1.  **Discovery**: Open ``/dev/accel/accel*`` and use
> > +    ``DRM_IOCTL_QDA_QUERY`` to identify the DSP domain served by that
> > +    device node.
> > +2.  **Initialization**: Call ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE`` to
> > +    establish a session and create a process context on the DSP.
> > +3.  **Memory**: Allocate buffers via ``DRM_IOCTL_QDA_GEM_CREATE`` or import
> > +    DMA-BUFs (PRIME fd) from other drivers using ``DRM_IOCTL_PRIME_FD_TO_HANDLE``.
> > +4.  **Execution**: Use ``DRM_IOCTL_QDA_REMOTE_INVOKE`` to pass arguments and
> > +    execute functions on the DSP.
> > +5.  **Cleanup**: Close file descriptors to automatically release resources and
> > +    detach the session.
>
> I'd have expected a description of an actual example, i.e. clone the
> app from https://the.addr, prepare clang >= NN.MM, QAIC (https://foo),
> run make, run the app, check the results. I'd remind you that DRM Accel
> has a very specific requirement that a working toolchain be available
> as open source.

We have been getting submissions lately that don't fulfill that
requirement, so I will point to the precise part of the documentation
that explains it:

https://www.kernel.org/doc/html/latest/gpu/drm-uapi.html#open-source-userspace-requirements

For an example of a submission that complies, see:

https://lore.kernel.org/dri-devel/[email protected]/

Most importantly, notice how the proposed Thames Mesa driver generates
machine code for all the hardware units, and doesn't use any blob for
that.

Regards,

Tomeu

> > +
> > +Internal Implementation
> > +=======================
> > +
> > +Memory Management
> > +-----------------
> > +The driver's memory manager creates virtual "IOMMU devices" that map to
> > +hardware context banks. This allows the driver to manage multiple isolated
> > +address spaces. The implementation uses a DMA-coherent backend to ensure data consistency
> > +between the CPU and DSP without manual cache maintenance in most cases.
>
> GEM usage?
>
> > +
> > +Debugging
> > +=========
> > +The driver includes extensive dynamic debug support. Enable it via the
> > +kernel's dynamic debug control:
> > +
> > +.. code-block:: bash
> > +
> > +    echo "file drivers/accel/qda/* +p" > /sys/kernel/debug/dynamic_debug/control
> > +
> > +Testing
> > +=======
> > +The QDA driver can be exercised using the ``fastrpc_test`` utility from the
> > +FastRPC userspace library. Run the test application:
>
> pointer
>
> > +
> > +.. code-block:: bash
> > +
> > +    fastrpc_test -d 3 -U 1 -t linux -a v68
> > +
> > +**Options**
> > +
> > +``-d domain``
> > +    Select the DSP domain to run on:
> > +
> > +    * ``0`` — ADSP
> > +    * ``1`` — MDSP
> > +    * ``2`` — SDSP
> > +    * ``3`` — CDSP *(default on targets with CDSP)*
> > +
> > +``-U unsigned_PD``
> > +    Select signed or unsigned protection domain:
> > +
> > +    * ``0`` — signed PD
> > +    * ``1`` — unsigned PD *(default)*
> > +
> > +``-t target``
> > +    Target platform: ``android`` or ``linux`` *(default: linux)*
> > +
> > +``-a arch_version``
> > +    DSP architecture version, e.g. ``v68``, ``v75`` *(default: v68)*
> >
> > --
> > 2.34.1
> >
> >
>
> --
> With best wishes
> Dmitry
