On Wed, May 20, 2026 at 4:12 PM Dmitry Baryshkov <[email protected]> wrote:
>
> On Tue, May 19, 2026 at 11:45:52AM +0530, Ekansh Gupta via B4 Relay wrote:
> > From: Ekansh Gupta <[email protected]>
> >
> > Add documentation for the Qualcomm DSP Accelerator (QDA) driver under
> > Documentation/accel/qda/. The documentation covers the driver
> > architecture, GEM-based buffer management, IOMMU context bank
> > isolation, and the RPMsg transport layer.
> >
> > The user-space API section describes the DRM IOCTLs for session
> > management, GEM buffer allocation, and remote procedure invocation via
> > the FastRPC protocol, along with a typical application lifecycle
> > example. Sections for dynamic debug and basic testing are also
> > included.
> >
> > Wire the new documentation into the Compute Accelerators index at
> > Documentation/accel/index.rst.
> >
> > Assisted-by: Claude:claude-4-6-sonnet
> > Signed-off-by: Ekansh Gupta <[email protected]>
> > ---
> >  Documentation/accel/index.rst     |   1 +
> >  Documentation/accel/qda/index.rst |  13 ++++
> >  Documentation/accel/qda/qda.rst   | 146 ++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 160 insertions(+)
> >
> > diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
> > index cbc7d4c3876a..5901ea7f784c 100644
> > --- a/Documentation/accel/index.rst
> > +++ b/Documentation/accel/index.rst
> > @@ -10,4 +10,5 @@ Compute Accelerators
> >     introduction
> >     amdxdna/index
> >     qaic/index
> > +   qda/index
> >     rocket/index
> > diff --git a/Documentation/accel/qda/index.rst b/Documentation/accel/qda/index.rst
> > new file mode 100644
> > index 000000000000..013400cf9c25
> > --- /dev/null
> > +++ b/Documentation/accel/qda/index.rst
> > @@ -0,0 +1,13 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> > +
> > +==================================
> > +accel/qda Qualcomm DSP Accelerator
> > +==================================
> > +
> > +The QDA driver provides a DRM accel based interface for Qualcomm DSP offload.
> > +It uses the FastRPC protocol and integrates with DRM and GEM infrastructure
> > +for device and buffer management.
> > +
> > +.. toctree::
> > +
> > +   qda
> > diff --git a/Documentation/accel/qda/qda.rst b/Documentation/accel/qda/qda.rst
> > new file mode 100644
> > index 000000000000..9f49af6e6acc
> > --- /dev/null
> > +++ b/Documentation/accel/qda/qda.rst
> > @@ -0,0 +1,146 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> > +
> > +=====================================
> > +Qualcomm DSP Accelerator (QDA) Driver
> > +=====================================
> > +
> > +Introduction
> > +============
> > +
> > +The QDA driver is a DRM accel driver for Qualcomm's DSPs. It provides a
> > +DRM accel based interface for Qualcomm DSP offload, supporting workloads
> > +such as AI inference, computer vision, audio processing, and sensor offload
> > +on Qualcomm SoCs. It uses the FastRPC protocol and integrates with DRM and
> > +GEM infrastructure for device and buffer management.
> > +
> > +Key Features
> > +============
> > +
> > +* **DRM accel Interface**: Exposes a standard character device node
> > +  (e.g., ``/dev/accel/accel0``) via the DRM accel subsystem.
> > +* **FastRPC Protocol**: Implements the FastRPC protocol for communication
> > +  between the application processor and the DSP.
> > +* **GEM Buffer Management**: Uses the DRM GEM interface for buffer
> > +  allocation, lifecycle management, and DMA-BUF import/export.
> > +* **IOMMU Isolation**: Uses IOMMU context banks to enforce memory isolation
> > +  between different DSP user sessions.
> > +* **Modular Design**: Clean separation between the core DRM logic, the
> > +  memory manager, and the RPMsg-based transport layer.
> > +
> > +Architecture
> > +============
> > +
> > +The QDA driver consists of several functional blocks:
> > +
> > +1. **Core Driver (``qda_drv``)**: Manages device registration, file operations,
> > +   and DRM accel integration.
> > +2. **Memory Manager (``qda_memory_manager``)**: A flexible memory management
> > +   layer that handles IOMMU context banks. It supports pluggable backends
> > +   (such as DMA-coherent) to adapt to different SoC memory architectures.
> > +3. **GEM Subsystem**: Implements the DRM GEM interface for buffer management:
> > +
> > +   * **``qda_gem``**: Core GEM object management, including allocation, mmap
> > +     operations, and buffer lifecycle management.
> > +   * **``qda_prime``**: PRIME import functionality for DMA-BUF interoperability
> > +     with other kernel subsystems.
> > +
> > +4. **Transport Layer (``qda_rpmsg``)**: Abstraction over the RPMsg framework
> > +   to handle low-level message passing with the DSP firmware.
> > +5. **Compute Bus (``qda_compute_bus``)**: A custom virtual bus used to
> > +   enumerate and manage the specific compute context banks defined in the
> > +   device tree. The bus was introduced because IOMMU context banks (CBs) are
> > +   synthetic constructs — not real platform devices — making a platform driver
> > +   an incorrect abstraction for them. The earlier platform-driver approach also
> > +   had a race condition: device nodes were created before the RPMsg channel
> > +   resources were fully initialized, and because ``probe`` runs asynchronously,
> > +   applications could open a CB device and attempt to start a session before
> > +   the underlying transport was ready. The compute bus makes CB lifetime
> > +   explicitly subordinate to the parent QDA device, closing that window.
> > +6. **FastRPC Core (``qda_fastrpc``)**: Implements the protocol logic for
> > +   marshalling arguments and handling remote invocations.
> > +
> > +User-Space API
> > +==============
> > +
> > +The driver exposes a set of DRM-compliant IOCTLs:
> > +
> > +* ``DRM_IOCTL_QDA_QUERY``: Query DSP type (e.g., "cdsp", "adsp")
> > +  and capabilities.
> > +* ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE``: Initialize a new process context
> > +  on the DSP.
> > +* ``DRM_IOCTL_QDA_REMOTE_INVOKE``: Submit a remote method invocation (the
> > +  primary execution unit).
> > +* ``DRM_IOCTL_QDA_GEM_CREATE``: Allocate a GEM buffer object for DSP usage.
> > +* ``DRM_IOCTL_QDA_GEM_MMAP_OFFSET``: Retrieve mmap offsets for memory mapping.
> > +* ``DRM_IOCTL_QDA_REMOTE_MAP`` / ``DRM_IOCTL_QDA_REMOTE_MUNMAP``: Map or unmap
> > +  buffers into the DSP's virtual address space. Each accepts a ``request``
> > +  field selecting between a legacy operation (``QDA_MAP_REQUEST_LEGACY`` /
> > +  ``QDA_MUNMAP_REQUEST_LEGACY``) and an attribute-based operation
> > +  (``QDA_MAP_REQUEST_ATTR`` / ``QDA_MUNMAP_REQUEST_ATTR``).
>
> Explain what happens if the users don't map the buffers into the DSP
> space. Will DRM_IOCTL_QDA_REMOTE_INVOKE handle the mapping or not? What
> is the difference between those two modes?
>
> Would the driver benefit from using GPUVM?
>
> > +
> > +Usage Example
> > +=============
> > +
> > +A typical lifecycle for a user-space application:
> > +
> > +1. **Discovery**: Open ``/dev/accel/accel*`` and use
> > +   ``DRM_IOCTL_QDA_QUERY`` to identify the DSP domain served by that
> > +   device node.
> > +2. **Initialization**: Call ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE`` to
> > +   establish a session and create a process context on the DSP.
> > +3. **Memory**: Allocate buffers via ``DRM_IOCTL_QDA_GEM_CREATE`` or import
> > +   DMA-BUFs (PRIME fd) from other drivers using ``DRM_IOCTL_PRIME_FD_TO_HANDLE``.
> > +4. **Execution**: Use ``DRM_IOCTL_QDA_REMOTE_INVOKE`` to pass arguments and
> > +   execute functions on the DSP.
> > +5. **Cleanup**: Close file descriptors to automatically release resources and
> > +   detach the session.
>
> I'd have expected the description of the actual example. I.e. clone the
> app from https://the.addr, prepare clang >= NN.MM, QAIC (https://foo),
> run make, run the app, check the results. I'd remind you that DRM Accel has
> a very specific requirement of having the working toolchain in the
> open-source.
We have been getting submissions lately that don't fulfill that
requirement, so I will point to the precise part of the documentation
that explains it:

https://www.kernel.org/doc/html/latest/gpu/drm-uapi.html#open-source-userspace-requirements

For an example of a submission that complies, see:

https://lore.kernel.org/dri-devel/[email protected]/

Most importantly, notice how the proposed Thames Mesa driver generates
machine code for all the hardware units, and doesn't use any blob for
that.

Regards,

Tomeu

> > +
> > +Internal Implementation
> > +=======================
> > +
> > +Memory Management
> > +-----------------
> > +The driver's memory manager creates virtual "IOMMU devices" that map to
> > +hardware context banks. This allows the driver to manage multiple isolated
> > +address spaces. The implementation uses a DMA-coherent backend to ensure data consistency
> > +between the CPU and DSP without manual cache maintenance in most cases.
>
> GEM usage?
>
> > +
> > +Debugging
> > +=========
> > +The driver includes extensive dynamic debug support. Enable it via the
> > +kernel's dynamic debug control:
> > +
> > +.. code-block:: bash
> > +
> > +   echo "file drivers/accel/qda/* +p" > /sys/kernel/debug/dynamic_debug/control
> > +
> > +Testing
> > +=======
> > +The QDA driver can be exercised using the ``fastrpc_test`` utility from the
> > +FastRPC userspace library. Run the test application:
>
> pointer
>
> > +
> > +.. code-block:: bash
> > +
> > +   fastrpc_test -d 3 -U 1 -t linux -a v68
> > +
> > +**Options**
> > +
> > +``-d domain``
> > +    Select the DSP domain to run on:
> > +
> > +    * ``0`` — ADSP
> > +    * ``1`` — MDSP
> > +    * ``2`` — SDSP
> > +    * ``3`` — CDSP *(default on targets with CDSP)*
> > +
> > +``-U unsigned_PD``
> > +    Select signed or unsigned protection domain:
> > +
> > +    * ``0`` — signed PD
> > +    * ``1`` — unsigned PD *(default)*
> > +
> > +``-t target``
> > +    Target platform: ``android`` or ``linux`` *(default: linux)*
> > +
> > +``-a arch_version``
> > +    DSP architecture version, e.g. ``v68``, ``v75`` *(default: v68)*
> >
> > --
> > 2.34.1
> >
> >
>
> --
> With best wishes
> Dmitry
