On Wed, Jun 3, 2026 at 7:22 AM Ekansh Gupta <[email protected]> wrote:
>
> On 20-05-2026 21:17, Tomeu Vizoso wrote:
> > On Wed, May 20, 2026 at 4:12 PM Dmitry Baryshkov
> > <[email protected]> wrote:
> >>
> >> On Tue, May 19, 2026 at 11:45:52AM +0530, Ekansh Gupta via B4 Relay wrote:
> >>> From: Ekansh Gupta <[email protected]>
> >>>
> >>> Add documentation for the Qualcomm DSP Accelerator (QDA) driver under
> >>> Documentation/accel/qda/. The documentation covers the driver
> >>> architecture, GEM-based buffer management, IOMMU context bank
> >>> isolation, and the RPMsg transport layer.
> >>>
> >>> The user-space API section describes the DRM IOCTLs for session
> >>> management, GEM buffer allocation, and remote procedure invocation via
> >>> the FastRPC protocol, along with a typical application lifecycle
> >>> example. Sections for dynamic debug and basic testing are also
> >>> included.
> >>>
> >>> Wire the new documentation into the Compute Accelerators index at
> >>> Documentation/accel/index.rst.
> >>>
> >>> Assisted-by: Claude:claude-4-6-sonnet
> >>> Signed-off-by: Ekansh Gupta <[email protected]>
> >>> ---
> >>> Documentation/accel/index.rst | 1 +
> >>> Documentation/accel/qda/index.rst | 13 ++++
> >>> Documentation/accel/qda/qda.rst | 146 ++++++++++++++++++++++++++++++++++++++
> >>> 3 files changed, 160 insertions(+)
<snip>
> >>> +Usage Example
> >>> +=============
> >>> +
> >>> +A typical lifecycle for a user-space application:
> >>> +
> >>> +1. **Discovery**: Open ``/dev/accel/accel*`` and use
> >>> +   ``DRM_IOCTL_QDA_QUERY`` to identify the DSP domain served by that
> >>> +   device node.
> >>> +2. **Initialization**: Call ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE`` to
> >>> +   establish a session and create a process context on the DSP.
> >>> +3. **Memory**: Allocate buffers via ``DRM_IOCTL_QDA_GEM_CREATE`` or import
> >>> +   DMA-BUFs (PRIME fd) from other drivers using ``DRM_IOCTL_PRIME_FD_TO_HANDLE``.
> >>> +4. **Execution**: Use ``DRM_IOCTL_QDA_REMOTE_INVOKE`` to pass arguments and
> >>> +   execute functions on the DSP.
> >>> +5. **Cleanup**: Close file descriptors to automatically release resources and
> >>> +   detach the session.
> >>
> >> I'd have expected the description of the actual example. I.e. clone the
> >> app from https://the.addr, prepare clang >= NN.MM, QAIC (https://foo),
> >> run make, run the app, check the results. I'd remind that DRM Accel has
> >> a very specific requirement of having the working toolchain in the
> >> open-source.
> >
> > We have been getting submissions lately that don't fulfill that
> > requirement so I will point to the precise part of the documentation
> > that explains it:
> >
> > https://www.kernel.org/doc/html/latest/gpu/drm-uapi.html#open-source-userspace-requirements
> >
> > For an example of a submission that complies, see:
> >
> > https://lore.kernel.org/dri-devel/[email protected]/
> >
> > Most importantly, notice how the proposed Thames Mesa driver generates
> > machine code for all the hardware units, and doesn't use any blob for
> > that.
> >
> I believe QDA checks all boxes for accel, as there is available
> opensource userspace, opensource QAIC compiler for IDL compilation and
> LLVM supports hexagon arch.

I must say that I'm at a total loss regarding the userspace portion of
this driver, despite spending half an hour looking inside the FastRPC
branch that you link from the cover letter.

Can you please explain what people need to do to:

- run an algorithm of their choice on the DSP,

- execute inference with a common ML framework such as TensorFlow Lite
  or PyTorch.

The documentation I pointed to earlier explains at length what is
expected from the userspace portion of the driver, which is more than
just being open source.

Thanks,

Tomeu

> I'll try adding these details as well.
>
> Thanks!
> > Regards,
> >
> > Tomeu
> >
> >>> +
> >>> +Internal Implementation
> >>> +=======================
> >>> +
> >>> +Memory Management
> >>> +-----------------
> >>> +The driver's memory manager creates virtual "IOMMU devices" that map to
> >>> +hardware context banks. This allows the driver to manage multiple isolated
> >>> +address spaces. The implementation uses a DMA-coherent backend to ensure data consistency
> >>> +between the CPU and DSP without manual cache maintenance in most cases.
> >>
> >> GEM usage?
> >>
> >>> +
> >>> +Debugging
> >>> +=========
> >>> +The driver includes extensive dynamic debug support. Enable it via the
> >>> +kernel's dynamic debug control:
> >>> +
> >>> +.. code-block:: bash
> >>> +
> >>> +   echo "file drivers/accel/qda/* +p" > /sys/kernel/debug/dynamic_debug/control
> >>> +
> >>> +Testing
> >>> +=======
> >>> +The QDA driver can be exercised using the ``fastrpc_test`` utility from the
> >>> +FastRPC userspace library. Run the test application:
> >>
> >> pointer
> >>
> >>> +
> >>> +.. code-block:: bash
> >>> +
> >>> +   fastrpc_test -d 3 -U 1 -t linux -a v68
> >>> +
> >>> +**Options**
> >>> +
> >>> +``-d domain``
> >>> +    Select the DSP domain to run on:
> >>> +
> >>> +    * ``0`` — ADSP
> >>> +    * ``1`` — MDSP
> >>> +    * ``2`` — SDSP
> >>> +    * ``3`` — CDSP *(default on targets with CDSP)*
> >>> +
> >>> +``-U unsigned_PD``
> >>> +    Select signed or unsigned protection domain:
> >>> +
> >>> +    * ``0`` — signed PD
> >>> +    * ``1`` — unsigned PD *(default)*
> >>> +
> >>> +``-t target``
> >>> +    Target platform: ``android`` or ``linux`` *(default: linux)*
> >>> +
> >>> +``-a arch_version``
> >>> +    DSP architecture version, e.g. ``v68``, ``v75`` *(default: v68)*
> >>>
> >>> --
> >>> 2.34.1
> >>>
> >>>
> >>
> >> --
> >> With best wishes
> >> Dmitry
>
