[PR] [Feature][QDP] Add AMD GPU support via Triton backend [mahout]

via GitHub Sun, 08 Mar 2026 09:23:37 -0700


400Ping opened a new pull request, #1158:
URL: https://github.com/apache/mahout/pull/1158


   ## Summary
   
   This PR adds and hardens AMD GPU support through the `triton_amd` backend (ROCm + Triton), while keeping a unified encoding contract across backends.
   
   Primary goal: reduce NVIDIA vendor lock-in by enabling a practical AMD production path in QDP.
   
   ## Motivation
   
   Today, CUDA is the dominant execution path in QDP, which concentrates platform risk on a single vendor and constrains hardware procurement.
   This PR introduces an AMD route with consistent API behavior, so users can run QDP encodings on AMD GPUs without changing upper-layer logic.
   
   ## What Changed
   
   ### 1. Triton AMD backend (`triton_amd`)
   Implemented AMD backend support for:
   - `amplitude`
   - `angle`
   - `basis`
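For readers unfamiliar with the three encodings, here is a minimal pure-Python reference sketch of their textbook definitions. It is not the QDP or Triton implementation: the function names are hypothetical, and the angle convention (one qubit per feature, `cos(x)|0> + sin(x)|1>`) is an assumption that may differ from QDP's exact convention (some libraries use `x/2`).

```python
import math
from typing import List


def amplitude_encode(features: List[float]) -> List[complex]:
    """Normalize features into a state vector, zero-padded to the next power of two."""
    norm = math.sqrt(sum(x * x for x in features))
    if norm == 0.0:
        raise ValueError("cannot amplitude-encode an all-zero or empty vector")
    size = 1
    while size < len(features):
        size *= 2
    state = [complex(x / norm) for x in features]
    return state + [0j] * (size - len(features))


def angle_encode(features: List[float]) -> List[complex]:
    """Tensor product of single-qubit states cos(x)|0> + sin(x)|1>, one qubit per feature."""
    state = [1 + 0j]
    for x in features:
        qubit = (math.cos(x), math.sin(x))
        # Kronecker product: the first feature ends up as the most significant qubit.
        state = [a * b for a in state for b in qubit]
    return state


def basis_encode(index: int, num_qubits: int) -> List[complex]:
    """One-hot state vector |index> over 2**num_qubits basis states."""
    if not 0 <= index < 2 ** num_qubits:
        raise ValueError("index out of range for the given number of qubits")
    state = [0j] * (2 ** num_qubits)
    state[index] = 1 + 0j
    return state
```

References like these are what a GPU kernel's output can be checked against, whichever backend produced it.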
   
   ### 2. Precision correctness fixes
   Fixed the precision contract for Triton AMD:
   - `precision="float64"` now runs the angle and basis kernels in true `float64` arithmetic
   - the output dtype matches the requested precision (`complex128` for `float64`)
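The precision contract can be expressed as a small lookup from the requested real precision to the complex output dtype. This sketch uses plain dtype-name strings so it runs without a GPU stack; the `float64` to `complex128` pairing comes from this PR, while the `float32` to `complex64` pairing and the helper names are assumptions for illustration.

```python
# Hypothetical mapping from requested compute precision to output dtype name.
# float64 -> complex128 is stated in the PR; float32 -> complex64 is an assumed counterpart.
_PRECISION_TO_COMPLEX = {
    "float32": "complex64",
    "float64": "complex128",
}


def output_dtype_for(precision: str) -> str:
    """Return the complex output dtype for a requested precision, rejecting unknown values."""
    try:
        return _PRECISION_TO_COMPLEX[precision]
    except KeyError:
        raise ValueError(
            f"unsupported precision {precision!r}; "
            f"expected one of {sorted(_PRECISION_TO_COMPLEX)}"
        ) from None


def check_output_contract(precision: str, actual_dtype: str) -> None:
    """Raise if the returned dtype does not match the requested precision."""
    expected = output_dtype_for(precision)
    if actual_dtype != expected:
        raise AssertionError(
            f"precision={precision!r} expects {expected}, got {actual_dtype}"
        )
```

A dtype check alone cannot detect a kernel that computes in `float32` and then upcasts its result, so a numerical comparison against a `float64` reference is still needed alongside it.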
   
   ### 3. Unified output contract
   Aligned direct use of the Triton engine with the router's output contract:
   - outputs are DLPack-compatible unified objects (via the `QuantumTensor` wrapper) rather than ad-hoc raw tensors
   - upper-layer code therefore stays the same regardless of the selected backend
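The unified output contract relies on the DLPack protocol: Python objects that expose `__dlpack__` and `__dlpack_device__` can be consumed by functions such as `torch.from_dlpack`. The following is a hypothetical sketch of a thin wrapper in that spirit; it is not QDP's actual `QuantumTensor`, and the class name and `backend` attribute are assumptions.

```python
from typing import Any, Tuple

# Device type codes from the DLPack DLDeviceType enum.
KDL_CPU = 1
KDL_CUDA = 2
KDL_ROCM = 10


class QuantumTensorSketch:
    """Hypothetical wrapper that forwards the DLPack protocol to the tensor it holds."""

    def __init__(self, tensor: Any, backend: str) -> None:
        if not hasattr(tensor, "__dlpack__") or not hasattr(tensor, "__dlpack_device__"):
            raise TypeError("wrapped object must implement the DLPack protocol")
        self._tensor = tensor
        self.backend = backend  # e.g. "cuda" or "triton_amd"; informational only

    def __dlpack__(self, *args: Any, **kwargs: Any) -> Any:
        # Delegate capsule creation (including any stream argument) to the real tensor.
        return self._tensor.__dlpack__(*args, **kwargs)

    def __dlpack_device__(self) -> Tuple[int, int]:
        # Report whatever device the underlying tensor lives on.
        return self._tensor.__dlpack_device__()
```

Because the wrapper only forwards the protocol, a consumer such as `torch.from_dlpack(qt)` sees the same interface on either backend; only the reported device type would differ (for example `KDL_CUDA` versus `KDL_ROCM`).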
   
   ### 4. Backend routing improvements
   Made `auto` routing more robust:
   - ROCm detection now uses Linux runtime signals in addition to environment variables
   - `auto` no longer accidentally prefers CUDA in ROCm-capable environments
   - the accepted backend values remain explicit and predictable: `auto`, `triton_amd`, `cuda`
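A minimal sketch of how `auto` routing might combine environment variables with Linux runtime signals. The signals checked here (`/dev/kfd`, the device node of the AMD Kernel Fusion Driver used by ROCm, and `/sys/module/amdgpu`) and the variables (`ROCM_PATH`, `HIP_VISIBLE_DEVICES`) are assumptions about what such detection could use, not a description of `backend.py`; `rocm_detected` and `resolve_backend` are hypothetical.

```python
import os
from typing import Callable, Mapping, Optional

VALID_BACKENDS = ("auto", "triton_amd", "cuda")

# Assumed runtime signals: the KFD device node and the loaded amdgpu kernel module.
ROCM_RUNTIME_PATHS = ("/dev/kfd", "/sys/module/amdgpu")
# Assumed environment hints, consulted in addition to the runtime signals.
ROCM_ENV_VARS = ("ROCM_PATH", "HIP_VISIBLE_DEVICES")


def rocm_detected(
    env: Optional[Mapping[str, str]] = None,
    path_exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """True if any runtime signal or non-empty environment hint suggests a ROCm stack."""
    env = os.environ if env is None else env
    if any(path_exists(p) for p in ROCM_RUNTIME_PATHS):
        return True
    return any(env.get(name) for name in ROCM_ENV_VARS)


def resolve_backend(requested: str = "auto", **detect_kwargs) -> str:
    """Honor explicit choices; for 'auto', prefer ROCm when detected, otherwise CUDA."""
    if requested not in VALID_BACKENDS:
        raise ValueError(f"unknown backend {requested!r}; expected one of {VALID_BACKENDS}")
    if requested != "auto":
        return requested
    return "triton_amd" if rocm_detected(**detect_kwargs) else "cuda"
```

Probing for ROCm before CUDA matters because ROCm builds of PyTorch also report `torch.cuda.is_available()` as `True`, so a CUDA-first check could route AMD machines to the CUDA path. Injecting `env` and `path_exists` keeps the detection testable without real hardware.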
   
   ### 5. Tests and CI
   Added or updated tests for:
   - correctness parity against PyTorch reference implementations (`amplitude`, `angle`, `basis`)
   - the `float64` precision contract
   - the unified router contract
   
   Added a ROCm-focused CI workflow that runs ROCm-marked tests on ROCm runners.
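Parity tests like those listed above typically compare backend output element-wise against a reference, with tolerances chosen per precision. The sketch below shows one way to do that in plain Python; the tolerance values and the `states_match` name are illustrative assumptions, not the thresholds used in `test_triton_amd_backend.py`.

```python
import cmath
from typing import Sequence

# Illustrative tolerances: float64 results should match far more tightly than float32.
TOLERANCES = {
    "float32": {"rel_tol": 1e-5, "abs_tol": 1e-6},
    "float64": {"rel_tol": 1e-12, "abs_tol": 1e-14},
}


def states_match(
    actual: Sequence[complex], expected: Sequence[complex], precision: str
) -> bool:
    """Element-wise comparison of two state vectors using precision-dependent tolerances."""
    if len(actual) != len(expected):
        return False
    tol = TOLERANCES[precision]
    return all(cmath.isclose(a, e, **tol) for a, e in zip(actual, expected))
```

An error of about `1e-7`, on the order of `float32` rounding, passes the `float32` tolerance but fails the `float64` one; this is how a test of this shape can catch a kernel that silently computes in `float32` when `float64` was requested.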
   
   ### 6. Documentation updates
   Updated docs to clarify backend capability boundaries:
   - Triton AMD supports `amplitude`, `angle`, and `basis`
   - `iqp` is **not** currently supported by Triton AMD
   - setup and usage instructions for ROCm + Triton, plus a description of `auto` routing behavior
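The documented capability boundary can also be enforced in code, so that unsupported combinations fail early with a clear message. In this hypothetical sketch the `triton_amd` entry mirrors the documented support, while the `cuda` entry (including `iqp`) is an assumption based on the docs naming IQP only as a Triton AMD gap; `check_encoding_supported` is likewise an illustrative name.

```python
from typing import Dict, FrozenSet

# Hypothetical capability table. The triton_amd entry mirrors the documented support;
# the cuda entry is an assumption for illustration.
SUPPORTED_ENCODINGS: Dict[str, FrozenSet[str]] = {
    "triton_amd": frozenset({"amplitude", "angle", "basis"}),
    "cuda": frozenset({"amplitude", "angle", "basis", "iqp"}),
}


def check_encoding_supported(backend: str, encoding: str) -> None:
    """Raise ValueError for unknown backends and NotImplementedError for missing encodings."""
    supported = SUPPORTED_ENCODINGS.get(backend)
    if supported is None:
        raise ValueError(f"unknown backend {backend!r}")
    if encoding not in supported:
        raise NotImplementedError(
            f"encoding {encoding!r} is not supported by backend {backend!r}; "
            f"supported: {', '.join(sorted(supported))}"
        )
```

Raising instead of silently falling back to another backend keeps routing explicit and predictable, which is the same goal stated for `auto` routing.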
   
   ## Files (high-level)
   
   - `qdp/qdp-python/qumat_qdp/triton_amd.py`
   - `qdp/qdp-python/qumat_qdp/backend.py`
   - `qdp/qdp-python/tests/test_triton_amd_backend.py`
   - `qdp/qdp-python/tests/test_backend_routing.py`
   - `.github/workflows/qdp-python-rocm-testing.yml`
   - `qdp/qdp-python/TRITON_AMD_BACKEND.md`
   - `qdp/qdp-python/README.md`
   
   ## Backward Compatibility
   
   - Existing CUDA path remains unchanged.
   - Public routing API remains the same (`create_encoder_engine(...)`).
   - No IQP behavior changes in this PR (Triton AMD still excludes IQP).
   
   ## Follow-ups (separate PRs)
   
   - Native HIP kernel path (if needed for deeper optimization)
   - TPU path via JAX/Pallas (already planned in a separate issue)
   
   ### Related Issues
   
   Closes #1155 
   
   ### Changes
   
   - [ ] Bug fix
   - [x] New feature
   - [x] Refactoring
   - [ ] Documentation
   - [x] Test
   - [x] CI/CD pipeline
   - [ ] Other
   
   ## Checklist
   
   - [x] Added or updated unit tests for all changes
   - [x] Added or updated documentation for all changes
   


-- 
This is an automated message from the Apache Git Service.