400Ping opened a new pull request, #1158:
URL: https://github.com/apache/mahout/pull/1158
## Summary

This PR adds and hardens AMD GPU support through the `triton_amd` backend (ROCm + Triton), while keeping a unified encoding contract across backends.

Primary goal: reduce NVIDIA vendor lock-in by enabling a practical AMD production path in QDP.

## Motivation

Today, CUDA is the dominant path, which increases platform risk and tightens procurement constraints. This PR introduces an AMD route with consistent API behavior, so users can run QDP encodings on AMD GPUs without changing upper-layer logic.

## What Changed

### 1. Triton AMD backend (`triton_amd`)

Implemented AMD backend support for:

- `amplitude`
- `angle`
- `basis`

### 2. Precision correctness fixes

Fixed the precision contract for Triton AMD:

- `precision="float64"` now uses true `float64` compute paths in the angle and basis kernels
- the output dtype matches the requested precision (`complex128` for `float64`)

### 3. Unified output contract

Aligned direct Triton engine usage with the router contract:

- outputs are DLPack-compatible unified objects (via the `QuantumTensor` wrapper path), rather than ad-hoc, tensor-only return values
- upper-layer code stays the same regardless of which backend is chosen

### 4. Backend routing improvements

Made `auto` routing more robust:

- improved ROCm environment detection using Linux runtime signals, not environment variables alone
- avoids accidental CUDA-first behavior in ROCm-capable environments
- routing remains explicit and predictable (`auto`, `triton_amd`, `cuda`)

### 5. Tests and CI

Added or updated tests for:

- correctness parity against torch references (`amplitude`/`angle`/`basis`)
- the `float64` precision contract
- unified router contract behavior

Added a ROCm-focused CI workflow that runs ROCm-marked tests on ROCm runner infrastructure.

### 6. Documentation updates

Updated docs to clarify backend capability boundaries:

- Triton AMD supports `amplitude`/`angle`/`basis`
- `iqp` is **not** currently supported by Triton AMD
- usage and setup instructions for ROCm + Triton, and for `auto` routing behavior

## Files (high-level)

- `qdp/qdp-python/qumat_qdp/triton_amd.py`
- `qdp/qdp-python/qumat_qdp/backend.py`
- `qdp/qdp-python/tests/test_triton_amd_backend.py`
- `qdp/qdp-python/tests/test_backend_routing.py`
- `.github/workflows/qdp-python-rocm-testing.yml`
- `qdp/qdp-python/TRITON_AMD_BACKEND.md`
- `qdp/qdp-python/README.md`

## Backward Compatibility

- The existing CUDA path is unchanged.
- The public routing API is unchanged (`create_encoder_engine(...)`).
- No IQP behavior changes in this PR (Triton AMD still excludes IQP).

## Follow-ups (separate PRs)

- Native HIP kernel path (if needed for deeper optimization)
- TPU path via JAX/Pallas (already planned in a separate issue)

### Related Issues

Closes #1155

### Changes

- [ ] Bug fix
- [x] New feature
- [x] Refactoring
- [ ] Documentation
- [x] Test
- [x] CI/CD pipeline
- [ ] Other

## Checklist

- [x] Added or updated unit tests for all changes
- [x] Added or updated documentation for all changes
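The precision contract from "2. Precision correctness fixes" and the reference-parity idea from "5. Tests and CI" can be illustrated with a small NumPy sketch. This is not the PR's test code: `OUTPUT_DTYPE`, `basis_reference`, and `amplitude_reference` are hypothetical names, NumPy stands in for the torch references the tests actually use, and the `float32` to `complex64` pairing is an assumption (the PR only states `float64` to `complex128`).

```python
import numpy as np

# Requested precision -> complex output dtype. float64 -> complex128 is stated
# in this PR; float32 -> complex64 is an assumption for illustration.
OUTPUT_DTYPE = {"float32": np.complex64, "float64": np.complex128}


def _output_dtype(precision: str):
    """Look up the output dtype, rejecting unknown precisions (hypothetical helper)."""
    if precision not in OUTPUT_DTYPE:
        raise ValueError(f"unsupported precision {precision!r}")
    return OUTPUT_DTYPE[precision]


def basis_reference(index: int, num_qubits: int, precision: str = "float32") -> np.ndarray:
    """One-hot state |index> on num_qubits qubits (hypothetical reference helper)."""
    dtype = _output_dtype(precision)
    dim = 1 << num_qubits
    if not 0 <= index < dim:
        raise ValueError(f"index {index} out of range for {num_qubits} qubits")
    state = np.zeros(dim, dtype=dtype)
    state[index] = 1.0
    return state


def amplitude_reference(data, num_qubits: int, precision: str = "float32") -> np.ndarray:
    """L2-normalized, zero-padded amplitude encoding (hypothetical reference helper)."""
    dtype = _output_dtype(precision)
    # Compute in the requested real precision, then cast to the complex output dtype.
    real_dtype = np.float64 if precision == "float64" else np.float32
    x = np.asarray(data, dtype=real_dtype)
    dim = 1 << num_qubits
    if x.size > dim:
        raise ValueError("input is longer than 2**num_qubits")
    norm = np.linalg.norm(x)
    if norm == 0:
        raise ValueError("cannot amplitude-encode a zero vector")
    padded = np.zeros(dim, dtype=real_dtype)
    padded[: x.size] = x / norm
    return padded.astype(dtype)


state = amplitude_reference([3.0, 4.0], num_qubits=2, precision="float64")
print(state.dtype)  # complex128
```

A parity test in this style would compare a backend's output against such a reference with `np.allclose`, using a tighter tolerance when `precision="float64"`.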
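The `auto` routing rule from "4. Backend routing improvements" can be sketched as follows. This is a hedged illustration, not the logic in `backend.py`: `rocm_runtime_present` and `resolve_backend` are hypothetical names, and the specific signals checked (the `/dev/kfd` device node, an `amdgpu` entry under `/sys/module`, and a `ROCM_PATH` environment fallback) are assumptions about what "Linux runtime signals" could mean here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Backend names taken from the routing options listed in this PR.
VALID_BACKENDS = ("auto", "triton_amd", "cuda")


def rocm_runtime_present(
    root: Path = Path("/"),
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Heuristic ROCm check (assumed signals, hypothetical helper).

    Runtime signals are checked first; the environment is only a fallback,
    so detection does not depend on env vars alone.
    """
    env = os.environ if env is None else env
    # Assumed signal 1: the AMD KFD device node exposed by the ROCm kernel driver.
    if (root / "dev" / "kfd").exists():
        return True
    # Assumed signal 2: the amdgpu kernel module is loaded.
    if (root / "sys" / "module" / "amdgpu").is_dir():
        return True
    # Assumed fallback: a ROCm install location advertised via the environment.
    return bool(env.get("ROCM_PATH"))


def resolve_backend(requested: str, rocm_present: bool) -> str:
    """Map a requested backend name to a concrete backend (hypothetical helper)."""
    if requested not in VALID_BACKENDS:
        raise ValueError(f"unknown backend {requested!r}; expected one of {VALID_BACKENDS}")
    if requested != "auto":
        # Explicit choices are honored as-is, keeping routing predictable.
        return requested
    # "auto": prefer Triton AMD when a ROCm runtime is detected, otherwise CUDA.
    return "triton_amd" if rocm_present else "cuda"


print(resolve_backend("auto", rocm_runtime_present()))
```

The `root` and `env` parameters exist only so the heuristic can be exercised without AMD hardware; a caller would normally use the defaults. Keeping detection separate from the name-resolution step means explicit `cuda` or `triton_amd` requests never depend on what the probe finds.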
