400Ping opened a new issue, #1001:
URL: https://github.com/apache/mahout/issues/1001

   ### Summary
   
   Propose adding **multi-GPU data-parallel encoding** to QDP so users can 
scale quantum state preparation across multiple GPUs. Currently, `QdpEngine` 
only supports a single `device_id`, which limits throughput for large batches 
and high qubit counts (e.g., 20+ qubits).
   
   ### Motivation
   
   - **Current limitation**: `QdpEngine::new(device_id: usize)` accepts only 
one GPU 
([qdp-core/src/lib.rs](https://github.com/apache/mahout/blob/main/qdp/qdp-core/src/lib.rs)).
   - **Use case**: High-qubit encoding (20+ qubits) and large batches hit 
single-GPU memory and compute limits. Parallel encoding across N GPUs can 
increase throughput roughly in proportion to N, since batches are independent.
   - **Alignment with PR #1000**: The Quantum Data Loader (PR #1000) provides 
batch-by-batch iteration; multi-GPU support would allow batches to be 
distributed across GPUs for an end-to-end, high-throughput pipeline.
   
   ### Proposed Design
   
   - **Batch routing**: Distribute batches across GPUs (e.g., round-robin or 
workload-aware).
   - **Result aggregation**: Merge outputs from each GPU into a single DLPack 
tensor (or keep a distributed representation for downstream use).
   - **Stream management**: Each GPU uses its own CUDA stream to avoid 
synchronization bottlenecks.
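   The routing and aggregation points above can be sketched in a few lines. This 
is a hypothetical illustration, not existing QDP code: round-robin assignment of 
batch indices to devices, plus order-preserving aggregation of per-GPU results.

```rust
// Hypothetical sketch (not existing QDP code): round-robin routing of batch
// indices to devices, and order-preserving aggregation of per-GPU results.

/// Assign batch `batch_idx` to one of `num_devices` GPUs round-robin.
fn route_round_robin(batch_idx: usize, num_devices: usize) -> usize {
    batch_idx % num_devices
}

fn main() {
    // Six batches over three GPUs land on devices 0,1,2,0,1,2.
    let devices: Vec<usize> = (0..6).map(|i| route_round_robin(i, 3)).collect();
    assert_eq!(devices, vec![0, 1, 2, 0, 1, 2]);

    // Aggregation: per-GPU results arrive tagged (batch_idx, result) in
    // completion order; sorting by batch_idx restores submission order
    // before merging into a single output tensor.
    let mut results = vec![(2, "r2"), (0, "r0"), (1, "r1")];
    results.sort_by_key(|&(i, _)| i);
    assert_eq!(results, vec![(0, "r0"), (1, "r1"), (2, "r2")]);
}
```

   A workload-aware router would replace the modulo with a cost estimate per 
batch, but the aggregation step stays the same.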
   
   ### Scope
   
   **qdp-core (Rust)**
   - Add a multi-GPU engine abstraction (e.g., `QdpEnginePool`) to manage 
multiple `QdpEngine` instances.
   - Implement `encode_batch_distributed` to split batches across GPUs or 
assign different batches to different GPUs.
   - Use `rayon` or `std::thread` for CPU-side coordination.
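   A minimal sketch of what `QdpEnginePool` and `encode_batch_distributed` could 
look like, using `std::thread` for CPU-side coordination. `QdpEngine` here is a 
stub standing in for the real engine (which owns a CUDA context and stream for 
one device); all signatures are assumptions, not the actual qdp-core API.

```rust
use std::sync::mpsc;
use std::thread;

// Stub stand-in for qdp-core's QdpEngine; the real type wraps one device's
// CUDA context. `encode_batch` here just tags data with its device id.
struct QdpEngine { device_id: usize }

impl QdpEngine {
    fn new(device_id: usize) -> Self { QdpEngine { device_id } }
    fn encode_batch(&self, batch: &[f64]) -> (usize, usize) {
        // Real code would upload `batch`, run the encoding kernel on this
        // device's own CUDA stream, and return a DLPack tensor handle.
        (self.device_id, batch.len())
    }
}

/// Sketch of a pool owning one engine per device, fanning batches out
/// across OS threads (rayon would work similarly).
struct QdpEnginePool { device_ids: Vec<usize> }

impl QdpEnginePool {
    fn new(device_ids: Vec<usize>) -> Self { QdpEnginePool { device_ids } }

    /// Round-robin each batch to a device, encode in parallel, and return
    /// results in the original submission order.
    fn encode_batch_distributed(&self, batches: Vec<Vec<f64>>) -> Vec<(usize, usize)> {
        let (tx, rx) = mpsc::channel();
        let mut handles = Vec::new();
        for (i, batch) in batches.into_iter().enumerate() {
            let device = self.device_ids[i % self.device_ids.len()];
            let tx = tx.clone();
            handles.push(thread::spawn(move || {
                let engine = QdpEngine::new(device);
                tx.send((i, engine.encode_batch(&batch))).unwrap();
            }));
        }
        drop(tx); // channel closes once all worker threads finish
        let mut tagged: Vec<(usize, (usize, usize))> = rx.iter().collect();
        for h in handles { h.join().unwrap(); }
        tagged.sort_by_key(|&(i, _)| i);
        tagged.into_iter().map(|(_, r)| r).collect()
    }
}

fn main() {
    let pool = QdpEnginePool::new(vec![0, 1]);
    let out = pool.encode_batch_distributed(vec![vec![1.0; 4], vec![2.0; 8], vec![3.0; 2]]);
    // Batches keep their order; devices alternate 0, 1, 0.
    assert_eq!(out, vec![(0, 4), (1, 8), (0, 2)]);
}
```

   A production version would create each engine once per device (not per 
batch) and reuse a fixed worker thread per GPU, but the fan-out/merge shape 
stays the same.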
   
   **qdp-python**
   - Expose a new API (e.g., `QdpEngine(device_ids=[0, 1, 2])` or 
`MultiGpuEngine`).
   - Integrate with the Quantum Data Loader once PR #1000 is merged.
   - Preserve backward compatibility when a single device is specified.
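   One possible backward-compatible surface, sketched in Python: a single 
integer keeps today's single-GPU behavior, while a list of ids opts into 
multi-GPU. The constructor signature and `encode` method are hypothetical, 
not the actual qdp-python API.

```python
# Hypothetical sketch of a backward-compatible qdp-python surface.
# None of these signatures are the actual qdp-python API.

class QdpEngine:
    def __init__(self, device=0, device_ids=None):
        if device_ids is None:
            device_ids = [device]  # legacy single-GPU path unchanged
        self.device_ids = list(device_ids)

    def encode(self, batches):
        """Round-robin batches across devices; (device_id, batch) pairs
        stand in for per-GPU encoded tensors."""
        n = len(self.device_ids)
        return [(self.device_ids[i % n], b) for i, b in enumerate(batches)]

# Existing single-GPU call sites keep working:
assert QdpEngine(device=0).device_ids == [0]

# Multi-GPU is an explicit opt-in:
eng = QdpEngine(device_ids=[0, 1, 2])
assert [d for d, _ in eng.encode(["b0", "b1", "b2", "b3"])] == [0, 1, 2, 0]
```

   Whether this lands as an extended `QdpEngine` constructor or a separate 
`MultiGpuEngine` class is an open design question noted above.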
   
   ### Non-Goals (out of scope)
   
   - Multi-GPU model parallelism or tensor parallelism within a single encoding 
operation.
   - Automatic GPU selection or load balancing in the first version (can be 
added later).
   

