Re: [PR] [QDP] Update benchmark_throughput to batch encoding [mahout]

via GitHub Mon, 05 Jan 2026 04:28:48 -0800


400Ping commented on PR #796:
URL: https://github.com/apache/mahout/pull/796#issuecomment-3710225556


   Before (dev-qdp):
   ```
   $ python ./qdp-python/benchmark/benchmark_throughput.py
   Generating 12800 samples of 16 qubits...
     Batch size   : 64
     Vector length: 65536
     Batches      : 200
     Prefetch     : 16
     Frameworks   : pennylane, qiskit, mahout
     Generated 12800 samples
     PennyLane/Qiskit format: 6400.00 MB
     Mahout format: 6400.00 MB
   
   ======================================================================
   DATALOADER THROUGHPUT BENCHMARK: 16 Qubits, 12800 Samples
   ======================================================================
   
   [PennyLane] Full Pipeline (DataLoader -> GPU)...
   
   /home/jay/work/mahout/qdp/./qdp-python/benchmark/benchmark_throughput.py:170: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /pytorch/aten/src/ATen/native/Copy.cpp:309.)
     state_gpu = state_cpu.to("cuda", dtype=torch.float32)
     Total Time: 6.7562 s (1894.6 vectors/sec)
   
   [Qiskit] Full Pipeline (DataLoader -> GPU)...
   
           
     Total Time: 848.2128 s (15.1 vectors/sec)
   
   [Mahout] Full Pipeline (DataLoader -> GPU)...
     IO + Encode Time: 9.7979 s
     Total Time: 9.7979 s (1306.4 vectors/sec)
   
   ======================================================================
   THROUGHPUT (Higher is Better)
   Samples: 12800, Qubits: 16
   ======================================================================
   PennyLane        1894.6 vectors/sec
   Mahout           1306.4 vectors/sec
   Qiskit             15.1 vectors/sec
   ----------------------------------------------------------------------
   Speedup vs PennyLane:       0.69x
   Speedup vs Qiskit:         86.57x
   ```
   
   After:
   ```
   $ python ./qdp-python/benchmark/benchmark_throughput.py
   Generating 12800 samples of 16 qubits...
     Batch size   : 64
     Vector length: 65536
     Batches      : 200
     Prefetch     : 16
     Frameworks   : pennylane, qiskit, mahout
     Generated 12800 samples
     PennyLane/Qiskit format: 6400.00 MB
     Mahout format: 6400.00 MB
   
   ======================================================================
   DATALOADER THROUGHPUT BENCHMARK: 16 Qubits, 12800 Samples
   ======================================================================
   
   [PennyLane] Full Pipeline (DataLoader -> GPU)...
   
   /home/jay/work/mahout/qdp/./qdp-python/benchmark/benchmark_throughput.py:169: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /pytorch/aten/src/ATen/native/Copy.cpp:309.)
     state_gpu = state_cpu.to("cuda", dtype=torch.float32)
     Total Time: 6.7298 s (1902.0 vectors/sec)
   
   [Qiskit] Full Pipeline (DataLoader -> GPU)...
   
     Total Time: 854.9839 s (15.0 vectors/sec)
   
   [Mahout] Full Pipeline (DataLoader -> GPU)...
     IO + Encode Time: 3.6776 s
     Total Time: 3.6776 s (3480.5 vectors/sec)
   
   ======================================================================
   THROUGHPUT (Higher is Better)
   Samples: 12800, Qubits: 16
   ======================================================================
   Mahout           3480.5 vectors/sec
   PennyLane        1902.0 vectors/sec
   Qiskit             15.0 vectors/sec
   ----------------------------------------------------------------------
   Speedup vs PennyLane:       1.83x
   Speedup vs Qiskit:        232.48x
   ```
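
For reference, the throughput and speedup figures in both runs follow directly from the reported wall-clock times. The sketch below is not taken from `benchmark_throughput.py`; the helper names `throughput` and `speedup` are hypothetical, and it only assumes that throughput is samples divided by total time and that speedup is the ratio of total times. Under those assumptions it reproduces the printed numbers and shows the Mahout pipeline running about 2.66x faster after the change.

```python
# Minimal sketch: recompute the reported figures from the logged times.
# The helper names are illustrative and are NOT the benchmark script's API.

SAMPLES = 12800  # "Generated 12800 samples" in both runs


def throughput(samples: int, seconds: float) -> float:
    """Vectors processed per second."""
    return samples / seconds


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Total times in seconds, copied from the two logs above.
before = {"pennylane": 6.7562, "qiskit": 848.2128, "mahout": 9.7979}
after = {"pennylane": 6.7298, "qiskit": 854.9839, "mahout": 3.6776}

for label, times in (("before", before), ("after", after)):
    print(f"[{label}]")
    for name, seconds in times.items():
        print(f"  {name:<10} {throughput(SAMPLES, seconds):8.1f} vectors/sec")
    print(f"  speedup vs pennylane: {speedup(times['pennylane'], times['mahout']):.2f}x")
    print(f"  speedup vs qiskit:    {speedup(times['qiskit'], times['mahout']):.2f}x")

# Mahout against itself: old total time divided by new total time.
mahout_gain = speedup(before["mahout"], after["mahout"])
print(f"mahout before -> after: {mahout_gain:.2f}x")
```

Computing the speedup from the raw times rather than from the rounded vectors/sec values is what gives 232.48x against Qiskit; dividing the rounded throughputs (3480.5 / 15.0) would give roughly 232.0x instead.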

