Re: [PR] [QDP] Baseline and Data Collection [mahout]

via GitHub Thu, 29 Jan 2026 01:47:23 -0800


rich7420 commented on PR #972:
URL: https://github.com/apache/mahout/pull/972#issuecomment-3816575134


   ```
   cd /home/rich-wsl/mahout/qdp/qdp-python/benchmark
   export QDP_ENABLE_POOL_METRICS=1
   export QDP_ENABLE_OVERLAP_TRACKING=1
   export RUST_LOG=info
   uv run python run_pipeline_baseline.py --qubits 16 --batch-size 64 --prefetch 16 --batches 500 --trials 20
   ```
   
   # Baseline report
   
   - **Date**: 2026-01-29
   - **Git commit**: ef00f92eb236
   - **GPU**: NVIDIA GeForce RTX 3080
   - **Driver**: 560.94
   - **CUDA**: 12.1
   
   ## Parameters
   
   - qubits: 16
   - batch_size: 64
   - prefetch: 16
   - batches: 200
   - trials: 5
   - encoding: amplitude
   
   ## Results
   
   | Metric | Median | P95 |
   |--------|--------|-----|
   | Throughput (vectors/sec) | 1454.2 | 1742.3 |
   | Latency (ms/vector) | 0.720 | 0.731 |
   
   
   ---
   
   ```
   cd /home/rich-wsl/mahout/qdp/qdp-python && nsys profile --trace=cuda,nvtx \
     --output=../docs/optimization/results/baseline_before_uv \
     uv run python benchmark/benchmark_throughput.py \
     --qubits 16 --batches 200 --batch-size 64 --prefetch 16 --frameworks mahout
   
   cd /home/rich-wsl/mahout/qdp && nsys stats docs/optimization/results/baseline_before_uv.sqlite
   
   ```

   CUDA API Summary (cuda_api_sum):

   | Time (%) | Total Time (ns) | Num Calls | Avg (ns) | Med (ns) | Min (ns) | Max (ns) | StdDev (ns) | Name |
   |----------|-----------------|-----------|----------|----------|----------|----------|-------------|------|
   | 60.3 | 1154513612 | 200 | 5772568.1 | 5600405.0 | 4603514 | 8527926 | 818264.7 | cuMemcpyHtoDAsync_v2 |
   | 14.1 | 269678972 | 800 | 337098.7 | 160114.5 | 1525 | 2679828 | 573425.8 | cuStreamSynchronize |
   | 12.7 | 243282018 | 200 | 1216410.1 | 1035882.0 | 76003 | 2659132 | 499445.4 | cuMemcpyDtoHAsync_v2 |
   | 4.8 | 91270891 | 800 | 114088.6 | 7550.5 | 465 | 7939629 | 329877.7 | cuMemAllocAsync |
   | 3.3 | 63990841 | 1200 | 53325.7 | 23571.5 | 6390 | 17779574 | 652454.9 | cudaLaunchKernel |
   | 3.0 | 56531757 | 400 | 141329.4 | 135325.0 | 26292 | 422046 | 65129.1 | cudaMemGetInfo |
   | 0.4 | 8475510 | 400 | 21188.8 | 12548.0 | 7223 | 100009 | 16110.7 | cudaMemsetAsync |
   | 0.4 | 6802300 | 200 | 34011.5 | 32870.5 | 17636 | 213820 | 17152.8 | cuLaunchKernel |
   | 0.3 | 6583789 | 200 | 32918.9 | 24025.5 | 14132 | 144830 | 18166.2 | cuMemsetD8Async |
   | 0.2 | 4736816 | 3002 | 1577.9 | 863.5 | 145 | 67826 | 2298.3 | cuCtxSetCurrent |
   | 0.2 | 3457162 | 2 | 1728581.0 | 1728581.0 | 8156 | 3449006 | 2433048.4 | cudaDeviceSynchronize |
   | 0.1 | 2554445 | 800 | 3193.1 | 2796.0 | 970 | 47088 | 3201.1 | cuMemFreeAsync |
   | 0.1 | 1376489 | 4 | 344122.3 | 346245.0 | 301121 | 382878 | 37481.3 | cudaMalloc |
   | 0.0 | 174142 | 1 | 174142.0 | 174142.0 | 174142 | 174142 | 0.0 | cuModuleLoadData |
   | 0.0 | 42998 | 383 | 112.3 | 90.0 | 52 | 708 | 73.1 | cuGetProcAddress_v2 |
   | 0.0 | 23813 | 7 | 3401.9 | 2823.0 | 338 | 11571 | 3797.4 | cudaStreamIsCapturing_v10000 |
   | 0.0 | 1173 | 1 | 1173.0 | 1173.0 | 1173 | 1173 | 0.0 | cuEventCreate |
   | 0.0 | 1064 | 1 | 1064.0 | 1064.0 | 1064 | 1064 | 0.0 | cuInit |
   | 0.0 | 599 | 1 | 599.0 | 599.0 | 599 | 599 | 0.0 | cuEventDestroy_v2 |
   | 0.0 | 91 | 1 | 91.0 | 91.0 | 91 | 91 | 0.0 | cuModuleGetLoadingMode |
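
As a rough cross-check of the CUDA API summary, the Python sketch below recomputes the Time (%) column from the Total Time (ns) values and adds up the share spent in the two async copies. The numbers are transcribed from the nsys output in this comment; the helper name `time_shares` is made up for illustration and is not part of QDP or nsys.

```python
# Total Time (ns) per CUDA API call, transcribed from the cuda_api_sum output.
TOTAL_TIME_NS = {
    "cuMemcpyHtoDAsync_v2": 1154513612,
    "cuStreamSynchronize": 269678972,
    "cuMemcpyDtoHAsync_v2": 243282018,
    "cuMemAllocAsync": 91270891,
    "cudaLaunchKernel": 63990841,
    "cudaMemGetInfo": 56531757,
    "cudaMemsetAsync": 8475510,
    "cuLaunchKernel": 6802300,
    "cuMemsetD8Async": 6583789,
    "cuCtxSetCurrent": 4736816,
    "cudaDeviceSynchronize": 3457162,
    "cuMemFreeAsync": 2554445,
    "cudaMalloc": 1376489,
    "cuModuleLoadData": 174142,
    "cuGetProcAddress_v2": 42998,
    "cudaStreamIsCapturing_v10000": 23813,
    "cuEventCreate": 1173,
    "cuInit": 1064,
    "cuEventDestroy_v2": 599,
    "cuModuleGetLoadingMode": 91,
}


def time_shares(totals):
    """Return each call's percentage of the summed API time (hypothetical helper)."""
    grand_total = sum(totals.values())
    return {name: 100.0 * ns / grand_total for name, ns in totals.items()}


shares = time_shares(TOTAL_TIME_NS)

# Combined share of the host-to-device and device-to-host async copies.
copy_share = shares["cuMemcpyHtoDAsync_v2"] + shares["cuMemcpyDtoHAsync_v2"]

# Print the five largest contributors, then the combined copy share.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{name:<24} {pct:5.1f}%")
print(f"HtoD + DtoH copies: {copy_share:.1f}% of CUDA API time")
```

With these numbers the two async copies account for roughly 73% of the summed API time, and the host-to-device copy alone for about 60%. Keep in mind that `cuda_api_sum` reports time spent inside the API calls on the host side, so this is not the same as GPU busy time.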
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
