This is an automated email from the ASF dual-hosted git repository.

guanmingchiu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/mahout.git

commit d317c34c13232c0444edc9f1692950cedeec705b
Author: Ping <[email protected]>
AuthorDate: Wed Dec 31 22:35:17 2025 +0800

    [QDP] Rename benchmark throughput script and refresh docs (#776)
    
    Signed-off-by: 400Ping <[email protected]>
---
 qdp/DEVELOPMENT.md                                 |  4 +-
 qdp/Makefile                                       |  2 +-
 qdp/qdp-python/benchmark/README.md                 | 83 +++++++++++++++++++++-
 .../qdp-python/benchmark/benchmark_throughput.md   | 10 ++-
 ...oader_throughput.py => benchmark_throughput.py} |  2 +-
 5 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/qdp/DEVELOPMENT.md b/qdp/DEVELOPMENT.md
index e8664802e..d28a3ffcf 100644
--- a/qdp/DEVELOPMENT.md
+++ b/qdp/DEVELOPMENT.md
@@ -168,7 +168,7 @@ You can also run individual tests manually from the `qdp-python/benchmark/` dire
 
 ```sh
 # Benchmark test for dataloader throughput
-python benchmark_dataloader_throughput.py
+python benchmark_throughput.py
 
 # E2E test
 python benchmark_e2e.py
@@ -194,7 +194,7 @@ A: Check available GPUs with `nvidia-smi`. Verify GPU visibility with `echo $CUD
 
 ### Q: Benchmark tests fail or produce unexpected results
 
-A: Ensure all dependencies are installed with `uv pip install -r benchmark/requirements.txt`. Check GPU memory availability using `nvidia-smi`. If you don't need qiskit/pennylane comparisons, uninstall them as mentioned in the [E2e test section](#e2e-tests).
+A: Ensure all dependencies are installed with `uv sync --group benchmark` (from `qdp/qdp-python`). Check GPU memory availability using `nvidia-smi`. If you don't need qiskit/pennylane comparisons, uninstall them as mentioned in the [E2e test section](#e2e-tests).
 
 ### Q: Pre-commit hooks fail
 
diff --git a/qdp/Makefile b/qdp/Makefile
index 51f37a551..53572ccf8 100644
--- a/qdp/Makefile
+++ b/qdp/Makefile
@@ -48,7 +48,7 @@ install_benchmark:
 benchmark: install install_benchmark
        @echo "Running e2e benchmark tests..."
        uv run python qdp-python/benchmark/benchmark_e2e.py
-       uv run python qdp-python/benchmark/benchmark_dataloader_throughput.py
+       uv run python qdp-python/benchmark/benchmark_throughput.py
 
 run_nvtx_profile:
        $(eval EXAMPLE ?= nvtx_profile)
diff --git a/qdp/qdp-python/benchmark/README.md b/qdp/qdp-python/benchmark/README.md
index f8f413d41..6fcef290e 100644
--- a/qdp/qdp-python/benchmark/README.md
+++ b/qdp/qdp-python/benchmark/README.md
@@ -1,5 +1,86 @@
-<!-- TODO: benchmark docs -->
+# Benchmarks
 
+This directory contains Python benchmarks for Mahout QDP. There are two main
+scripts:
+
+- `benchmark_e2e.py`: end-to-end latency from disk to GPU VRAM (includes IO,
+  normalization, encoding, transfer, and a dummy forward pass).
+- `benchmark_throughput.py`: DataLoader-style throughput benchmark
+  that measures vectors/sec across Mahout, PennyLane, and Qiskit.
+
+## Quick Start
+
+From the repo root:
+
+```bash
+cd qdp
+make benchmark
+```
+
+This installs the QDP Python package (if needed), installs benchmark
+dependencies, and runs both benchmarks.
+
+## Manual Setup
+
+```bash
+cd qdp/qdp-python
+uv sync --group benchmark
+```
+
+Then run benchmarks with `uv run python ...` or activate the virtual
+environment and use `python ...`.
+
+## E2E Benchmark (Disk -> GPU)
+
+```bash
+cd qdp/qdp-python/benchmark
+python benchmark_e2e.py
+```
+
+Additional options:
+
+```bash
+python benchmark_e2e.py --qubits 16 --samples 200 --frameworks mahout-parquet mahout-arrow
+python benchmark_e2e.py --frameworks all
+```
+
+Notes:
+
+- `--frameworks` accepts a space-separated list or `all`.
+  Options: `mahout-parquet`, `mahout-arrow`, `pennylane`, `qiskit`.
+- The script writes `final_benchmark_data.parquet` and
+  `final_benchmark_data.arrow` in the current working directory and overwrites
+  them on each run.
+- If multiple frameworks run, the script compares output states for
+  correctness at the end.
+
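+For orientation, the block below is a minimal, hypothetical sketch of the
+staged timing idea (IO, normalization, host-to-device transfer, dummy forward
+pass). It synthesizes data instead of reading the parquet file and uses plain
+PyTorch for the GPU leg, so treat it as an illustration of the measurement,
+not the script's actual code:
+
+```python
+import time
+
+import numpy as np
+import torch
+
+rng = np.random.default_rng(0)
+
+t0 = time.perf_counter()
+# IO stage: the real script reads final_benchmark_data.parquet / .arrow here;
+# we synthesize vectors to keep the sketch self-contained.
+data = rng.standard_normal((200, 2 ** 16)).astype(np.float32)
+t1 = time.perf_counter()
+
+# Normalization: amplitude encoding needs unit-norm vectors.
+data /= np.linalg.norm(data, axis=1, keepdims=True)
+t2 = time.perf_counter()
+
+# Host -> GPU VRAM transfer.
+states = torch.from_numpy(data).to("cuda")
+torch.cuda.synchronize()
+t3 = time.perf_counter()
+
+# Dummy "forward pass" so the GPU actually consumes the data.
+_ = (states ** 2).sum(dim=1)
+torch.cuda.synchronize()
+t4 = time.perf_counter()
+
+print(f"io={t1 - t0:.3f}s norm={t2 - t1:.3f}s h2d={t3 - t2:.3f}s fwd={t4 - t3:.3f}s")
+```
+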
+## DataLoader Throughput Benchmark
+
+Simulates a typical QML training loop by continuously loading batches of 64
+vectors (default). Goal: demonstrate that QDP can saturate GPU utilization and
+avoid the "starvation" often seen in hybrid training loops.
+
+See `qdp/qdp-python/benchmark/benchmark_throughput.md` for details and example
+output.
+
+```bash
+cd qdp/qdp-python/benchmark
+python benchmark_throughput.py --qubits 16 --batches 200 --batch-size 64 --prefetch 16
+python benchmark_throughput.py --frameworks mahout,pennylane
+```
+
+Notes:
+
+- `--frameworks` is a comma-separated list or `all`.
+  Options: `mahout`, `pennylane`, `qiskit`.
+- Throughput is reported in vectors/sec (higher is better).
+
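+As a mental model of the loop being timed, here is a minimal, hypothetical
+sketch (plain NumPy/PyTorch rather than the Mahout, PennyLane, or Qiskit code
+paths the script actually exercises): a producer thread prefetches unit-norm
+batches while the consumer uploads and touches them on the GPU, and throughput
+is simply vectors processed divided by wall time.
+
+```python
+import queue
+import threading
+import time
+
+import numpy as np
+import torch
+
+QUBITS, BATCH_SIZE, BATCHES, PREFETCH = 16, 64, 200, 16
+DIM = 2 ** QUBITS
+
+def producer(q: queue.Queue) -> None:
+    # CPU-side producer: prepares normalized batches ahead of the consumer.
+    rng = np.random.default_rng(0)
+    for _ in range(BATCHES):
+        batch = rng.standard_normal((BATCH_SIZE, DIM)).astype(np.float32)
+        batch /= np.linalg.norm(batch, axis=1, keepdims=True)
+        q.put(batch)
+    q.put(None)  # sentinel: no more batches
+
+q: queue.Queue = queue.Queue(maxsize=PREFETCH)  # bounded queue = prefetch depth
+threading.Thread(target=producer, args=(q,), daemon=True).start()
+
+start = time.perf_counter()
+vectors = 0
+while (batch := q.get()) is not None:
+    states = torch.from_numpy(batch).to("cuda")  # stand-in for GPU encoding
+    _ = (states ** 2).sum(dim=1)                 # tiny consumer op
+    vectors += batch.shape[0]
+torch.cuda.synchronize()
+elapsed = time.perf_counter() - start
+print(f"{vectors / elapsed:,.0f} vectors/sec")
+```
+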
+## Dependency Notes
+
+- Qiskit and PennyLane are optional. If they are not installed, their benchmark
+  legs are skipped automatically.
+- For Mahout-only runs, you can uninstall the competitor frameworks:
+  `uv pip uninstall qiskit pennylane`.
 
 ### We can also run benchmarks on Colab notebooks (without owning a GPU)
 
diff --git a/docs/benchmarks/dataloader_throughput.md b/qdp/qdp-python/benchmark/benchmark_throughput.md
similarity index 86%
rename from docs/benchmarks/dataloader_throughput.md
rename to qdp/qdp-python/benchmark/benchmark_throughput.md
index 242340c26..ba26f0e60 100644
--- a/docs/benchmarks/dataloader_throughput.md
+++ b/qdp/qdp-python/benchmark/benchmark_throughput.md
@@ -2,6 +2,10 @@
 
 This benchmark mirrors the `qdp-core/examples/dataloader_throughput.rs` pipeline and compares Mahout (QDP) against PennyLane and Qiskit on the same workload. It streams batches from a CPU-side producer, encodes amplitude states on GPU, and reports vectors-per-second.
 
+Goal: simulate a typical QML training loop by continuously loading batches of
+64 vectors (default), showing that QDP can keep GPU utilization high and avoid
+the "starvation" often seen in hybrid training loops.
+
 ## Workload
 
 - Qubits: 16 (vector length `2^16`)
@@ -15,11 +19,11 @@ This benchmark mirrors the `qdp-core/examples/dataloader_throughput.rs` pipeline
 # QDP-only Rust example
 cargo run -p qdp-core --example dataloader_throughput --release
 
-# Cross-framework comparison (requires deps in qdp/benchmark/requirements.txt)
-python qdp/benchmark/benchmark_dataloader_throughput.py --qubits 16 --batches 200 --batch-size 64 --prefetch 16
+# Cross-framework comparison (requires benchmark deps)
+python qdp/qdp-python/benchmark/benchmark_throughput.py --qubits 16 --batches 200 --batch-size 64 --prefetch 16
 
 # Run only Mahout + PennyLane legs
-python qdp/benchmark/benchmark_dataloader_throughput.py --frameworks mahout,pennylane
+python qdp/qdp-python/benchmark/benchmark_throughput.py --frameworks mahout,pennylane
 ```
 
 ## Example Output
diff --git a/qdp/qdp-python/benchmark/benchmark_dataloader_throughput.py b/qdp/qdp-python/benchmark/benchmark_throughput.py
similarity index 99%
rename from qdp/qdp-python/benchmark/benchmark_dataloader_throughput.py
rename to qdp/qdp-python/benchmark/benchmark_throughput.py
index 9ce974084..0d7916fec 100644
--- a/qdp/qdp-python/benchmark/benchmark_dataloader_throughput.py
+++ b/qdp/qdp-python/benchmark/benchmark_throughput.py
@@ -24,7 +24,7 @@ The workload mirrors the `qdp-core/examples/dataloader_throughput.rs` pipeline:
 - Encode vectors into amplitude states on GPU and run a tiny consumer op.
 
 Run:
-    python qdp/benchmark/benchmark_dataloader_throughput.py --qubits 16 --batches 200 --batch-size 64
+    python qdp/benchmark/benchmark_throughput.py --qubits 16 --batches 200 --batch-size 64
 """
 
 import argparse
