gemini-code-assist[bot] commented on code in PR #36437:
URL: https://github.com/apache/beam/pull/36437#discussion_r2413899446


##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
@@ -100,4 +102,95 @@ Approximate size of the models used in the tests
 * bert-base-uncased: 417.7 MB
 * bert-large-uncased: 1.2 GB
 
-All the performance tests are defined at 
[job_InferenceBenchmarkTests_Python.groovy](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy).
+## PyTorch Sentiment Analysis DistilBERT base
+
+**Model**: PyTorch Sentiment Analysis — DistilBERT (base-uncased)
+**Accelerator**: CPU only
+**Host**: 20 × n1-standard-2 (2 vCPUs, 7.5 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_sentiment_streaming.py).
+
+## VLLM Gemma 2b Batch Performance on Tesla T4
+
+**Model**: google/gemma-2b-it
+**Accelerator**: NVIDIA Tesla T4 GPU
+**Host**: 3 × n1-standard-8 (8 vCPUs, 30 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/vllm_gemma_batch.py).
+
+## How to add a new ML benchmark pipeline
+
+1. Create the pipeline implementation
+
+- Location: sdks/python/apache_beam/examples/inference (e.g., 
pytorch_sentiment.py)
+- Define CLI args and the logic
+- Keep parameter names consistent (e.g., --bq_project, --bq_dataset, 
--metrics_table).
+
+2. Create the benchmark implementation
+
+- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., 
pytorch_sentiment_benchmarks.py)
+- Inherit from DataflowCostBenchmark  class.
+- Ensure the expected 'pcollection' parameter is passed to your builder. This 
parameter could be obtained from GCP Dataflow Jobs -> Your Job Page.
+- Keep naming consistent with other benchmarks.
+
+3. Add an options txt file
+
+- Location: .github/workflows/load-tests-pipeline-options/<pipeline_name>.txt
+- Include Dataflow and pipeline flags. Example:
+
+```
+--region=us-central1
+--machine_type=n1-standard-2
+--num_workers=75
+--disk_size_gb=50
+--autoscaling_algorithm=NONE
+--staging_location=gs://temp-storage-for-perf-tests/loadtests
+--temp_location=gs://temp-storage-for-perf-tests/loadtests
+--requirements_file=apache_beam/ml/inference/your-requirements-file.txt

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The example path for `--requirements_file` appears to be missing the 
`sdks/python/` prefix. Assuming the benchmark is run from the repository root, 
this path would be incorrect and cause the test to fail. Adding the full path 
from the repository root will prevent potential confusion and errors.
   
   ```suggestion
   
--requirements_file=sdks/python/apache_beam/ml/inference/your-requirements-file.txt
   ```



##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
@@ -100,4 +102,95 @@ Approximate size of the models used in the tests
 * bert-base-uncased: 417.7 MB
 * bert-large-uncased: 1.2 GB
 
-All the performance tests are defined at 
[job_InferenceBenchmarkTests_Python.groovy](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy).
+## PyTorch Sentiment Analysis DistilBERT base
+
+**Model**: PyTorch Sentiment Analysis — DistilBERT (base-uncased)
+**Accelerator**: CPU only
+**Host**: 20 × n1-standard-2 (2 vCPUs, 7.5 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_sentiment_streaming.py).
+
+## VLLM Gemma 2b Batch Performance on Tesla T4
+
+**Model**: google/gemma-2b-it
+**Accelerator**: NVIDIA Tesla T4 GPU
+**Host**: 3 × n1-standard-8 (8 vCPUs, 30 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/vllm_gemma_batch.py).
+
+## How to add a new ML benchmark pipeline
+
+1. Create the pipeline implementation
+
+- Location: sdks/python/apache_beam/examples/inference (e.g., 
pytorch_sentiment.py)
+- Define CLI args and the logic
+- Keep parameter names consistent (e.g., --bq_project, --bq_dataset, 
--metrics_table).
+
+2. Create the benchmark implementation
+
+- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., 
pytorch_sentiment_benchmarks.py)
+- Inherit from DataflowCostBenchmark  class.

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   There's an extra space after `DataflowCostBenchmark`. Removing it will improve 
the document's polish.
   
   ```suggestion
   - Inherit from DataflowCostBenchmark class.
   ```



##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
@@ -100,4 +102,95 @@ Approximate size of the models used in the tests
 * bert-base-uncased: 417.7 MB
 * bert-large-uncased: 1.2 GB
 
-All the performance tests are defined at 
[job_InferenceBenchmarkTests_Python.groovy](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy).
+## PyTorch Sentiment Analysis DistilBERT base
+
+**Model**: PyTorch Sentiment Analysis — DistilBERT (base-uncased)
+**Accelerator**: CPU only
+**Host**: 20 × n1-standard-2 (2 vCPUs, 7.5 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_sentiment_streaming.py).
+
+## VLLM Gemma 2b Batch Performance on Tesla T4
+
+**Model**: google/gemma-2b-it
+**Accelerator**: NVIDIA Tesla T4 GPU
+**Host**: 3 × n1-standard-8 (8 vCPUs, 30 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/vllm_gemma_batch.py).
+
+## How to add a new ML benchmark pipeline
+
+1. Create the pipeline implementation
+
+- Location: sdks/python/apache_beam/examples/inference (e.g., 
pytorch_sentiment.py)
+- Define CLI args and the logic
+- Keep parameter names consistent (e.g., --bq_project, --bq_dataset, 
--metrics_table).
+
+2. Create the benchmark implementation
+
+- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., 
pytorch_sentiment_benchmarks.py)
+- Inherit from DataflowCostBenchmark  class.
+- Ensure the expected 'pcollection' parameter is passed to your builder. This 
parameter could be obtained from GCP Dataflow Jobs -> Your Job Page.
+- Keep naming consistent with other benchmarks.
+
+3. Add an options txt file
+
+- Location: .github/workflows/load-tests-pipeline-options/<pipeline_name>.txt
+- Include Dataflow and pipeline flags. Example:
+
+```
+--region=us-central1
+--machine_type=n1-standard-2
+--num_workers=75
+--disk_size_gb=50
+--autoscaling_algorithm=NONE
+--staging_location=gs://temp-storage-for-perf-tests/loadtests
+--temp_location=gs://temp-storage-for-perf-tests/loadtests
+--requirements_file=apache_beam/ml/inference/your-requirements-file.txt
+--publish_to_big_query=true
+--metrics_dataset=beam_run_inference
+--metrics_table=your_table
+--input_options={}
+--influx_measurement=your-measurement
+--device=CPU
+--runner=DataflowRunner
+```
+
+4. Wire it into the GitHub Action
+
+- Workflow: .github/workflows/beam_Inference_Python_Benchmarks_Dataflow.yml
+- Add your argument-file-path to the matrix.
+- Add a step that runs your <pipeline_name>_benchmarks.py with 
-PloadTest.args=$YOUR_ARGUMENTS.
+
+5. Test on your fork
+
+- Trigger the workflow manually.
+- Confirm the Dataflow job completes successfully.
+
+6. Verify metrics in BigQuery
+
+- Dataset: beam_run_inference. Table: your_table
+- Confirm new rows for your pipeline_name with recent timestamps.
+
+7. Update the website
+
+- Create: website/www/site/content/en/performance/<pipeline_name>/_index.md 
(short title/description).
+- Update: website/www/site/data/performance.yaml — add your pipeline and five 
chart entries with:
+- - looker_folder_id
+- - public_slug_id (from Looker, see below)
+
+8. Create Looker content (5 charts)
+
+- In Looker → Shared folders → run_inference: create a subfolder for your 
pipeline.
+- From an existing chart: Development mode → Explore from here → Go to LookML.
+- Point to your table/view and create 5 standard charts 
(latency/throughput/cost/etc.).
+- Save changes → Publish to production.
+- From Explore, open each, set fields/filters for your pipeline, Run, then 
Save as Look (in your folder).
+- Open each Look:
+- - Copy Look ID
+- - Add Look IDs to .test-infra/tools/refresh_looker_metrics.py.
+- - Exit Development mode → Edit Settings → Allow public access.
+- - Copy public_slug_id and paste into website/performance.yml.
+- - Run .test-infra/tools/refresh_looker_metrics.py script or manually 
download as PNG via the public slug and upload to GCS: 
gs://public_looker_explores_us_a3853f40/FOLDER_ID/<look_slug>.png

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The markdown for this nested list should use indentation for better 
rendering. Additionally, the last instruction is a bit dense. It can be 
clarified by separating the automated and manual steps and explaining the 
`FOLDER_ID` placeholder.
   
   This suggestion improves both formatting and clarity.
   
   ```suggestion
     - Copy Look ID
     - Add Look IDs to .test-infra/tools/refresh_looker_metrics.py.
     - Exit Development mode → Edit Settings → Allow public access.
     - Copy public_slug_id and paste into website/performance.yml.
     - Run `.test-infra/tools/refresh_looker_metrics.py` script. As a manual 
alternative, download as PNG via the public slug and upload to GCS: 
`gs://public_looker_explores_us_a3853f40/FOLDER_ID/<look_slug>.png` (where 
`FOLDER_ID` is from step 7).
   ```



##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
@@ -100,4 +102,95 @@ Approximate size of the models used in the tests
 * bert-base-uncased: 417.7 MB
 * bert-large-uncased: 1.2 GB
 
-All the performance tests are defined at 
[job_InferenceBenchmarkTests_Python.groovy](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy).
+## PyTorch Sentiment Analysis DistilBERT base
+
+**Model**: PyTorch Sentiment Analysis — DistilBERT (base-uncased)
+**Accelerator**: CPU only
+**Host**: 20 × n1-standard-2 (2 vCPUs, 7.5 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_sentiment_streaming.py).
+
+## VLLM Gemma 2b Batch Performance on Tesla T4
+
+**Model**: google/gemma-2b-it
+**Accelerator**: NVIDIA Tesla T4 GPU
+**Host**: 3 × n1-standard-8 (8 vCPUs, 30 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/vllm_gemma_batch.py).
+
+## How to add a new ML benchmark pipeline
+
+1. Create the pipeline implementation
+
+- Location: sdks/python/apache_beam/examples/inference (e.g., 
pytorch_sentiment.py)
+- Define CLI args and the logic
+- Keep parameter names consistent (e.g., --bq_project, --bq_dataset, 
--metrics_table).
+
+2. Create the benchmark implementation
+
+- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., 
pytorch_sentiment_benchmarks.py)
+- Inherit from DataflowCostBenchmark  class.
+- Ensure the expected 'pcollection' parameter is passed to your builder. This 
parameter could be obtained from GCP Dataflow Jobs -> Your Job Page.
+- Keep naming consistent with other benchmarks.
+
+3. Add an options txt file
+
+- Location: .github/workflows/load-tests-pipeline-options/<pipeline_name>.txt
+- Include Dataflow and pipeline flags. Example:
+
+```
+--region=us-central1
+--machine_type=n1-standard-2
+--num_workers=75
+--disk_size_gb=50
+--autoscaling_algorithm=NONE
+--staging_location=gs://temp-storage-for-perf-tests/loadtests
+--temp_location=gs://temp-storage-for-perf-tests/loadtests
+--requirements_file=apache_beam/ml/inference/your-requirements-file.txt
+--publish_to_big_query=true
+--metrics_dataset=beam_run_inference
+--metrics_table=your_table
+--input_options={}
+--influx_measurement=your-measurement
+--device=CPU
+--runner=DataflowRunner
+```
+
+4. Wire it into the GitHub Action
+
+- Workflow: .github/workflows/beam_Inference_Python_Benchmarks_Dataflow.yml
+- Add your argument-file-path to the matrix.
+- Add a step that runs your <pipeline_name>_benchmarks.py with 
-PloadTest.args=$YOUR_ARGUMENTS.

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The placeholder `$YOUR_ARGUMENTS` is ambiguous. It would be helpful to 
clarify that this should be the content of the options file created in step 3. 
You might want to suggest a way to pass the file content, for example, using 
`$(cat <file_path>)` in a shell environment.
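
   For illustration, a minimal sketch of that approach follows, assuming the 
benchmark is launched through the Python load-tests Gradle task; the task path, 
main class, and options-file name are placeholders and should be copied from an 
existing step in the workflow rather than taken as the actual configuration:
   
   ```
   # Hypothetical invocation; adjust the Gradle task, main class, and options
   # file to match your pipeline and the existing workflow steps.
   ./gradlew :sdks:python:apache_beam:testing:load_tests:run \
     -PloadTest.mainClass=apache_beam.testing.benchmarks.inference.pytorch_sentiment_benchmarks \
     -PloadTest.args="$(cat .github/workflows/load-tests-pipeline-options/pytorch_sentiment.txt | tr '\n' ' ')" \
     -Prunner=DataflowRunner
   ```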



##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
@@ -100,4 +102,95 @@ Approximate size of the models used in the tests
 * bert-base-uncased: 417.7 MB
 * bert-large-uncased: 1.2 GB
 
-All the performance tests are defined at 
[job_InferenceBenchmarkTests_Python.groovy](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy).
+## PyTorch Sentiment Analysis DistilBERT base
+
+**Model**: PyTorch Sentiment Analysis — DistilBERT (base-uncased)
+**Accelerator**: CPU only
+**Host**: 20 × n1-standard-2 (2 vCPUs, 7.5 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_sentiment_streaming.py).
+
+## VLLM Gemma 2b Batch Performance on Tesla T4
+
+**Model**: google/gemma-2b-it
+**Accelerator**: NVIDIA Tesla T4 GPU
+**Host**: 3 × n1-standard-8 (8 vCPUs, 30 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/vllm_gemma_batch.py).
+
+## How to add a new ML benchmark pipeline
+
+1. Create the pipeline implementation
+
+- Location: sdks/python/apache_beam/examples/inference (e.g., 
pytorch_sentiment.py)
+- Define CLI args and the logic
+- Keep parameter names consistent (e.g., --bq_project, --bq_dataset, 
--metrics_table).
+
+2. Create the benchmark implementation
+
+- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., 
pytorch_sentiment_benchmarks.py)
+- Inherit from DataflowCostBenchmark  class.
+- Ensure the expected 'pcollection' parameter is passed to your builder. This 
parameter could be obtained from GCP Dataflow Jobs -> Your Job Page.
+- Keep naming consistent with other benchmarks.
+
+3. Add an options txt file
+
+- Location: .github/workflows/load-tests-pipeline-options/<pipeline_name>.txt
+- Include Dataflow and pipeline flags. Example:
+
+```
+--region=us-central1
+--machine_type=n1-standard-2
+--num_workers=75
+--disk_size_gb=50
+--autoscaling_algorithm=NONE
+--staging_location=gs://temp-storage-for-perf-tests/loadtests
+--temp_location=gs://temp-storage-for-perf-tests/loadtests
+--requirements_file=apache_beam/ml/inference/your-requirements-file.txt
+--publish_to_big_query=true
+--metrics_dataset=beam_run_inference
+--metrics_table=your_table
+--input_options={}
+--influx_measurement=your-measurement
+--device=CPU
+--runner=DataflowRunner
+```
+
+4. Wire it into the GitHub Action
+
+- Workflow: .github/workflows/beam_Inference_Python_Benchmarks_Dataflow.yml
+- Add your argument-file-path to the matrix.
+- Add a step that runs your <pipeline_name>_benchmarks.py with 
-PloadTest.args=$YOUR_ARGUMENTS.
+
+5. Test on your fork
+
+- Trigger the workflow manually.
+- Confirm the Dataflow job completes successfully.
+
+6. Verify metrics in BigQuery
+
+- Dataset: beam_run_inference. Table: your_table
+- Confirm new rows for your pipeline_name with recent timestamps.
+
+7. Update the website
+
+- Create: website/www/site/content/en/performance/<pipeline_name>/_index.md 
(short title/description).
+- Update: website/www/site/data/performance.yaml — add your pipeline and five 
chart entries with:
+- - looker_folder_id
+- - public_slug_id (from Looker, see below)

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The markdown for this nested list uses `- -`, which may not render correctly 
in all markdown viewers. Using indentation for nested lists is more standard 
and ensures consistent rendering.
   
   ```suggestion
     - looker_folder_id
     - public_slug_id (from Looker, see below)
   ```



##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
@@ -100,4 +102,95 @@ Approximate size of the models used in the tests
 * bert-base-uncased: 417.7 MB
 * bert-large-uncased: 1.2 GB
 
-All the performance tests are defined at 
[job_InferenceBenchmarkTests_Python.groovy](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy).
+## PyTorch Sentiment Analysis DistilBERT base
+
+**Model**: PyTorch Sentiment Analysis — DistilBERT (base-uncased)
+**Accelerator**: CPU only
+**Host**: 20 × n1-standard-2 (2 vCPUs, 7.5 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_sentiment_streaming.py).
+
+## VLLM Gemma 2b Batch Performance on Tesla T4
+
+**Model**: google/gemma-2b-it
+**Accelerator**: NVIDIA Tesla T4 GPU
+**Host**: 3 × n1-standard-8 (8 vCPUs, 30 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/vllm_gemma_batch.py).
+
+## How to add a new ML benchmark pipeline
+
+1. Create the pipeline implementation
+
+- Location: sdks/python/apache_beam/examples/inference (e.g., 
pytorch_sentiment.py)
+- Define CLI args and the logic
+- Keep parameter names consistent (e.g., --bq_project, --bq_dataset, 
--metrics_table).
+
+2. Create the benchmark implementation
+
+- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., 
pytorch_sentiment_benchmarks.py)
+- Inherit from DataflowCostBenchmark  class.
+- Ensure the expected 'pcollection' parameter is passed to your builder. This 
parameter could be obtained from GCP Dataflow Jobs -> Your Job Page.

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The terms 'builder' and the method for obtaining the 'pcollection' parameter 
are a bit vague. Clarifying these will help contributors understand what's 
needed more easily.
   
   The 'builder' seems to be the `DataflowCostBenchmark` constructor, and the 
`pcollection` is the name of a `PCollection` in the pipeline graph used for 
metrics.
   
   ```suggestion
   - Ensure the 'pcollection' parameter is passed to the 
`DataflowCostBenchmark` constructor. This is the name of the PCollection for 
which to measure throughput, and you can find this name in the Dataflow UI job 
graph.
   ```
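
   To make this concrete for contributors, a minimal sketch of such a benchmark 
class is shown below. The module path, the constructor keyword names, the 
example `pcollection` value, and the example pipeline's `run()` signature are 
assumptions and should be verified against an existing benchmark in this 
directory:
   
   ```
   # Sketch only: the import path, constructor keyword names, and the example
   # pcollection value are assumptions; copy the real ones from an existing
   # benchmark (e.g. another *_benchmarks.py file in this directory).
   import logging
   
   from apache_beam.examples.inference import pytorch_sentiment_streaming
   from apache_beam.testing.load_tests.dataflow_cost_benchmark import DataflowCostBenchmark
   
   
   class PytorchSentimentBenchmarkTest(DataflowCostBenchmark):
     def __init__(self):
       # 'pcollection' names the PCollection whose throughput is measured;
       # copy it from the job graph of a successful run in the Dataflow UI.
       super().__init__(
           metrics_namespace='pytorch_sentiment_streaming',
           pcollection='RunInference/BeamML_RunInference.out0')
   
     def test(self):
       # Launch the example pipeline from step 1 with the benchmark's pipeline
       # options (assumes the example exposes a run(argv, ...) entry point).
       pytorch_sentiment_streaming.run(
           self.pipeline.get_full_options_as_args(), save_main_session=False)
   
   
   if __name__ == '__main__':
     logging.basicConfig(level=logging.INFO)
     PytorchSentimentBenchmarkTest().run()
   ```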



##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
@@ -100,4 +102,95 @@ Approximate size of the models used in the tests
 * bert-base-uncased: 417.7 MB
 * bert-large-uncased: 1.2 GB
 
-All the performance tests are defined at 
[job_InferenceBenchmarkTests_Python.groovy](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy).
+## PyTorch Sentiment Analysis DistilBERT base
+
+**Model**: PyTorch Sentiment Analysis — DistilBERT (base-uncased)
+**Accelerator**: CPU only
+**Host**: 20 × n1-standard-2 (2 vCPUs, 7.5 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_sentiment_streaming.py).
+
+## VLLM Gemma 2b Batch Performance on Tesla T4
+
+**Model**: google/gemma-2b-it
+**Accelerator**: NVIDIA Tesla T4 GPU
+**Host**: 3 × n1-standard-8 (8 vCPUs, 30 GB RAM)
+
+Full pipeline implementation is available 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/vllm_gemma_batch.py).
+
+## How to add a new ML benchmark pipeline
+
+1. Create the pipeline implementation
+
+- Location: sdks/python/apache_beam/examples/inference (e.g., 
pytorch_sentiment.py)
+- Define CLI args and the logic
+- Keep parameter names consistent (e.g., --bq_project, --bq_dataset, 
--metrics_table).
+
+2. Create the benchmark implementation
+
+- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., 
pytorch_sentiment_benchmarks.py)
+- Inherit from DataflowCostBenchmark  class.
+- Ensure the expected 'pcollection' parameter is passed to your builder. This 
parameter could be obtained from GCP Dataflow Jobs -> Your Job Page.
+- Keep naming consistent with other benchmarks.
+
+3. Add an options txt file
+
+- Location: .github/workflows/load-tests-pipeline-options/<pipeline_name>.txt
+- Include Dataflow and pipeline flags. Example:
+
+```
+--region=us-central1
+--machine_type=n1-standard-2
+--num_workers=75
+--disk_size_gb=50
+--autoscaling_algorithm=NONE
+--staging_location=gs://temp-storage-for-perf-tests/loadtests
+--temp_location=gs://temp-storage-for-perf-tests/loadtests
+--requirements_file=apache_beam/ml/inference/your-requirements-file.txt
+--publish_to_big_query=true
+--metrics_dataset=beam_run_inference
+--metrics_table=your_table
+--input_options={}

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The `--input_options={}` flag is included in the example, but its purpose 
and expected values are not explained. Please add a brief description to 
clarify what this option is used for and provide an example of a valid value if 
possible.
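
   For reference, other Beam load tests pass `--input_options` as a JSON spec 
for the synthetic source, while an inference benchmark that reads its own 
dataset may only need the empty dict the base load-test class expects. The 
field names below are illustrative only:
   
   ```
   # Benchmark reads a real dataset; the flag only needs to be present:
   --input_options={}
   # Benchmark generates synthetic input (illustrative field names):
   --input_options='{"num_records": 10000, "key_size": 10, "value_size": 100}'
   ```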



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
