gemini-code-assist[bot] commented on code in PR #36437: URL: https://github.com/apache/beam/pull/36437#discussion_r2413899446
##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
+3. Add an options txt file
+
+- Location: .github/workflows/load-tests-pipeline-options/<pipeline_name>.txt
+- Include Dataflow and pipeline flags. Example:
+
+```
+--region=us-central1
+--machine_type=n1-standard-2
+--num_workers=75
+--disk_size_gb=50
+--autoscaling_algorithm=NONE
+--staging_location=gs://temp-storage-for-perf-tests/loadtests
+--temp_location=gs://temp-storage-for-perf-tests/loadtests
+--requirements_file=apache_beam/ml/inference/your-requirements-file.txt

Review Comment:
The example path for `--requirements_file` appears to be missing the `sdks/python/` prefix. Assuming the benchmark is run from the repository root, this path would be incorrect and cause the test to fail. Adding the full path from the repository root will prevent potential confusion and errors.

```suggestion
--requirements_file=sdks/python/apache_beam/ml/inference/your-requirements-file.txt
```

##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
+2. Create the benchmark implementation
+
+- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., pytorch_sentiment_benchmarks.py)
+- Inherit from DataflowCostBenchmark  class.

Review Comment:
There's a minor typo with an extra space after `DataflowCostBenchmark`. Removing it will improve the document's polish.

```suggestion
- Inherit from DataflowCostBenchmark class.
```

##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
@@ -100,4 +102,95 @@ Approximate size of the models used in the tests
 * bert-base-uncased: 417.7 MB
 * bert-large-uncased: 1.2 GB
 
-All the performance tests are defined at [job_InferenceBenchmarkTests_Python.groovy](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy).
+## PyTorch Sentiment Analysis DistilBERT base
+
+**Model**: PyTorch Sentiment Analysis — DistilBERT (base-uncased)
+**Accelerator**: CPU only
+**Host**: 20 × n1-standard-2 (2 vCPUs, 7.5 GB RAM)
+
+Full pipeline implementation is available [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_sentiment_streaming.py).
+
+## VLLM Gemma 2b Batch Performance on Tesla T4
+
+**Model**: google/gemma-2b-it
+**Accelerator**: NVIDIA Tesla T4 GPU
+**Host**: 3 × n1-standard-8 (8 vCPUs, 30 GB RAM)
+
+Full pipeline implementation is available [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/vllm_gemma_batch.py).
+
+## How to add a new ML benchmark pipeline
+
+1. Create the pipeline implementation
+
+- Location: sdks/python/apache_beam/examples/inference (e.g., pytorch_sentiment.py)
+- Define CLI args and the logic
+- Keep parameter names consistent (e.g., --bq_project, --bq_dataset, --metrics_table).
+
+2. Create the benchmark implementation
+
+- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., pytorch_sentiment_benchmarks.py)
+- Inherit from DataflowCostBenchmark  class.
+- Ensure the expected 'pcollection' parameter is passed to your builder. This parameter could be obtained from GCP Dataflow Jobs -> Your Job Page.
+- Keep naming consistent with other benchmarks.
+
+3. Add an options txt file
+
+- Location: .github/workflows/load-tests-pipeline-options/<pipeline_name>.txt
+- Include Dataflow and pipeline flags. Example:
+
+```
+--region=us-central1
+--machine_type=n1-standard-2
+--num_workers=75
+--disk_size_gb=50
+--autoscaling_algorithm=NONE
+--staging_location=gs://temp-storage-for-perf-tests/loadtests
+--temp_location=gs://temp-storage-for-perf-tests/loadtests
+--requirements_file=apache_beam/ml/inference/your-requirements-file.txt
+--publish_to_big_query=true
+--metrics_dataset=beam_run_inference
+--metrics_table=your_table
+--input_options={}
+--influx_measurement=your-measurement
+--device=CPU
+--runner=DataflowRunner
+```
+
+4. Wire it into the GitHub Action
+
+- Workflow: .github/workflows/beam_Inference_Python_Benchmarks_Dataflow.yml
+- Add your argument-file-path to the matrix.
+- Add a step that runs your <pipeline_name>_benchmarks.py with -PloadTest.args=$YOUR_ARGUMENTS.
+
+5. Test on your fork
+
+- Trigger the workflow manually.
+- Confirm the Dataflow job completes successfully.
+
+6. Verify metrics in BigQuery
+
+- Dataset: beam_run_inference. Table: your_table
+- Confirm new rows for your pipeline_name with recent timestamps.
+
+7. Update the website
+
+- Create: website/www/site/content/en/performance/<pipeline_name>/_index.md (short title/description).
+- Update: website/www/site/data/performance.yaml — add your pipeline and five chart entries with:
+- - looker_folder_id
+- - public_slug_id (from Looker, see below)
+
+8. Create Looker content (5 charts)
+
+- In Looker → Shared folders → run_inference: create a subfolder for your pipeline.
+- From an existing chart: Development mode → Explore from here → Go to LookML.
+- Point to your table/view and create 5 standard charts (latency/throughput/cost/etc.).
+- Save changes → Publish to production.
+- From Explore, open each, set fields/filters for your pipeline, Run, then Save as Look (in your folder).
+- Open each Look:
+- - Copy Look ID
+- - Add Look IDs to .test-infra/tools/refresh_looker_metrics.py.
+- - Exit Development mode → Edit Settings → Allow public access.
+- - Copy public_slug_id and paste into website/performance.yml.
+- - Run .test-infra/tools/refresh_looker_metrics.py script or manually download as PNG via the public slug and upload to GCS: gs://public_looker_explores_us_a3853f40/FOLDER_ID/<look_slug>.png

Review Comment:
The markdown for this nested list should use indentation for better rendering. Additionally, the last instruction is a bit dense. It can be clarified by separating the automated and manual steps and explaining the `FOLDER_ID` placeholder. This suggestion improves both formatting and clarity.

```suggestion
  - Copy Look ID
  - Add Look IDs to .test-infra/tools/refresh_looker_metrics.py.
  - Exit Development mode → Edit Settings → Allow public access.
  - Copy public_slug_id and paste into website/performance.yml.
  - Run `.test-infra/tools/refresh_looker_metrics.py` script. As a manual alternative, download as PNG via the public slug and upload to GCS: `gs://public_looker_explores_us_a3853f40/FOLDER_ID/<look_slug>.png` (where `FOLDER_ID` is from step 7).
```

##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
+4. Wire it into the GitHub Action
+
+- Workflow: .github/workflows/beam_Inference_Python_Benchmarks_Dataflow.yml
+- Add your argument-file-path to the matrix.
+- Add a step that runs your <pipeline_name>_benchmarks.py with -PloadTest.args=$YOUR_ARGUMENTS.

Review Comment:
The placeholder `$YOUR_ARGUMENTS` is ambiguous. It would be helpful to clarify that this should be the content of the options file created in step 3. You might want to suggest a way to pass the file content, for example, using `$(cat <file_path>)` in a shell environment.
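One way to make this concrete: the workflow step could expand the options file from step 3 inline, roughly as below. This is only a sketch that combines the `-PloadTest.args` flag from step 4 with the `$(cat <file_path>)` idea; the actual Gradle invocation and step layout in beam_Inference_Python_Benchmarks_Dataflow.yml may differ.

```
-PloadTest.args="$(cat .github/workflows/load-tests-pipeline-options/<pipeline_name>.txt)"
```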
##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
+7. Update the website
+
+- Create: website/www/site/content/en/performance/<pipeline_name>/_index.md (short title/description).
+- Update: website/www/site/data/performance.yaml — add your pipeline and five chart entries with:
+- - looker_folder_id
+- - public_slug_id (from Looker, see below)

Review Comment:
The markdown for this nested list uses `- -`, which may not render correctly in all markdown viewers. Using indentation for nested lists is more standard and ensures consistent rendering.

```suggestion
  - looker_folder_id
  - public_slug_id (from Looker, see below)
```

##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
+2. Create the benchmark implementation
+
+- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., pytorch_sentiment_benchmarks.py)
+- Inherit from DataflowCostBenchmark  class.
+- Ensure the expected 'pcollection' parameter is passed to your builder. This parameter could be obtained from GCP Dataflow Jobs -> Your Job Page.

Review Comment:
The terms 'builder' and the method for obtaining the 'pcollection' parameter are a bit vague. Clarifying these will help contributors understand what's needed more easily. The 'builder' seems to be the `DataflowCostBenchmark` constructor, and the `pcollection` is the name of a `PCollection` in the pipeline graph used for metrics.

```suggestion
- Ensure the 'pcollection' parameter is passed to the `DataflowCostBenchmark` constructor. This is the name of the PCollection for which to measure throughput, and you can find this name in the Dataflow UI job graph.
```
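To make that concrete, a benchmark wrapper along the following lines would match the structure described in step 2. This is a hypothetical sketch, not code from the PR: the import path, the `metrics_namespace` keyword, the `pytorch_sentiment.run(..., test_pipeline=...)` entrypoint, and the `pcollection` value are assumptions modeled on the existing inference benchmarks.

```python
# Hypothetical sketch of sdks/python/apache_beam/testing/benchmarks/inference/
# pytorch_sentiment_benchmarks.py; the kwargs and names below are assumptions.
import logging

from apache_beam.examples.inference import pytorch_sentiment  # assumed pipeline module from step 1
from apache_beam.testing.load_tests.dataflow_cost_benchmark import DataflowCostBenchmark


class PytorchSentimentBenchmark(DataflowCostBenchmark):
  def __init__(self):
    # 'pcollection' names the PCollection whose throughput is measured;
    # copy the exact name from the job graph of a completed Dataflow run.
    super().__init__(
        metrics_namespace='pytorch_sentiment',           # assumed kwarg
        pcollection='RunInference/BeamML_RunInference')  # assumed value

  def test(self):
    # Forward the flags from the step 3 options file (exposed by the load
    # test base class as self.pipeline) to the example pipeline under test.
    self.result = pytorch_sentiment.run(
        self.pipeline.get_full_options_as_args(),
        test_pipeline=self.pipeline)


if __name__ == '__main__':
  logging.basicConfig(level=logging.INFO)
  PytorchSentimentBenchmark().run()
```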
##########
sdks/python/apache_beam/testing/benchmarks/inference/README.md:
##########
+- Include Dataflow and pipeline flags. Example:
+
+```
+--region=us-central1
+--machine_type=n1-standard-2
+--num_workers=75
+--disk_size_gb=50
+--autoscaling_algorithm=NONE
+--staging_location=gs://temp-storage-for-perf-tests/loadtests
+--temp_location=gs://temp-storage-for-perf-tests/loadtests
+--requirements_file=apache_beam/ml/inference/your-requirements-file.txt
+--publish_to_big_query=true
+--metrics_dataset=beam_run_inference
+--metrics_table=your_table
+--input_options={}

Review Comment:
The `--input_options={}` flag is included in the example, but its purpose and expected values are not explained. Please add a brief description to clarify what this option is used for and provide an example of a valid value if possible.
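For reference, other Beam Python load tests use `--input_options` to pass a JSON spec for their synthetic sources (record count and element sizes). Whether that convention applies to these inference benchmarks is not stated in the PR; if it does, a populated value might look like the line below, otherwise the flag's actual meaning should be documented by the author.

```
--input_options={"num_records": 10000, "key_size": 10, "value_size": 100}
```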
