[
https://issues.apache.org/jira/browse/BEAM-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512643#comment-17512643
]
Valentyn Tymofieiev edited comment on BEAM-14163 at 3/26/22, 2:13 AM:
----------------------------------------------------------------------
Looking.
Using this command to repro:
{noformat}
./gradlew -PloadTest.mainClass=apache_beam.testing.load_tests.pardo_test \
-Prunner=DataflowRunner \
-PloadTest.args="--job_name=load-tests-python-dataflow-streaming-pardo-4-valentyn \
--project=apache-beam-testing
--region=us-central1
--temp_location=gs://temp-storage-for-perf-tests/loadtests
--publish_to_big_query=false
--input_options='{\"num_records\":20000000,\"key_size\":10,\"value_size\":90}'
--iterations=1
--number_of_counter_operations=100
--number_of_counters=1
--num_workers=5
--autoscaling_algorithm=NONE
--streaming
--experiments=use_runner_v2
--runner=DataflowRunner" \
-PpythonVersion=3.7 \
--continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g \
-Dorg.gradle.jvmargs=-Xmx4g -Dorg.gradle.vfs.watch=false \
:sdks:python:apache_beam:testing:load_tests:run
{noformat}
Command line taken from the console output on Jenkins, with the syntax adjusted
to follow the example in
[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/pardo_test.py].
The last part is important: just copy-pasting the command does not work,
due to finicky details about escaping various characters.
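To illustrate the escaping involved, here is a minimal sketch assuming a POSIX shell. The variable name args is only for illustration and is not part of the load test setup. Inside the double-quoted -PloadTest.args value, the JSON double quotes are written as \" and the single quotes are kept literally:

```shell
# Inside double quotes, \" yields a literal " and single quotes stay as-is.
# 'args' is a hypothetical variable used only to show the resulting string.
args="--input_options='{\"num_records\":20000000,\"key_size\":10,\"value_size\":90}'"

# Print the string exactly as the shell hands it on.
printf '%s\n' "$args"
# prints: --input_options='{"num_records":20000000,"key_size":10,"value_size":90}'
```

Printing the value this way before launching the job is a cheap check that the JSON survived the quoting.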
Also, the gradle console logs replace 'worker' with '****r', which messes up
the copied command line.
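A command copied from the console log can have the masked text restored before use. This is a minimal sketch, assuming every '****r' in the log was originally 'worker'; any other masked string would need separate handling. restore_masked is a hypothetical helper name:

```shell
# Hypothetical helper: undo the log masking, assuming '****r' always stood
# for 'worker'. In a basic regular expression, \* matches a literal asterisk.
restore_masked() {
  sed 's/\*\*\*\*r/worker/g'
}

# Example with two flags as they might appear in the masked console output.
printf '%s\n' '--num_****rs=5 --max-****rs=12' | restore_masked
# prints: --num_workers=5 --max-workers=12
```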
> Performance Regressions in streaming python ParDo and GBK Load Tests
> --------------------------------------------------------------------
>
> Key: BEAM-14163
> URL: https://issues.apache.org/jira/browse/BEAM-14163
> Project: Beam
> Issue Type: Bug
> Components: community-metrics, sdk-py-core
> Affects Versions: 2.38.0
> Reporter: Daniel Oliveira
> Assignee: Valentyn Tymofieiev
> Priority: P0
> Fix For: 2.38.0
>
>
> As specified in the [Beam Release
> Guide|https://beam.apache.org/contribute/release-guide/#4-investigate-performance-regressions],
> I'm investigating performance regressions. The following load test metrics
> show a clear and persistent performance regression starting around
> March 17 and affecting version 2.38.0.
> ParDo Load Tests:
> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
> GBK Load Tests:
> http://metrics.beam.apache.org/d/UYZ-oJ3Zk/gbk-load-tests?orgId=1&var-processingType=streaming&var-sdk=python&from=now-30d&to=now
--
This message was sent by Atlassian Jira
(v8.20.1#820001)