ibzib opened a new pull request #12385:
URL: https://github.com/apache/beam/pull/12385
# Motivation

The main goals of migrating to pytest are:

1. Get JUnit structured test output (BEAM-10527).
2. Replace PortableRunnerTest's bespoke timeout mechanism with pytest's (BEAM-9011). This also implicitly raises the timeout for all these tests from 60s to 600s, which should reduce the likelihood of timeout flakes (BEAM-8912).

# Implementation

I originally wanted to do this in incremental changes, but I gradually realized that a complete overhaul of these tests' configuration was needed. The main challenge was that `flink_runner_test.py` expected to be run as `__main__`, which is impossible with pytest. I reworked essentially everything except the tests themselves, which are unchanged.

## Test parametrization

- I left worker configuration in Gradle because it changes the test dependencies. I moved optimization and streaming into `flink_runner_test.py` because that removes the need to set up separate tox tasks, separate test result files, etc.
- Since `flink_job_server_driver`, `environment_type`, and `environment_config` are all pipeline options, I decided to pass them to `flink_runner_test.py` by introducing a global pytest option, `--test-pipeline-options`. `nose` uses `--test-pipeline-options` for integration tests, so I figured this would be generally useful beyond just these tests in the future.

# Bonus trivia

Prior to this change, we were running the exact same streaming test suite *four times* per Jenkins run.

Every `flinkCompatibilityMatrix` task ran the entirety of `flink_runner_test.py`, which contained two classes: `FlinkRunnerTest` and `FlinkRunnerTestOptimized`. `FlinkRunnerTestOptimized` was basically the same thing as `FlinkRunnerTest`, but it added the `pre_optimize=all` experiment and skipped external transform tests, since the Python optimizer breaks external transforms (BEAM-7252). But we were *also* adding `pre_optimize=all` in Gradle, redundantly.
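As an aside on the `--test-pipeline-options` option from the Test parametrization section, here is a minimal sketch of how a global pytest option of that name could be registered and its value split into individual pipeline options. It is an illustration under assumptions, not the code in this PR: the placement in `conftest.py`, the help text, and the helper `split_pipeline_options` are hypothetical; only `parser.addoption` is the standard pytest hook API.

```python
import shlex


def pytest_addoption(parser):
    """pytest hook, normally defined in conftest.py, that registers the option."""
    parser.addoption(
        '--test-pipeline-options',
        action='store',
        default='',
        help='Pipeline options, given as one string, to pass to the tests.')


def split_pipeline_options(option_string):
    """Hypothetical helper: split the single option string into argv-style items.

    shlex.split honors shell-style quoting, so a quoted value that contains
    spaces stays together as one item.
    """
    return shlex.split(option_string)
```

Inside a test or fixture, the raw string would then typically be read with `request.config.getoption('--test-pipeline-options')` and passed through the helper before building the pipeline options.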
The old configuration looks like this:

```groovy
dependsOn flinkCompatibilityMatrix(streaming: false, workerType: CompatibilityMatrixConfig.SDK_WORKER_TYPE.LOOPBACK)
dependsOn flinkCompatibilityMatrix(streaming: true, workerType: CompatibilityMatrixConfig.SDK_WORKER_TYPE.LOOPBACK)
dependsOn flinkCompatibilityMatrix(streaming: true, workerType: CompatibilityMatrixConfig.SDK_WORKER_TYPE.LOOPBACK, preOptimize: true)
```

Notice that pre-optimized batch is missing. This is because `flinkCompatibilityMatrixBatchPreOptimize*` would run `FlinkRunnerTest` with `pre_optimize=all` but without skipping the external transform tests, causing failure.

What about streaming, then? Well, the optimizer *doesn't affect streaming pipelines at all*: https://github.com/apache/beam/blob/489cf2cbf335f372060469953ed7599f2838a591/sdks/python/apache_beam/runners/portability/portable_runner.py#L319

So in one invocation of `flinkValidatesRunner`, `flinkCompatibilityMatrixStreamingLoopback` would run `FlinkRunnerTest` (without `pre_optimize=all`) and `FlinkRunnerTestOptimized`, then `flinkCompatibilityMatrixStreamingPreOptimizeLoopback` would run `FlinkRunnerTest` (with `pre_optimize=all`) and `FlinkRunnerTestOptimized` (with `pre_optimize=all` twice). Besides the skips in `FlinkRunnerTestOptimized`, all four tests would be doing the exact same thing.
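The redundancy described above can be made concrete with a small model. This is only a sketch of the reasoning, not Beam code: the two dictionaries and the function `effective_pre_optimize` are hypothetical names, and the rule that the optimizer is skipped for streaming pipelines is taken from the description above rather than checked against the runner.

```python
from itertools import product

# Streaming Gradle tasks in the old setup, mapped to whether the task itself
# added the pre_optimize=all experiment.
GRADLE_TASKS = {
    'flinkCompatibilityMatrixStreamingLoopback': False,
    'flinkCompatibilityMatrixStreamingPreOptimizeLoopback': True,
}

# Test classes in the old flink_runner_test.py, mapped to whether the class
# added the pre_optimize=all experiment.
TEST_CLASSES = {
    'FlinkRunnerTest': False,
    'FlinkRunnerTestOptimized': True,
}


def effective_pre_optimize(streaming, task_adds, class_adds):
    """Return whether the optimizer actually changes the pipeline.

    Assumption from the text: the optimizer has no effect on streaming
    pipelines, no matter how often the experiment is requested.
    """
    if streaming:
        return False
    return task_adds or class_adds


# Every streaming task ran both classes, giving four runs in total.
runs = [
    (task, cls, effective_pre_optimize(True, task_adds, class_adds))
    for (task, task_adds), (cls, class_adds)
    in product(GRADLE_TASKS.items(), TEST_CLASSES.items())
]

for task, cls, optimized in runs:
    print(task, cls, 'optimized' if optimized else 'not optimized')
```

In this model all four streaming runs end up with the same effective setting, which is the duplication the PR removes. The skipped external transform tests in `FlinkRunnerTestOptimized` are deliberately left out of the model.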
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
