lostluck commented on pull request #14504: URL: https://github.com/apache/beam/pull/14504#issuecomment-819720007
I agree with all that! (getting better signal on Validates Runner Tests; make the PreCommit and PostCommit suites mean when they actually run rather than "A fast suite of tests, and a slow and heavy suite of tests", and avoiding the combinatorial complexity of running all the SDK tests against all the runners, etc). At this point, the Go SDK is superseded in validating portability by the portable Python experience, and Java. The Go suites should be renamed away from being ValidatesRunners (VR), and largely validate that a smoke test of Go SDK functionality runs on each Runner properly. The Go SDK (and future SDKs) should not aim to replicate the full coverage of Runner behaviors, and instead focus on tests that test 1. User side PTransform code, such as for IOs and other Transform libraries, and 2. Pipelines that exercise behaviors on the SDK harness side of the worker (data & state management, panes, splitting, checkpointing etc). Some small set of these tests must end up being tested on properly distributed runners however, single loopback process runs, or 1 bundle at a time runs, are prone to cheating through local state. Such cheating invalidates the test's results. While we can always manually verify this kind of thing, I'd rather that part be validated as a closedbox for durability. This is *not* how the Go SDK testing is currently set up however. Right now, the basic precommit suite builds all the go code in the beam project (need to verify boot.go afterall), has all the unit tests for the SDK, and which at most have small pipelines that run against the anemic Go Direct runner for light internal verification. The post commit suite is a small set of integration tests, covering primitives, and some basic pipelines with a small set of the IOs (mostly Textio) and "built in" transforms. They run against each portable runner as post commits (Dataflow, Spark, Flink). The same suite is run against the Python ULR as a PreCommit, which catches most issues. Ideally we'll expand that to cover more of the transforms and IOs, and examples (as separate suites, as you've got displayed in the table). ------ As to where the "Go PostCommit" target should go (this one I mean https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/ ), as I've insisted, it only runs against Dataflow. It should currently go into the Dataflow cell of the Validates Runner table, as that's the only table that refers to specific runners. I would like there to be an indication that the SDK is tested against dataflow, and that target is what currently does it. The Go SDK only has one integration test suite that's used against the FnAPI compatible runners, so it's currently acceptable to have that same Dataflow target as the PostCommit badge too. Not ideal, but correct for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
