lostluck commented on pull request #14504:
URL: https://github.com/apache/beam/pull/14504#issuecomment-819720007


   I agree with all that! (getting better signal on Validates Runner Tests; 
make the PreCommit and PostCommit suites mean when they actually run rather 
than "A fast suite of tests, and a slow and heavy suite of tests", and avoiding 
the combinatorial complexity of running all the SDK tests against all the 
runners, etc).
   
   At this point, the Go SDK is superseded in validating portability by the 
portable Python experience, and Java. The Go suites should be renamed away from 
being ValidatesRunners (VR), and largely validate that a smoke test of Go SDK 
functionality runs on each Runner properly. The Go SDK (and future SDKs) should 
not aim to replicate the full coverage of Runner behaviors, and instead focus 
on tests that test 1. User side PTransform code, such as for IOs and other 
Transform libraries, and 2. Pipelines that exercise behaviors on the SDK 
harness side of the worker (data & state management,  panes, splitting, 
checkpointing etc).
   
   Some small set of these tests must end up being tested on properly 
distributed runners however, single loopback process runs, or 1 bundle at a 
time runs, are prone to cheating through local state. Such cheating invalidates 
the test's results. While we can always manually verify this kind of thing, I'd 
rather that part be validated as a closedbox for durability.
   
   This is *not* how the Go SDK testing is currently set up however.
   
   Right now, the basic precommit suite builds all the go code in the beam 
project (need to verify boot.go afterall),  has all the unit tests for the SDK, 
and which at most have small pipelines that run against the anemic Go Direct 
runner for light internal verification.
   
   The post commit suite is a small set of integration tests, covering 
primitives, and some basic pipelines with a small set of the IOs (mostly 
Textio) and "built in" transforms. They run against each portable runner as 
post commits (Dataflow, Spark, Flink). The same suite is run against the Python 
ULR as a PreCommit, which catches most issues.
   
   Ideally we'll expand that to cover more of the transforms and IOs, and 
examples (as separate suites, as you've got displayed in the table).
   
   ------
   
   As to where the "Go PostCommit" target should go (this one I mean 
https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/ ), as 
I've insisted, it only runs against Dataflow. It should currently go into the 
Dataflow cell of the Validates Runner table, as that's the only table that 
refers to specific runners. I would like there to be an indication that the SDK 
is tested against dataflow, and that target is what currently does it.
   
   The Go SDK only has one integration test suite that's used against the FnAPI 
compatible runners, so it's currently acceptable to have that same Dataflow 
target as the PostCommit badge too. Not ideal, but correct for now.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to