Hi all!

We are currently working on the IO IT "test harness" that will allow running the IOITs on various runners and filesystems, and with varying amounts of data. It is described in a doc [1] that some of you have probably already seen and commented on (in the context of the BEAM-3060 task [2]).

One part of this work that, in our opinion, is worth doing is validating the input parameters that are passed to the tests. By "test input parameters" I specifically mean:
- parameters passed as mvn system properties, such as "-Dfilesystem=gcs" or "-DintegrationTestRunner=dataflow"
- parameters that are passed as PipelineOptions, such as "numberOfRecords" or "filenamePrefix" in TextIO (a minimal example of such an options interface is sketched below)
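To make the second kind concrete, here is a minimal sketch of what an options interface for such parameters could look like. The interface name, descriptions and default are made up for illustration - the real IOIT options live next to the tests themselves:

    import org.apache.beam.sdk.options.Default;
    import org.apache.beam.sdk.options.Description;
    import org.apache.beam.sdk.options.PipelineOptions;

    /** Hypothetical options interface illustrating IOIT pipeline options. */
    public interface ExampleIOTestOptions extends PipelineOptions {

      @Description("Number of records written/read by the test")
      @Default.Integer(100000)
      Integer getNumberOfRecords();
      void setNumberOfRecords(Integer numberOfRecords);

      @Description("Prefix of the files written by the test")
      String getFilenamePrefix();
      void setFilenamePrefix(String filenamePrefix);
    }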
We imagine a situation where test parameters are passed in an incompatible way, e.g. someone wants to use the "dataflow" runner with a filesystem that is not supported there (say, s3). Running an IOIT with such a setup will almost certainly fail. The crux of the idea is to inform the developer early enough that no such mistakes are made and test execution time is saved. It eases debugging and avoids potential configuration errors.

The doc [1] fully specifies what has to be validated. We'd like to validate:
1. runner + filesystem combination - both passed as system properties
2. filesystem + pipeline options - as some additional options may be required by a specific filesystem
3. runner + pipeline options - as some additional options may be required by a specific runner
4. IO test class instance (e.g. TextIO) + options dedicated to it (e.g. numberOfRecords)

We have an idea to write a small Maven plugin for the validation. An initial PoC that just illustrates the concept is on GitHub [3]. The plugin would essentially be a small module in the ".test-infra" folder - the same place as other test-related stuff such as Kubernetes scripts or Jenkins files. Only the validation logic would live there - what gets validated should be declaratively provided in the pom.xml of each IOIT. All the necessary validation configuration could be passed in the plugin's <configuration> section [4], [5]. A rough sketch of what such a plugin's Mojo could look like is shown below.
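For illustration only - this is not the PoC code from [3], just a minimal sketch of the general shape such a Mojo could take, with made-up parameter names and a runner -> filesystem compatibility map that would be filled in from the plugin's <configuration> section:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.maven.plugin.AbstractMojo;
    import org.apache.maven.plugin.MojoExecutionException;
    import org.apache.maven.plugins.annotations.LifecyclePhase;
    import org.apache.maven.plugins.annotations.Mojo;
    import org.apache.maven.plugins.annotations.Parameter;

    /** Hypothetical validation Mojo - bound to the "validate" phase, so it runs before compilation. */
    @Mojo(name = "validate-test-props", defaultPhase = LifecyclePhase.VALIDATE)
    public class TestPropsValidationMojo extends AbstractMojo {

      /** Runner passed on the command line, e.g. -DintegrationTestRunner=dataflow. */
      @Parameter(property = "integrationTestRunner")
      private String runner;

      /** Filesystem passed on the command line, e.g. -Dfilesystem=gcs. */
      @Parameter(property = "filesystem")
      private String filesystem;

      /** Allowed filesystems per runner (comma-separated), declared in the plugin's <configuration>. */
      @Parameter
      private Map<String, String> supportedFilesystems = new HashMap<>();

      @Override
      public void execute() throws MojoExecutionException {
        if (runner == null || filesystem == null) {
          return; // nothing to validate when the properties are not set
        }
        String allowed = supportedFilesystems.get(runner);
        List<String> allowedList =
            allowed == null ? Arrays.<String>asList() : Arrays.asList(allowed.split(","));
        if (!allowedList.contains(filesystem)) {
          // Throwing here breaks the build early, which is what makes the validation fail fast.
          throw new MojoExecutionException(
              String.format("Filesystem '%s' is not supported by runner '%s'", filesystem, runner));
        }
        getLog().info("Test input parameters look consistent.");
      }
    }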
Pros:
+ input parameter validation runs very early - before compilation. We don't waste time waiting for the tests to fail
+ validation can be run depending on the profile - we can append the plugin to a profile's build section (e.g. to the io-it profile's build section)
+ validation is run once - after the first failure an exception is thrown which breaks the build (in case of running more than one test)
+ (AFAIK) we could support multiple SDKs in the future with this plugin - the only condition is being able to run the mvn plugin in some initial build phase
+ (AFAIK) we can run it on Gradle or adapt it to be a Gradle task too
+ every IO can have a set of validation rules defined in its pom.xml, which is readable and clearly shows which combinations are allowed and which are not

Cons:
- every time a new filesystem/runner etc. is supported we have to update the rules
- the plugin needs to be a separate module

Concerns:
- is validating all four points listed above a good idea? Perhaps the scope of the things to validate in the plugin is too wide? As far as I know, validation for runner-specific PipelineOptions is already available.

What do you think?

Thanks,
Łukasz

[1] https://docs.google.com/document/d/1dA-5s6OHiP_cz-NRAbwapoKF5MEC1wKps4A5tFbIPKE/edit#heading=h.84fcpdbcdqu
[2] https://issues.apache.org/jira/browse/BEAM-3060
[3] https://github.com/lgajowy/beam/blob/validator-poc/.test-infra/testprops/src/main/java/org/apache/beam/testprops/TestPropsPlugin.java
[4] https://github.com/lgajowy/beam/blob/validator-poc/sdks/java/io/jdbc/pom.xml#L46
[5] https://github.com/lgajowy/beam/blob/validator-poc/sdks/java/io/file-based-io-tests/pom.xml#L47