Hello, I've seen a pattern in a couple of different transforms (IOs) where we, I think, spend an excessive amount of code unit-testing the trivial builder methods.
E.g. a significant part of https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1Test.java is devoted to that, and a significant part of the in-progress Hadoop InputFormat IO. In particular, I mean unit-testing trivial builder functionality such as: - That setting a parameter actually sets it - That setting parameters in one order gives the same effect as a different order - That a null value of a parameter is rejected with an appropriate error message - That a non-null value of a parameter is accepted I'd like to come to a consensus as to how much unit-testing of such stuff is appropriate when developing a new transform. And then put this consensus into the PTransform Style Guide <https://beam.apache.org/contribute/ptransform-style-guide/>. I think such tests are not worth their weight in lines of code, for a few reasons: - Whether or not a parameter actually takes effect, should be tested using a semantic test (RunnableOnService). Testing whether a getter returns the same thing a setter set is neither necessary (already covered by a semantic test) nor sufficient (it's possible that the expansion of the transform forgets to even call the getter). - Testing "a non-null value (or an otherwise valid value) is accepted" is redundant with every semantic test that supplies the value. - For testing of supplying a different order of parameters: my main objections are 1) I'm not sure what kind of reasonably-likely coding error this guards against, especially given most multi-parameter transforms use AutoValue anyway, and given that "does a setter take effect" is already tested via bullets above 2) given such a coding error exists, there's no way to tell which orders or how many you should test; unit-testing two or three orders probably gives next to no protection, and unit-testing more than that is impractical. - Testing "a null value is rejected" effectively unit-tests a single line that says checkNotNull(...). I don't think it's worth 10 or so lines of code to test something this trivial: even if an author forgot to add this line, and then a used supplied null - the user will just get a less informative error message, possibly report it as a bug, and the error will easily get fixed. I.e. it's a very low-risk thing to get wrong. I think the following similar types of tests are still worth doing though: - Non-trivial parameter validation logic: where the validation rule is complex, or where the error message is interesting, etc. - Cross-parameter invariants which, if violated, lead to data loss: e.g. that if parameters foo and bar are mutually exclusive, verify that if the user specifies both, they get an error, instead of one of them being silently ignored. Thoughts?