Hello,

I've seen a pattern in a couple of different transforms (IOs) where we, I
think, spend an excessive amount of code unit-testing the trivial builder
methods.

E.g. a significant part of
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1Test.java
is
devoted to that, and a significant part of the in-progress Hadoop
InputFormat IO.

In particular, I mean unit-testing trivial builder functionality such as:
- That setting a parameter actually sets it
- That setting parameters in one order gives the same effect as a different
order
- That a null value of a parameter is rejected with an appropriate error
message
- That a non-null value of a parameter is accepted

I'd like to come to a consensus as to how much unit-testing of such stuff
is appropriate when developing a new transform. And then put this consensus
into the PTransform Style Guide
<https://beam.apache.org/contribute/ptransform-style-guide/>.

I think such tests are not worth their weight in lines of code, for a few
reasons:
- Whether or not a parameter actually takes effect, should be tested using
a semantic test (RunnableOnService). Testing whether a getter returns the
same thing a setter set is neither necessary (already covered by a semantic
test) nor sufficient (it's possible that the expansion of the transform
forgets to even call the getter).
- Testing "a non-null value (or an otherwise valid value) is accepted" is
redundant with every semantic test that supplies the value.
- For testing of supplying a different order of parameters: my main
objections are 1) I'm not sure what kind of reasonably-likely coding error
this guards against, especially given most multi-parameter transforms use
AutoValue anyway, and given that "does a setter take effect" is already
tested via bullets above 2) given such a coding error exists, there's no
way to tell which orders  or how many you should test; unit-testing two or
three orders probably gives next to no protection, and unit-testing more
than that is impractical.
- Testing "a null value is rejected" effectively unit-tests a single line
that says checkNotNull(...). I don't think it's worth 10 or so lines of
code to test something this trivial: even if an author forgot to add this
line, and then a used supplied null - the user will just get a less
informative error message, possibly report it as a bug, and the error will
easily get fixed. I.e. it's a very low-risk thing to get wrong.

I think the following similar types of tests are still worth doing though:
- Non-trivial parameter validation logic: where the validation rule is
complex, or where the error message is interesting, etc.
- Cross-parameter invariants which, if violated, lead to data loss: e.g.
that if parameters foo and bar are mutually exclusive, verify that if the
user specifies both, they get an error, instead of one of them being
silently ignored.

Thoughts?

Reply via email to