+1

Testing is always hard, especially to have concrete tests. Reducing the "noise" is a good idea.

Regards
JB

On 03/10/2017 04:09 PM, Eugene Kirpichov wrote:
Hello,

I've seen a pattern in a couple of different transforms (IOs) where we, I
think, spend an excessive amount of code unit-testing the trivial builder
methods.

E.g. a significant part of
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1Test.java
is
devoted to that, and a significant part of the in-progress Hadoop
InputFormat IO.

In particular, I mean unit-testing trivial builder functionality such as:
- That setting a parameter actually sets it
- That setting parameters in one order gives the same effect as a different
order
- That a null value of a parameter is rejected with an appropriate error
message
- That a non-null value of a parameter is accepted

I'd like to come to a consensus as to how much unit-testing of such stuff
is appropriate when developing a new transform. And then put this consensus
into the PTransform Style Guide
<https://beam.apache.org/contribute/ptransform-style-guide/>.

I think such tests are not worth their weight in lines of code, for a few
reasons:
- Whether or not a parameter actually takes effect, should be tested using
a semantic test (RunnableOnService). Testing whether a getter returns the
same thing a setter set is neither necessary (already covered by a semantic
test) nor sufficient (it's possible that the expansion of the transform
forgets to even call the getter).
- Testing "a non-null value (or an otherwise valid value) is accepted" is
redundant with every semantic test that supplies the value.
- For testing of supplying a different order of parameters: my main
objections are 1) I'm not sure what kind of reasonably-likely coding error
this guards against, especially given most multi-parameter transforms use
AutoValue anyway, and given that "does a setter take effect" is already
tested via bullets above 2) given such a coding error exists, there's no
way to tell which orders  or how many you should test; unit-testing two or
three orders probably gives next to no protection, and unit-testing more
than that is impractical.
- Testing "a null value is rejected" effectively unit-tests a single line
that says checkNotNull(...). I don't think it's worth 10 or so lines of
code to test something this trivial: even if an author forgot to add this
line, and then a used supplied null - the user will just get a less
informative error message, possibly report it as a bug, and the error will
easily get fixed. I.e. it's a very low-risk thing to get wrong.

I think the following similar types of tests are still worth doing though:
- Non-trivial parameter validation logic: where the validation rule is
complex, or where the error message is interesting, etc.
- Cross-parameter invariants which, if violated, lead to data loss: e.g.
that if parameters foo and bar are mutually exclusive, verify that if the
user specifies both, they get an error, instead of one of them being
silently ignored.

Thoughts?


--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com

Reply via email to