Hi Eugene,

It sounds good to me.
Regards
JB

On Aug 18, 2017, at 21:42, Eugene Kirpichov <[email protected]> wrote:

> Hi all,
>
> This seems to have slipped through the cracks. I'd like to raise this again since I'm doing a cleanup in https://github.com/apache/beam/pull/3730 , and I'd like to get consensus and add the guidance to the PTransform Style Guide or the Testing Guide.
>
> Let me rephrase my suggestions from the previous email more clearly and concisely:
> - Do not test successful validation.
> - Do not test trivial validation errors (e.g. "fails when a property is unset/null/empty/negative/...").
> - Add 1 test per non-trivial class of validation error (e.g. "properties foo and bar are exclusive", validation of query syntax, etc.).
> - Define "non-trivial" as: where missing/incorrect/uninformative validation may lead to serious problems: data loss, counter-intuitive behavior, the value of a property being silently ignored, or other hard-to-debug errors.
> - Test that each withFoo() method generally has effect (is not ignored), using TestPipeline and PAssert.
>
> Does this sound reasonable?
>
> On Mon, Mar 27, 2017 at 2:28 PM Eugene Kirpichov <[email protected]> wrote:
>
>> From Ismael's and Dan's comments, I think I agree that there's a class of easy-to-make errors that justifies having some builder tests.
>>
>> 1. Errors of the form "withFoo() instead sets bar, which is of the same type as foo, so it still compiles" (what Ismael quoted). This, I think, should be caught by NeedsRunner tests, because it means that withFoo() is effectively being completely ignored, and NeedsRunner tests need to test that each property is taking effect, semantically.
>> 2. Errors of the form "withFoo() overrides bar with a default value" (e.g. in the commit Dan quoted). This means that foo and bar don't commute, i.e. .withFoo().withBar() is not the same as .withBar().withFoo().
>>
>> Errors of the form #2 warrant builder tests. To catch errors of this form, it's sufficient to have a single test: that a transform fully configured (with non-default values) in one order is equivalent to a transform fully configured in precisely the reverse order - this way any two properties foo, bar will be tested for whether they commute. Indirect effects (where the relative order of more than two properties matters) seem very unlikely.
>>
>> OK, here's my revised proposal for what testing of transforms with many properties should look like, for inclusion (if people agree) almost verbatim into the style/testing guide:
>> - For each property with non-trivial validation or a non-trivial error message produced from validation, a unit test verifying that invalid values are rejected with the proper error message.
>> - Same for rejecting invalid configuration of the transform as a whole (e.g. unset required properties, or incompatible but individually valid values specified for property combinations), one test per substantially different class of error.
>> - One "blackbox" (@NeedsRunner) test per property (or per combination of property values causing the transform to behave substantially differently), verifying that every property has the desired effect when the transform is executed.
>> - A single "whitebox" test that constructs a transform with non-default values of all properties in one order and in the exact reverse order, and verifies that the resulting transforms are configured equivalently.
>>
>> WDYT?
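For concreteness, here is a rough sketch of what one such "blackbox" @NeedsRunner test could look like. The Prefix transform and its withPrefix() property are made up purely for illustration (they are not part of any real Beam IO); the sketch only assumes TestPipeline, PAssert, Create and the NeedsRunner category from the Beam Java SDK plus JUnit 4.

import org.apache.beam.sdk.testing.NeedsRunner;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;

public class PrefixTest {

  /** Hypothetical transform with a single property, used only to illustrate the test shape. */
  static class Prefix extends PTransform<PCollection<String>, PCollection<String>> {
    private final String prefix;

    private Prefix(String prefix) {
      this.prefix = prefix;
    }

    static Prefix of() {
      return new Prefix("");
    }

    Prefix withPrefix(String prefix) {
      return new Prefix(prefix);
    }

    @Override
    public PCollection<String> expand(PCollection<String> input) {
      return input.apply(ParDo.of(new DoFn<String, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
          c.output(prefix + c.element());
        }
      }));
    }
  }

  @Rule public final transient TestPipeline p = TestPipeline.create();

  // "Blackbox" test: actually run the transform and assert that the property changed the
  // output, rather than asserting that a getter returns what the setter set.
  @Test
  @Category(NeedsRunner.class)
  public void testWithPrefixTakesEffect() {
    PCollection<String> output =
        p.apply(Create.of("a", "b")).apply(Prefix.of().withPrefix("x-"));
    PAssert.that(output).containsInAnyOrder("x-a", "x-b");
    p.run();
  }
}

A test of this shape fails if withPrefix() is silently ignored or sets the wrong field, which is exactly the class of bug a getter-only builder test can miss.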
>>
>> On Tue, Mar 21, 2017 at 5:22 PM Robert Bradshaw <[email protected]> wrote:
>>
>>> On Tue, Mar 21, 2017 at 5:14 PM, Dan Halperin <[email protected]> wrote:
>>>
>>>> https://github.com/apache/beam/commit/b202548323b4d59b11bbdf06c99d0f99e6a947ef is one example where tests of feature Bar exist but did not discover bugs that could be introduced by builders.
>>>
>>> True, though one would need to test the full cross-product of builder orderings to discover this; a simple test of setValidate() wouldn't suffice either. (And if not the full cross product, what subset?) Maybe some automated fuzz testing would be a fair thing to do here (cheaper and more comprehensive than manual tests...)?
>>>
>>>> AutoValue likely alleviates many, but not all, of these concerns - as Ismael points out.
>>>
>>> If two features are not orthogonal, that perhaps merits more tests (and documentation).
>>>
>>>> On Tue, Mar 21, 2017 at 1:18 PM, Robert Bradshaw <[email protected]> wrote:
>>>>
>>>>> On Wed, Mar 15, 2017 at 2:11 AM, Ismaël Mejía <[email protected]> wrote:
>>>>>
>>>>>> +1 to Vikas' point: maybe the right place to enforce that things are built correctly is in validate(), and like this reduce the test boilerplate and only test the validate method, but I wonder if this totally covers both cases (the buildsCorrectly and buildsCorrectlyInDifferentOrder ones).
>>>>>>
>>>>>> I answer Eugene's question here even if you are aware now, since you commented in the PR, so everyone understands the case.
>>>>>>
>>>>>> The case is pretty simple: when you extend an IO and add a new configuration parameter, suppose we have withFoo(String foo) and we want to add withBar(String bar). In some cases the implementation, or even worse the combination of those, is not built correctly, so the only way to guarantee that this works is to have code that tests the complete parameter combination, or tests that at least assert that the object is built correctly.
>>>>>>
>>>>>> This is something that can happen both with and without AutoValue, because the with method is hand-written and the natural tendency with boilerplate methods like this is to copy/paste, so we can end up doing silly things like:
>>>>>>
>>>>>> private Read(String foo, String bar) { … }
>>>>>>
>>>>>> public Read withBar(String bar) {
>>>>>>   return new Read(foo, null);
>>>>>> }
>>>>>>
>>>>>> in this case the reference to bar is not stored or assigned (this is similar to the case of the DatastoreIO PR), and AutoValue may seem to solve this issue, but you can also end up in this situation if you copy/paste the withFoo method and just change the method name:
>>>>>>
>>>>>> public Read withBar(String foo) {
>>>>>>   return builder().setFoo(foo).build();
>>>>>> }
>>>>>>
>>>>>> Of course both seem silly, but both can happen, and the tests at least help to discover those.
>>>>>
>>>>> Such mistakes should be entirely discovered by tests of feature Bar. If Bar is not actually being tested, that's a bigger problem with coverage, which a construction-only test actually obscures (giving it negative value).
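To make the two failure modes above concrete, here is a rough sketch of the single "whitebox" ordering test described earlier, written against a hand-written, hypothetical Read (the class and property names are illustrative, not DatastoreIO or HBaseIO); plain JUnit 4 is assumed.

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class ReadBuilderTest {

  /** Hypothetical hand-written builder-style config, the kind of code Ismaël describes. */
  static final class Read {
    private final String foo;
    private final String bar;

    private Read(String foo, String bar) {
      this.foo = foo;
      this.bar = bar;
    }

    static Read create() {
      return new Read(null, null);
    }

    Read withFoo(String foo) {
      return new Read(foo, this.bar);
    }

    Read withBar(String bar) {
      // A copy/pasted "return new Read(foo, null);" here would silently drop bar.
      return new Read(this.foo, bar);
    }

    String getFoo() { return foo; }
    String getBar() { return bar; }
  }

  // Single "whitebox" test: configure every property once in one order and once in the
  // exact reverse order, then check that both results carry the values that were set.
  @Test
  public void testBuildsCorrectlyInEitherOrder() {
    Read forward = Read.create().withFoo("f").withBar("b");
    Read reverse = Read.create().withBar("b").withFoo("f");

    assertEquals("f", forward.getFoo());
    assertEquals("b", forward.getBar());
    assertEquals(forward.getFoo(), reverse.getFoo());
    assertEquals(forward.getBar(), reverse.getBar());
  }
}

Both buggy variants quoted above (dropping bar, or setting foo instead of bar) fail this test, and because both orders are exercised it also catches the case where one setter resets the other property to its default.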
>>>>>> On Wed, Mar 15, 2017 at 1:05 AM, vikas rk <[email protected]> wrote:
>>>>>>
>>>>>>> Yes, what I meant is: necessary tests are ones that block users if not present. Trivial or non-trivial shouldn't be the issue in such cases.
>>>>>>>
>>>>>>> Some of the boilerplate code and tests is because IO PTransforms are returned to the user before they are fully constructed, and actual validation happens in the validate method rather than at construction. I understand that the reasoning here is that we want to support constructing them with options in any order, and using the Builder pattern can be confusing.
>>>>>>>
>>>>>>> If the validate method is where all the validation happens, then we should be able to eliminate some redundant checks and tests during construction time, like in *withOption* methods here <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java#L199> and here <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java#L387>, as these are also checked in the validate method.
>>>>>>>
>>>>>>> -Vikas
>>>>>>>
>>>>>>> On 14 March 2017 at 15:40, Eugene Kirpichov <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks all. Looks like people are on board with the general direction, though it remains to refine it into concrete guidelines to go into the style guide.
>>>>>>>>
>>>>>>>> Ismaël, can you give more details about the situation you described in the first paragraph? Is it perhaps that really a RunnableOnService test was missing (and perhaps still is), rather than a builder test?
>>>>>>>>
>>>>>>>> Vikas, regarding trivial tests and the user waiting for a work-around: in the situation I described, they don't really need a workaround - they specified an invalid value and have been minorly inconvenienced because the error they got about it was not very readable, so fixing their value took them a little longer than it could have, but they fixed it and their work is not blocked. I think Robert's arguments about the cost of trivial tests apply.
>>>>>>>>
>>>>>>>> I agree that the author should be at liberty to choose which validation to unit-test and which to skip as trivial, so documentation on this topic should be in the form of guidelines, high-quality example code (i.e. clean up the unit tests of IOs bundled with the Beam SDK), and informal knowledge in the heads of readers of this thread, rather than hard rules.
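As a rough illustration of the split Vikas describes, the sketch below keeps the with* methods trivial and gathers the non-trivial, cross-parameter checks in a single validate() method. The WriteConfig class and its properties are hypothetical, validate() is just a method called once the object is fully configured (not a specific Beam API), and Guava's Preconditions is assumed on the classpath.

import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkState;

final class WriteConfig {
  private final String table;
  private final Integer batchSize;

  private WriteConfig(String table, Integer batchSize) {
    this.table = table;
    this.batchSize = batchSize;
  }

  static WriteConfig create() {
    return new WriteConfig(null, null);
  }

  // Trivial null check only; per the discussion above, not worth a dedicated unit test.
  WriteConfig withTable(String table) {
    checkArgument(table != null, "table can not be null");
    return new WriteConfig(table, this.batchSize);
  }

  WriteConfig withBatchSize(int batchSize) {
    return new WriteConfig(this.table, batchSize);
  }

  // Non-trivial validation with informative messages, run once the config is complete;
  // this is the part worth one unit test per class of error.
  void validate() {
    checkState(table != null, "WriteConfig requires a table; call withTable()");
    checkState(batchSize == null || batchSize > 0,
        "batchSize must be positive, but was %s", batchSize);
  }
}

With this shape, duplicating the same checks in both the with* methods and validate() (and unit-testing them twice) is the redundancy the linked withOption methods illustrate.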
>>>>>>>>
>>>>>>>> On Tue, Mar 14, 2017 at 8:07 AM Ismaël Mejía <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> +0.5
>>>>>>>>>
>>>>>>>>> I used to think that some of those tests were not worth it, for example testBuildRead and testBuildReadAlt. However, the reality is that these tests allowed me to find bugs both during the development of HBaseIO and just yesterday when I tried to test the write support for the emulator with DataStoreIO (which lacked a parameter in testBuildWrite, didn't have a testBuildWriteAlt, and broke in that case too), so I now believe they are not necessarily useless.
>>>>>>>>>
>>>>>>>>> I agree with the idea of trying to test the most important things first and, as Kenneth said, trying to reduce the tests of successful validation. However, I am not sure how this can be formalized in the style guide, considering that the value of the tests seems to be based on the judgment of both the developer and the reviewer.
>>>>>>>>>
>>>>>>>>> One final aspect, if we achieve consensus, is that we should not forget that if we reduce some of these tests we have to pay attention not to reduce the current level of coverage.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ismaël
>>>>>>>>>
>>>>>>>>> On Mon, Mar 13, 2017 at 8:37 PM, vikas rk <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> +0.5
>>>>>>>>>>
>>>>>>>>>> My two cents,
>>>>>>>>>>
>>>>>>>>>> * However trivial the test is, it should be added, unless the user has an easy workaround that avoids waiting a few days until the trivial fixes are merged to Beam and then propagated to the runner.
>>>>>>>>>>
>>>>>>>>>> * While I agree with trivial tests like "ensuring a meaningful error message is thrown for null options" or "ordering of parameters" not being strictly necessary, and definitely not something that should be used as a reference for other IOs (staying consistent with other IOs is probably the reason why DatastoreIO has some of them), I still think some degree of leeway for the author is desirable depending on what level of user experience they want to provide, as long as it is not a burden for adding new tests.
>>>>>>>>>>
>>>>>>>>>> On 11 March 2017 at 14:16, Jean-Baptiste Onofré <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> Testing is always hard, especially to have concrete tests. Reducing the "noise" is a good idea.
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> JB
>>>>>>>>>>>
>>>>>>>>>>> On 03/10/2017 04:09 PM, Eugene Kirpichov wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I've seen a pattern in a couple of different transforms (IOs) where we, I think, spend an excessive amount of code unit-testing the trivial builder methods.
>>>>>>>>>>>>
>>>>>>>>>>>> E.g. a significant part of https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1Test.java is devoted to that, and a significant part of the in-progress Hadoop InputFormat IO.
>>>>>>>>>>>>
>>>>>>>>>>>> In particular, I mean unit-testing trivial builder functionality such as:
>>>>>>>>>>>> - That setting a parameter actually sets it
>>>>>>>>>>>> - That setting parameters in one order gives the same effect as a different order
>>>>>>>>>>>> - That a null value of a parameter is rejected with an appropriate error message
>>>>>>>>>>>> - That a non-null value of a parameter is accepted
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to come to a consensus as to how much unit-testing of such stuff is appropriate when developing a new transform, and then put this consensus into the PTransform Style Guide <https://beam.apache.org/contribute/ptransform-style-guide/>.
>>>>>>>>>>>>
>>>>>>>>>>>> I think such tests are not worth their weight in lines of code, for a few reasons:
>>>>>>>>>>>> - Whether or not a parameter actually takes effect should be tested using a semantic test (RunnableOnService). Testing whether a getter returns the same thing a setter set is neither necessary (already covered by a semantic test) nor sufficient (it's possible that the expansion of the transform forgets to even call the getter).
>>>>>>>>>>>> - Testing "a non-null value (or an otherwise valid value) is accepted" is redundant with every semantic test that supplies the value.
>>>>>>>>>>>> - For testing of supplying a different order of parameters: my main objections are 1) I'm not sure what kind of reasonably-likely coding error this guards against, especially given that most multi-parameter transforms use AutoValue anyway, and given that "does a setter take effect" is already tested via the bullets above; 2) given that such a coding error exists, there's no way to tell which orders or how many you should test; unit-testing two or three orders probably gives next to no protection, and unit-testing more than that is impractical.
>>>>>>>>>>>> - Testing "a null value is rejected" effectively unit-tests a single line that says checkNotNull(...). I don't think it's worth 10 or so lines of code to test something this trivial: even if an author forgot to add this line, and then a user supplied null - the user will just get a less informative error message, possibly report it as a bug, and the error will easily get fixed. I.e. it's a very low-risk thing to get wrong.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the following similar types of tests are still worth doing though:
>>>>>>>>>>>> - Non-trivial parameter validation logic: where the validation rule is complex, or where the error message is interesting, etc.
>>>>>>>>>>>> - Cross-parameter invariants which, if violated, lead to data loss: e.g. if parameters foo and bar are mutually exclusive, verify that if the user specifies both, they get an error, instead of one of them being silently ignored.
>>>>>>>>>>>>
>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>>>> [email protected]
>>>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>>>> Talend - http://www.talend.com
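As a closing illustration of that last category (cross-parameter invariants whose violation would otherwise be silently ignored), here is a rough sketch of one such unit test. The Scan class and its mutually exclusive withTable()/withQuery() properties are hypothetical; only JUnit 4 and Hamcrest's containsString are assumed.

import static org.hamcrest.CoreMatchers.containsString;
import static org.junit.Assert.assertThat;
import static org.junit.Assert.fail;

import org.junit.Test;

public class MutuallyExclusivePropertiesTest {

  /** Hypothetical config with two mutually exclusive properties, for illustration only. */
  static final class Scan {
    private final String table;
    private final String query;

    private Scan(String table, String query) {
      this.table = table;
      this.query = query;
    }

    static Scan create() {
      return new Scan(null, null);
    }

    Scan withTable(String table) {
      return new Scan(table, this.query);
    }

    Scan withQuery(String query) {
      return new Scan(this.table, query);
    }

    // Non-trivial, cross-parameter validation: specifying both must fail loudly
    // instead of one value being silently ignored.
    void validate() {
      if (table != null && query != null) {
        throw new IllegalStateException(
            "withTable() and withQuery() are mutually exclusive; specify only one of them");
      }
    }
  }

  // One test per non-trivial class of validation error, checking the error and its message.
  @Test
  public void testTableAndQueryAreMutuallyExclusive() {
    try {
      Scan.create().withTable("users").withQuery("SELECT * FROM users").validate();
      fail("Expected validation to reject setting both table and query");
    } catch (IllegalStateException e) {
      assertThat(e.getMessage(), containsString("mutually exclusive"));
    }
  }
}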
