From what I have seen so far, most data store systems provide mock or
embedded servers intended for testing, and we should use them wherever
possible. For data systems that don't offer an embedded/mock server, the
corresponding I/O operators will need a different strategy for coverage.
I am not sure whether Mockito would be useful for writing such integration
tests.
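For endpoints with no embedded server, one alternative to Mockito is a
hand-rolled in-memory fake of the endpoint's interface. A minimal sketch,
with every class and method name made up for illustration (this is not the
Apex API):

```java
// Hypothetical sketch: testing an output operator against a hand-written
// in-memory fake of its endpoint instead of a real broker. All names here
// are invented for illustration; this is not the actual Apex operator API.
import java.util.ArrayList;
import java.util.List;

public class FakeEndpointSketch {
    /** Stand-in for the external system the I/O operator talks to. */
    interface MessageSink {
        void send(String message);
    }

    /** In-memory fake that records what the operator sent. */
    static class InMemorySink implements MessageSink {
        final List<String> sent = new ArrayList<>();
        @Override public void send(String message) { sent.add(message); }
    }

    /** Simplified output operator: transforms and forwards each tuple. */
    static class SimpleOutputOperator {
        final MessageSink sink;
        SimpleOutputOperator(MessageSink sink) { this.sink = sink; }
        void process(String tuple) { sink.send(tuple.toUpperCase()); }
    }

    /** Drive the operator against the fake and return what it emitted. */
    static List<String> runScenario() {
        InMemorySink sink = new InMemorySink();
        SimpleOutputOperator op = new SimpleOutputOperator(sink);
        op.process("a");
        op.process("b");
        return sink.sent;
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

The fake records everything the operator sends, so the test can assert on
the operator's behavior without any broker running.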

-Priyanka

On Tue, Sep 13, 2016 at 10:57 AM, Sanjay Pujare <san...@datatorrent.com>
wrote:

> Most of this discussion applies only to processing operators (non-I/O
> operators), right?
>
> I/O operators have to be tested against their respective endpoint (e.g.
> the ActiveMQ operator against an ActiveMQ broker), and developing a mock
> to test against is often not worth the effort. So how do we get better
> coverage for I/O operators?
>
>
> On 9/12/16, 6:02 PM, "Thomas Weise" <tho...@datatorrent.com> wrote:
>
>     Yes, I suggested that. Looking for volunteers to tackle these things.
>
>
>     On Mon, Sep 12, 2016 at 5:44 PM, Pramod Immaneni <
> pra...@datatorrent.com>
>     wrote:
>
>     > I agree, but I think it will also help if we provide more tools in
>     > this space, such as an operator test driver that goes through the
>     > lifecycle methods of an operator and offers configurability and
>     > variations. This driver could be bootstrapped from the unit test. I
>     > see the setup, beginWindow, process, endWindow and teardown call
>     > pattern repeated in many unit tests, and it can expand to more
>     > methods when an operator implements more interfaces.
>     >
>     > Thanks
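A test driver along the lines Pramod describes might be sketched as
follows; the Operator interface and all names here are simplified
stand-ins, not the actual Apex interfaces:

```java
// Hypothetical sketch of an operator test driver that exercises the
// documented lifecycle call sequence. The Operator interface below is a
// made-up stand-in, not the real Apex API.
import java.util.ArrayList;
import java.util.List;

public class LifecycleDriverSketch {
    interface Operator {
        void setup();
        void beginWindow(long windowId);
        void process(Object tuple);
        void endWindow();
        void teardown();
    }

    /** Runs one window of tuples through the full lifecycle. */
    static void runOneWindow(Operator op, List<Object> tuples) {
        op.setup();
        op.beginWindow(1L);
        for (Object t : tuples) {
            op.process(t);
        }
        op.endWindow();
        op.teardown();
    }

    /** Test double that records the order of lifecycle calls. */
    static class RecordingOperator implements Operator {
        final List<String> calls = new ArrayList<>();
        public void setup() { calls.add("setup"); }
        public void beginWindow(long id) { calls.add("beginWindow"); }
        public void process(Object tuple) { calls.add("process"); }
        public void endWindow() { calls.add("endWindow"); }
        public void teardown() { calls.add("teardown"); }
    }

    static List<String> recordedCalls() {
        RecordingOperator op = new RecordingOperator();
        runOneWindow(op, List.of("t1"));
        return op.calls;
    }

    public static void main(String[] args) {
        System.out.println(recordedCalls());
    }
}
```

A driver like this centralizes the repeated setup/beginWindow/process/
endWindow/teardown pattern, so individual unit tests only supply the
operator and the tuples.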
>     >
>     > On Mon, Sep 12, 2016 at 5:26 PM, Thomas Weise <t...@apache.org>
> wrote:
>     >
>     > > Hi,
>     > >
>     > > Recently there was a bit of discussion on how to write tests for
>     > > operators that will result in good coverage and high confidence in
>     > > the results of the CI. Experience from past releases shows that
>     > > operators with good coverage are less likely to break (for a user)
>     > > due to subsequent changes, while those that don't have coverage in
>     > > the CI (think contrib) are likely to break even due to trivial
>     > > changes that would otherwise be easily caught.
>     > >
>     > > IMO writing good tests is as important as the operator's main code
>     > > (and documentation and examples..). It was also part of the
>     > > maturity framework that Ashwin proposed a while ago (Ashwin, maybe
>     > > you can also share a few points). I suggest we expand the
>     > > contribution guidelines to reflect an agreed set of expectations
>     > > that contributors can follow when submitting PRs, or even come up
>     > > with a checklist:
>     > >
>     > > http://apex.apache.org/malhar-contributing.html
>     > >
>     > > Here are a few recurring problems and suggestions, in no
>     > > particular order:
>     > >
>     > >    - Unit tests are for testing small pieces of code in isolation
>     > >    (the "unit"). Running a DAG in embedded mode is not a unit
>     > >    test; it is an integration test.
>     > >    - When writing an operator or making changes to fix bugs etc.,
>     > >    it is recommended to write or modify a granular test that
>     > >    exercises the change and as little as possible around it. This
>     > >    happens before writing or running an application and can be
>     > >    done in fast iterations inside the IDE, without extensive test
>     > >    data setup or application assembly.
>     > >    - When an operator consists of multiple other components,
>     > >    testing for those should also be broken down into units. For
>     > >    example, managed state is not tested by testing the dedup or
>     > >    join operators (which are special use cases), but through
>     > >    separate tests that exercise the full spectrum (or close to it)
>     > >    of managed state.
>     > >    - So what about serialization; don't I need to create a DAG to
>     > >    test it? No, you only need Kryo to test serialization of an
>     > >    operator. Use the existing utilities, or contribute to
>     > >    utilities that are shared between tests.
>     > >    - Don't I need to run a DAG to test the lifecycle of an
>     > >    operator? No, the sequence of calls to an operator's lifecycle
>     > >    methods is documented (how else would one implement an operator
>     > >    in the first place?). There are quite a few tests that
>     > >    "execute" the operator directly. They have access to the state
>     > >    and can assert that a given process invocation causes the
>     > >    expected changes. That is much more difficult when running a
>     > >    DAG.
>     > >    - Don't I have to write a lot of code for such testing, and
>     > >    won't I possibly forget some calls? Not when following
>     > >    test-driven development. IMO that mostly happens when tests are
>     > >    written as an afterthought, which is a waste of time. I would
>     > >    suggest, though, developing a single operator test driver that
>     > >    ensures all methods are called as a basic sanity check.
>     > >    - Integration tests: with proper unit test coverage, the
>     > >    integration test is more like an example of how to use an
>     > >    operator. Nice for users, because they can use it as a starting
>     > >    point for writing their own app, including the configuration.
>     > >    - I wrote a nice integration test app with configuration. It
>     > >    runs for exactly <n> seconds (localmode.run(n)), returns, and
>     > >    all looks green. It even prints some nice stuff in the console.
>     > >    What's wrong? You have not tested anything! An operator may
>     > >    fail in setup and the test still passes. Travis CI is not
>     > >    reading the console (instead, it will complain that tests fill
>     > >    up the 4MB log limit too fast and the really important logs get
>     > >    buried). Instead, assert in your test code that the DAG
>     > >    execution produces the expected results. Rather than waiting
>     > >    for exactly <n> seconds, wait until the expected results are in
>     > >    and cap the wait with a timeout. This is yet another area where
>     > >    a few utilities for recurring test code will come in handy.
>     > >    - Tests sometimes fail, but they work on my local machine?
>     > >    Every environment is different, and good tests don't depend on
>     > >    environment-specific factors (timing dependencies, excessive
>     > >    resource utilization, etc.). It is important that tests pass in
>     > >    the CI consistently and that issues found there are
>     > >    investigated and fixed. Isn't it nice to see the green check
>     > >    mark in the PR, instead of having to close/reopen several times
>     > >    so that the unrelated flaky test does not fail? If we
>     > >    collectively track and fix such failures, life will be better
>     > >    for everyone.
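The serialization point above can be shown without a DAG. This sketch uses
plain java.io serialization only to keep it dependency-free (Apex actually
uses Kryo, whose API differs), and the operator class is made up:

```java
// Hypothetical sketch: checking that an operator's state survives a
// serialization round trip, without building a DAG. Apex uses Kryo for
// this; java.io serialization stands in here so the sketch has no
// third-party dependencies. The operator class is invented for
// illustration.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTripSketch {
    static class CountingOperator implements Serializable {
        private static final long serialVersionUID = 1L;
        int tupleCount;                    // checkpointed state
        transient StringBuilder scratch;   // transient, rebuilt in setup
        void process(Object tuple) { tupleCount++; }
    }

    /** Serialize and deserialize, returning the restored copy. */
    static CountingOperator roundTrip(CountingOperator op) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(op);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (CountingOperator) in.readObject();
        }
    }

    static int restoredCount() throws Exception {
        CountingOperator op = new CountingOperator();
        op.process("a");
        op.process("b");
        // The restored operator must carry the checkpointed state.
        return roundTrip(op).tupleCount;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(restoredCount());
    }
}
```

The same round-trip shape works with Kryo; the point is that asserting on
the restored state needs no application assembly at all.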
>     > >
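The advice to wait for expected results with a timeout, rather than for a
fixed <n> seconds, can be sketched as a small polling helper; the helper
and the simulated pipeline below are hypothetical:

```java
// Hypothetical sketch: instead of sleeping a fixed <n> seconds, poll for
// the expected result and give up after a timeout. The "pipeline" here is
// a background thread standing in for a running DAG; all names are made
// up for illustration.
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BooleanSupplier;

public class AwaitResultSketch {
    /** Polls until the condition holds or the timeout expires. */
    static boolean awaitCondition(BooleanSupplier condition,
            long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(20);  // small poll interval
        }
        return condition.getAsBoolean();
    }

    static List<Integer> runScenario() throws InterruptedException {
        List<Integer> results = new CopyOnWriteArrayList<>();
        // Stand-in for a DAG emitting results asynchronously.
        Thread pipeline = new Thread(() -> {
            for (int i = 1; i <= 3; i++) {
                results.add(i);
            }
        });
        pipeline.start();
        // Wait for the expected results, capped by a timeout.
        boolean done = awaitCondition(() -> results.size() == 3, 5000);
        pipeline.join();
        return done ? results : Collections.emptyList();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runScenario());
    }
}
```

The test finishes as soon as the results arrive, and the timeout only
bounds the failure case instead of padding every run.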
>     > > Looking forward to feedback, additions and, most importantly,
>     > > volunteers who will help make the Apex CI better.
>     > >
>     > > Thanks,
>     > > Thomas
>     > >
>     >
>
>
