The link to
https://docs.pytest.org/en/latest/example/markers.html#custom-marker-and-command-line-option-to-control-test-runs
helps
to clarify some of the customization required to add CLI options that
select test sets based on markers.  +1 for a common default with *no
marker*.  (It's hard to guess how many test sets are required, how many
extra lines of "marker code" are needed for each category, and how the Venn
diagrams work out.  I don't want to get into that, since I'm not familiar
with all of it, but my first intuition is that markers will provide
granularity at the expense of a lot more "marker code", unless there is
always a common default test env and extra markers are only required for
the exceptions to that default.)
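
To make that concrete, here is a minimal conftest.py sketch of the pattern
from the pytest docs page above (the option name, marker name, and help text
are only placeholders I made up, not an actual Airflow proposal):

import pytest


def pytest_addoption(parser):
    # hypothetical option that opts in to the marked categories
    parser.addoption(
        "--run-marked",
        action="store_true",
        default=False,
        help="also run tests that carry a category marker",
    )


def pytest_configure(config):
    # register the example marker so pytest does not warn about it
    config.addinivalue_line(
        "markers", "category(name): example category marker"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-marked"):
        return  # run everything, marked or not
    skip_marked = pytest.mark.skip(reason="needs --run-marked option to run")
    for item in items:
        if "category" in item.keywords:
            item.add_marker(skip_marked)

With something like that in place, a bare `pytest` run exercises only the
unmarked default set, and `pytest --run-marked` pulls in the marked extras
as well.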

How would the proposed marker scheme categorise a test that uses mocked
infrastructure for AWS Batch services?  Consider how much AWS
infrastructure is mocked in a moto server to test Batch services; see
[1,2].  In a real sense, the moto library provides a server with a
container runtime; it's "mocked infrastructure" that helps to "fake
integration" tests.  +1 for a common vocabulary (semantics) for tests and
markers.  I'm not a test expert by a long shot, so what is the best
practice for a test vocabulary and how does it translate into markers?
Does the Apache Foundation have any kind of manifesto about such things?

[1]
https://github.com/spulec/moto/blob/master/tests/test_batch/test_batch.py
[2] https://github.com/spulec/moto/blob/master/moto/batch/models.py
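
For reference, a stripped-down sketch of the kind of test in question (the
function and marker names here are made up; the "fake" marker just borrows
the terminology discussed below):

import boto3
import pytest
from moto import mock_batch, mock_ec2, mock_ecs, mock_iam


@pytest.mark.fake  # hypothetical marker for tests built on moto "fakes"
@mock_ec2
@mock_ecs
@mock_iam
@mock_batch
def test_register_batch_job_definition():
    # moto stands up an in-process fake of several AWS services here
    client = boto3.client("batch", region_name="us-east-1")
    resp = client.register_job_definition(
        jobDefinitionName="sleep-job",
        type="container",
        containerProperties={
            "image": "busybox",
            "vcpus": 1,
            "memory": 128,
            "command": ["sleep", "1"],
        },
    )
    assert "jobDefinitionArn" in resp

Is that a unit test with mocks, a "fake integration" test, or something
else?  That's exactly the vocabulary question above.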


On Sun, Dec 29, 2019 at 7:48 AM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> >
> > If I understand correctly, using `pytest -k` might be less work and more
> > generalized than a swag of custom markers, unless it entails a lot of
> > re-naming things.  The work to add markers might be easier if they can be
> > applied to entire classes of tests, although what I've mostly seen with
> > `pytest` is a functional pattern rather than classes in tests.  For more
> > about that, see the note about using pytest fixtures vs. class
> > setup/teardown at https://docs.pytest.org/en/latest/xunit_setup.html
>
>
> I think `pytest -k` is great for ad-hoc/manual execution of only what we
> want. But for automation around running tests (which should be repeatable
> and reproducible by anyone), I think it makes much more sense to keep
> markers in the code.
>
> It's really just a matter of where we keep the information about how we
> group tests into the common categories that we use for test execution.
>
>    1. with pytest -k - we would have to keep the "grouping" as a different
>    set of -k parameters in CI test scripts. This requires following naming
>    conventions for modules, classes or tests. Similar to what Kamil
>    described earlier in the thread: we already use the *_system.py module +
>    SystemTest class naming in GCP tests.
>    2. with markers, the grouping is kept in the source code of the tests
>    instead. This is "meta" information that does not force any naming
>    convention on the tests (see the short example after this list).
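>
>    As a rough side-by-side of the two options (the test and module names
>    here are only examples, not existing Airflow tests):
>
>    # option 1: the grouping lives in the CI script and the naming convention
>    pytest -k "SystemTest"
>
>    # option 2: the grouping lives next to the test itself
>    @pytest.mark.system("gcp")
>    def test_gcs_to_bigquery_transfer():
>        ...
>
>    # and the CI script only needs the category name:
>    pytest -m system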
>
> I strongly prefer 2. over 1. for test automation.
>
> Some reasoning:
>
>    - It makes it easier to reproduce the grouping locally without having
>    to look up the selection criteria/naming conventions.
>    - It's easier to build automation around it. For example, in the case of
>    integrations we can easily select cases where the "integration" from the
>    environment matches the integration marker: the cassandra integration
>    will be matched by the integration("cassandra") marker. With a naming
>    convention we would have to record somewhere (in the custom -k command)
>    that the "cassandra" integration matches, for example, all tests in the
>    "tests.cassandra" package, or all tests named TestCassandra, or
>    something even more complex. Defining a custom marker seems much more
>    obvious and easy to follow.
>    - Naming conventions are sometimes not obvious when you look at the
>    code, whereas markers are quite obvious to follow in the code when you
>    add new tests of the same "category".
>    - Last but not least - you can combine different markers on the same
>    test. For example, we can have Cassandra (integration) + MySQL (backend)
>    tests, as sketched just after this list. So markers are "labels" and you
>    can apply several of them to the same test. A naming convention makes it
>    difficult (or impossible) to combine different categories - you would
>    have to have non-overlapping conventions, and as we add more categories
>    it might become impossible. For example, if you look at my proposal
>    below, we will likely have a number of tests marked with both
>    system("gcp") and backend("postgres") - system tests that move data from
>    Postgres to BigQuery.
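>
>    For illustration, combining markers on one test could look like this
>    (the marker names come from the examples above; the test name is made
>    up and nothing here is implemented yet):
>
>    @pytest.mark.integration("cassandra")
>    @pytest.mark.backend("mysql")
>    def test_cassandra_to_mysql_transfer():
>        ...
>
>    # and then select any combination from the command line, for example:
>    #   pytest -m "integration and backend"
>    #   pytest -m "backend and not integration"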
>
> For me, the last reason from the list above is the decisive one. I can
> very easily imagine overlapping categories of tests coming up, and markers
> give us great flexibility here.
>
> > With regard to "slow" and https://github.com/apache/airflow/pull/6876, it
> > was motivated by one test that uses moto mocking for AWS Batch services.
> > In particular, it has a mock batch job that actually runs a container, and
> > the user of the mock has no control over how the job transitions through
> > the various job states (with associated status).  For example, the
> > `pytest` durations are an order of magnitude longer for this test than for
> > all others (see below stdout from a PR branch of mine).  So, during
> > dev-test cycles, once this test is coded and working as expected, it helps
> > to either temporarily mark it with `pytest.mark.skip` or to permanently
> > mark it with a custom marker (e.g. `pytest.mark.slow`) and then use
> > `pytest -m 'not slow'` to run all the faster tests.  It's no big deal, I
> > can live without it, it's just a convenience.
> >
>
> With regard to "slow" tests: maybe the right approach here is to have a
> different marker. I think "slow" suggests that there is a "fast" somewhere
> and that we would need to decide how slow is slow.
>
> As an inspiration, I really like the distinction introduced by Martin
> Fowler:
>
> https://www.martinfowler.com/articles/mocksArentStubs.html#ClassicalAndMockistTesting
>
> where he distinguishes between different types of "test doubles" (dummy,
> fake, stub, spy, mock). Unfortunately, this terminology is not universally
> accepted, but for the sake of this discussion assume we follow it: then I
> think the "fast" tests use stubs, mocks or spies, whereas the "slow" tests
> you mention use "fakes" (your scripts are really "fakes").
> The "fake" tests are usually much slower. But "fake" might not be a good
> marker name, though, because the terminology is not universally agreed.
>
> But maybe we can come up with something that indicates the tests that are
> using "fakes" rather than "mocks/stubs/spies"?  That would make it much
> easier to decide when to apply such a marker.  Any ideas? Maybe "nostub",
> or maybe "heavy", or something like that? Or maybe we can start using the
> "fake" terminology in those tests, use a "fake" marker for them, and simply
> introduce this term in our project.
>
> If we come up with a good proposal here, this might give us fairly
> consistent rules for when to run the tests (sketched just after this list):
>
>    - most tests, where everything is mocked/stubbed/spied -> *no marker*
>    - non-integration-dependent tests which use fakes -> *"fake"* or
>    *"heavy"* or something else
>    - integration-dependent tests -> use *integration("<INTEGRATION>")*
>    markers, for example integration("cassandra")
>    - system tests (future, pending AIP-4) -> use *system("<SYSTEM>")*
>    markers for tests that require external services/credentials to connect
>    to them, for example system("gcp") or system("aws")
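>
>    In marker form, a rough sketch of that scheme could look like this (the
>    test names are invented; we would also register the markers in setup.cfg
>    or pytest.ini so pytest does not warn about unknown markers):
>
>    def test_all_mocked_operator():    # no marker -> always runs
>        ...
>
>    @pytest.mark.heavy                 # or "fake" - name still open
>    def test_batch_job_against_moto_fake():
>        ...
>
>    @pytest.mark.integration("cassandra")
>    def test_cassandra_hook():
>        ...
>
>    @pytest.mark.system("gcp")
>    def test_postgres_to_bigquery_transfer():
>        ...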
>
> That would be super friendly for both automation and manual execution of
> the tests.
>
> J.
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
>


-- 
Darren L. Weber, Ph.D.
http://psdlw.users.sourceforge.net/
http://psdlw.users.sourceforge.net/wordpress/
