On 12/08/2014 03:15 PM, Nathaniel McCallum wrote:
On Mon, 2014-12-08 at 12:27 +0100, Petr Viktorin wrote:
On 12/05/2014 11:51 PM, Nathaniel McCallum wrote:
So I've been working this week on porting the OTP tests that were
manually coded to the new pytest framework. Now, I'm the first to want
better tooling to make our job easier. But pytest? Meh. I'm having to
write just as much code (or more) to get my pytest tests on par with
the old ones, and they are riddled with subtle bugs.

Hi,
Yeah, IPA runs into specific situations that are very different from how
tests usually work in other projects.

Yeah, this I get.

And you took one of the harder cases.
IPA's existing tests also don't really follow best practices – they
pretty much assume you run the whole test suite, or at least the whole
module, so selecting individual tests is currently not reliable. On the
other hand, this assumption made some things easy to code – like
complicated ordering and setup/teardown.

I think this is pretty much a red herring. It isn't so much about setup
and teardown but about mutually exclusive parameterization. It is a poor
assumption on the part of pytest that the lifecycles of parameters
overlap. At the least, an option should be provided to control this.

One thing we want to do during the move to pytest is to have the tests
be more independent of the current state of the database, and also of
each other.

Sure, this is a fine goal. Unless of course you want to test the
functionality of the code against all possible database states. I think
pretty much every xmlrpc test will have some component of this.

If certain assertions only make sense in a specific order, either put
them in the same test*, or write additional code that explicitly handles
dependencies and ordering. (Sorry, that's the price to pay for isolated
tests – I don't see a way around it.)

In an ideal world, we define a test and all possible parameters to that
test and allow the running of that test independently by choosing from
the prescribed inputs. This all works fine so long as you specify a way
to indicate when inputs are mutually exclusive. Since inputs are single
valued anyway, this should be the default behavior. Unfortunately,
pytest fails on both of these counts.

If on the other hand the order doesn't matter – any order of any subset
of individual tests should pass, which is our goal – but pytest's
default ordering makes the suite slow, there is a hook that allows you
to override the order: pytest_collection_modifyitems [0]. Stick it in
the affected module (or conftest.py file for multiple modules, or a
plugin for the whole suite), and sort the items explicitly.
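
For example, a minimal conftest.py sketch of that hook (the "enablement"
grouping key here is just an assumption for illustration):

def pytest_collection_modifyitems(items):
    # Group parametrized tests by the value of their "enablement" parameter
    # so that tests sharing the same state run next to each other.
    def group_key(item):
        # Only parametrized tests carry a callspec.
        callspec = getattr(item, "callspec", None)
        if callspec is not None and "enablement" in callspec.params:
            return str(callspec.params["enablement"])
        return ""
    items.sort(key=group_key)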

The problem isn't (I think) ordering but lifecycle. I think that if I
scope the fixture to the function but reorder the items, they will still
be constructed and torn down for every single function iteration. On the
other hand, if I scope the fixture for a longer lifecycle, the
parameters may not be mutually exclusive.

If your test depends on something not existing or being disabled, you
should try removing/disabling it before the test, or write a fixture to
do that.
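
For instance, a tiny fixture along these lines (user_exists and delete_user
are placeholder helpers here, in the same style as your example below):

import pytest

@pytest.fixture
def clean_slate():
    # Make sure a leftover test user from a previous run is gone before
    # the test creates its own.
    if user_exists('tuser1'):
        delete_user('tuser1')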

That's not the problem. The problem is that I can't define parameters
for a single object which set the same variable to different values.
Such an act means the parameters are mutually exclusive.

OK, I see the problem now. Pytest fixtures are meant for resources, not for states of a shared object. You ran into a fairly exotic use case (which is what I feared, and why I wanted to look into porting this). I think in this case we should try using separate users for the enablement states, since user creation is not expensive (rough sketch below). Or am I missing something else?

* might make sense, for starters?
[0]
http://pytest.org/latest/plugins.html#_pytest.hookspec.pytest_collection_modifyitems
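
A rough sketch of that separate-users idea (same placeholder helpers as in
the example further down; the fixture name is made up):

import pytest

@pytest.fixture(scope="module", params=["per-user", "global"])
def enabled_user(request):
    # One dedicated user per enablement state, so no fixture ever has to
    # flip a shared user between mutually exclusive states.
    user = create_user('tuser-%s' % request.param)
    enable(user, request.param)
    request.addfinalizer(lambda: delete_user(user.uid))
    return user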


Most painful is pytest.fixture(). It is a cool idea, but the scoping
and lifecycle are horribly broken. Here is an example:

I want to create a user, enable OTP in two different ways, and test
with two different types of tokens under two different authentication
scenarios. This is a total of eight tests.

@pytest.fixture(scope="function") # This scope is default.
def user(request):
    user = create_user('tuser1')
    request.addfinalizer(lambda: delete_user('tuser1'))
    return user

@pytest.fixture(params=["per-user", "global"])
def enablement(request, user):
    output = enable(user, request.param)
    request.addfinalizer(lambda: disable(user))
    return output

@pytest.fixture(params=[{'type': 'HOTP'}, {'disabled': True}])
def token(request, user, enablement):
    output = create_token(request.param)
    request.addfinalizer(lambda: delete_token(output.id))
    return output

@pytest.mark.parametrize("opts", [foo, bar])
def test_authentication(user, token, opts):
    assert auth(user.uid, token.code(), opts)

Because the default scope is "function", a new user is created for every
single test run. This is *way* more work than necessary, and leads
to slow runtimes. Fortunately, we can fix this by changing the scope of
the user fixture to "module". In this case, only one user will be
created per module. So far so good.
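
In code, the only change is the scope argument on the user fixture:

@pytest.fixture(scope="module")
def user(request):
    user = create_user('tuser1')
    request.addfinalizer(lambda: delete_user('tuser1'))
    return user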

The enablement scope is still "function", so it will still be called
eight times (#tokens * #enablements * #opts). Now, this is a problem
because (let's say) switching enablement states takes a long time. Here
is problem #1 with pytest: it presumes that all variants of fixtures
should be grouped by test parameters. This is a bad assumption. It may
make sense in some cases, but there is no way to change it.

Now, we can work around this problem just like with the user fixture:
change the scope to "module". Unfortunately, we've just caused a second
bug. Now, enablement() will only be run twice; but the finalizer will
not be called until all tests are run. This means that "per-user"
enablement will still be active when "global" enablement runs; both will
be torn down simultaneously.
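
Spelled out, the workaround and its side effect look like this:

@pytest.fixture(scope="module", params=["per-user", "global"])
def enablement(request, user):
    output = enable(user, request.param)
    # With module scope this finalizer is deferred until the end of the
    # module, so "per-user" is still active when "global" is set up.
    request.addfinalizer(lambda: disable(user))
    return output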

This same problem arises for the token fixture: eight tokens will be
created and destroyed when only four are necessary. You guessed it, we
can fix this by changing the scope. But by doing this all four tokens
will be deleted simultaneously. This means we cannot test the state
where only a disabled token exists.

Any attempt to use a fixture to create long-lived states (precisely what
fixtures are designed for) with mutually exclusive parameters is
extremely error prone as it is not clear which items are alive in the
LDAP db at any point. This leads to both long runtimes *and* extremely
subtle bugs in the tests that are multiplied by every layer of fixtures.
Call me nonplussed.

Now, let's look at how this same test can be implemented without pytest:

user = create_user('tuser1')
try:
    for enablement in ('per-user', 'global'):
        enable(user, enablement)
        try:
            for token in [{'type': 'HOTP'}, {'disabled': True}]:
                tok = create_token(token)
                try:
                    for opt in [foo, bar]:
                        auth(user.uid, tok.code(), opt)
                finally:
                    delete_token(tok.id)
        finally:
            disable(user)
finally:
    delete_user('tuser1')

Comparing the non-pytest implementation to the pytest one we can see a
few things. First, the pytest code has some nice separation of roles.
This is the big benefit of pytest. Second, the non-pytest implementation
is less code. Third, the non-pytest implementation has no subtle scoping
bugs: we know exactly when each database record is created and
destroyed.

This "non-pytest" implementation is a *single test*, since it hard-codes
dependencies and ordering. If you write it as a single test function
in pytest, it should look much the same, and get the
ordering/dependencies right (except the case of the test user already
existing).
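
Roughly something like this, using the same placeholder helpers as before:

import pytest

@pytest.fixture(scope="module")
def user(request):
    user = create_user('tuser1')
    request.addfinalizer(lambda: delete_user('tuser1'))
    return user

def test_authentication_matrix(user):
    # One test walks the whole matrix, so creation and teardown order
    # is explicit and identical to the hand-written version above.
    for enablement in ('per-user', 'global'):
        enable(user, enablement)
        try:
            for token in [{'type': 'HOTP'}, {'disabled': True}]:
                tok = create_token(token)
                try:
                    for opts in [foo, bar]:
                        assert auth(user.uid, tok.code(), opts)
                finally:
                    delete_token(tok.id)
        finally:
            disable(user)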

Which conflicts with the desired granularity; which is the whole reason
we are moving away from this model. I should be able to define, as
fixtures, a set of states required to run any one of a set of tests. Then
I can choose just to run a single iteration of a set of tests and the
set of states should be assembled correctly. The problem is that, with
pytest, this all works unless the parameterized fixtures contain
mutually exclusive state values.

So, I'm obviously not proposing that we do something like the non-pytest
implementation. What I'm pointing out is that being forced to implement
it this way in pytest negates all the benefits of pytest.

I *want* to like pytest. Honestly I do. But this scoping idiocy has
already cost me roughly two days tracking down subtle bugs in the tests
(while not finding any bugs in the OTP code itself). And I can't think
that OTP tests written using pytest will be maintainable in any regard;
not when such small changes can cause such hard-to-find bugs.

Now, it may be possible that I have just missed something obvious. If I
have, forgive me this rant. But I've spent a lot of time reading docs
and I haven't seen any obvious solutions to these problems.

Nathaniel

--
Petr³
