On Mon, Jul 20, 2015 at 6:42 AM, Tim Penhey <tim.pen...@canonical.com>
wrote:

> Hi folks,
>
> Earlier today I was investigating this CRITICAL BLOCKER bug:
>         https://bugs.launchpad.net/juju-core/+bug/1475724


I'll talk about the specific bug to begin with, but there's a much more
important bit further down. If you're pressed for time, skip down to the
point marked "THE IMPORTANT BIT".

> But as you can see above, there was not a second update-status before
> the config-changed (but there is one after on both).
>
> Really the test doesn't care one hoot about the update-status hook, all
> it really cared about checking was the config-changed. [2]
>

I agree that all the test specifically needs to do is wait for the
config-changed to happen. But I *also* think that the doubled update-status
is somewhat suspicious.

> We shouldn't be asserting things we don't care about.
>

...and I agree in principle, but I think it's a bit muddier here. Much of
what the uniter does is run hooks; and it's important that it do so in a
predictable and consistent order. I think it's better to assert the full
list of hooks expected for each uniter FT (and thus have a chance of
spotting surprising macro-level changes) than it is to do things like "wait
for two config-changed hooks and ignore everything else".

(And FWIW you can define whatever set of hooks you want in a uniter FT
*anyway* -- in this particular case, we could have defined a charm without an
update-status hook. In hindsight, yeah, we probably could/should have done
that -- but I wouldn't want that approach to be the first resort.)

> Here are my thoughts as to what should be changed:
>
>  - most tests should not care about the update-status hook, and it
> shouldn't appear in any of the lists of hooks
>  - the matchHooks method on the context should skip over the
> update-status hook calls
>  - explicit calls should be made if the user really cares about
> update-status being at the end of the list, i.e. don't use matchHooks,
> but add another method to look for the update-status
>
> This should make the uniter tests not be so timing dependent.
>

Limited disagreement with the first two -- update-status timing is
important, and while the FTs are each written to target a particular
scenario, the context in which the actions happen is important. If we want
to validate update-status behaviour properly we can either build the
verification into the test harness, as we do, or we can write a vastly
greater number of tests, with and without update-status, to check how the
features interact in reality.
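
(For concreteness, the kind of matchHooks filtering being proposed would look
roughly like the sketch below -- purely illustrative, hypothetical names, not
the real method on the test context -- and the update-status occurrences it
throws away are exactly the ones I'd sometimes want to see.)

    // Purely illustrative -- not the real matchHooks on the test context.
    package main

    import "fmt"

    // matchHooksIgnoringUpdateStatus reports whether the observed hooks match
    // the expected list once every update-status entry has been dropped.
    func matchHooksIgnoringUpdateStatus(observed, expected []string) bool {
        var filtered []string
        for _, h := range observed {
            if h != "update-status" {
                filtered = append(filtered, h)
            }
        }
        if len(filtered) != len(expected) {
            return false
        }
        for i := range filtered {
            if filtered[i] != expected[i] {
                return false
            }
        }
        return true
    }

    func main() {
        observed := []string{"install", "update-status", "config-changed", "update-status"}
        fmt.Println(matchHooksIgnoringUpdateStatus(observed, []string{"install", "config-changed"})) // true
    }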

Honestly, the problem is not that the tests are timing-dependent -- it's
that the *code* is timing-dependent. The initial idle-status implementation
was slapped on top of the alive-loop in response to time pressure; and then
the update-status work was slapped on top of those unsound foundations. One
such decision is understandable in context -- all software is gnarly at the
edges -- but the rot sets in when we build new gnarly layers on top of old
ones.

And re the time pressure argument: I think the amount of time we have
burned on this already outweighs the cost of extracting the uniter
algorithm from the concurrency concerns. I can accept that the advantages
gleaned from landing idle status early were large enough to justify taking
on that technical debt; but the interest rate has ballooned already and we
need to pay it down ASAP.

> Aside from all this work, it is becoming REALLY IMPORTANT that we stop
> writing terrible, wasteful, full integration type tests when what we
> really care about testing is some aspect of uniter internals. I know
> that it is just simpler to copy what is there and make more, but it is
> better to write smaller, targeted tests that just test what you are
> wanting to assert.
>
> Yes there should be integration level tests, but not all the unit level
> tests should be tested at an integration level.
>

I definitely agree with this statement.

> We need to not only stop writing more of these, but go back and fix the
> ones that are there to be better.
>

...and I partially agree with this statement, but it opens a few
interesting issues.

THE IMPORTANT BIT

(1) We do definitely need *some* hairy full-stack uniter tests, and many of
the existing tests are specifically testing how the whole system reacts in
carefully-crafted edge cases. But, sure, we could stand to lose several of
them, so long as they were replaced with something as (or preferably
*more*) reliable.

(2) The *reason* we need such hairy uniter tests is that we've got
concurrency concerns mixed in with our algorithm. (Shiny-new-hammer
syndrome, dating from very early on -- with select loops and tombs,
concurrency is *easy*! Well... kinda.)

(3) Captain Hindsight helpfully informs us that mixing those concerns makes
it (i) hard to test the concurrency, because you need to take the algorithm
into account and (ii) hard to test the algorithm, because you need to take
the concurrency into account.

(4) I am coming to believe that the best change we can make in service of
greater reliability and repeatability is to habitually separate these
concerns, and thus allow *much* more detailed, adversarial, repeatable
tests of our core logic; and much simpler, cleaner concurrency tests (in
which we construct the worker with a double implementing the interface in
question, and focus entirely on the plumbing/marshalling).
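
As a rough sketch of the shape I have in mind -- all names below are
hypothetical, none of this is the real uniter code -- the core logic depends
only on a narrow interface, so the detailed tests can drive it synchronously
with a trivial double, and the concurrency tests only have to exercise the
plumbing:

    // Hypothetical shape only -- not the real uniter or its facade.
    package main

    import (
        "fmt"
        "reflect"
    )

    // Facade is the narrow interface the core logic needs; the real thing
    // would talk to the API server.
    type Facade interface {
        DesiredConfig() (map[string]string, error)
    }

    // Resolver holds the pure algorithm: no goroutines, channels, or timers,
    // so it can be driven step by step in detailed, repeatable tests.
    type Resolver struct {
        facade  Facade
        current map[string]string
    }

    // NextOp decides what to do by comparing desired and current state.
    func (r *Resolver) NextOp() (string, error) {
        desired, err := r.facade.DesiredConfig()
        if err != nil {
            return "", err
        }
        if !reflect.DeepEqual(desired, r.current) {
            r.current = desired
            return "run config-changed", nil
        }
        return "no-op", nil
    }

    // fakeFacade is the kind of double a concurrency-free test would use.
    type fakeFacade struct{ cfg map[string]string }

    func (f *fakeFacade) DesiredConfig() (map[string]string, error) { return f.cfg, nil }

    func main() {
        r := &Resolver{facade: &fakeFacade{cfg: map[string]string{"debug": "true"}}}
        op, _ := r.NextOp()
        fmt.Println(op) // run config-changed
        op, _ = r.NextOp()
        fmt.Println(op) // no-op
    }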

(5) We can't do all that at once; but it's not an unreasonable burden to
accept when modifying an existing worker. We should treat mixed concurrency
concerns as problems that need to be fixed, and welcome the burden of
separating and completing the tests, because it'll make juju easier to
understand/develop *and* better at the macro-level.

(6) At its heart, the uniter's job is simple. Ask how the world should
look; see how it really does look; run operations until the latter matches
the former. It's not *trivial* -- the set of possible stimuli is not small
-- but nor is it overwhelmingly large, and the list of possible responses is
similarly bounded.
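
Rendered as a toy -- hypothetical names again, nothing here is real uniter
code -- that job is just a pure "diff the worlds, emit operations" step:

    // Toy rendering of "desired vs actual -> operations"; not real uniter code.
    package main

    import "fmt"

    // reconcile returns the operations needed to bring actual in line with
    // desired. It is a pure function: easy to test exhaustively, no timing.
    func reconcile(desired, actual map[string]bool) []string {
        var ops []string
        for name := range desired {
            if !actual[name] {
                ops = append(ops, "start "+name)
            }
        }
        for name := range actual {
            if !desired[name] {
                ops = append(ops, "stop "+name)
            }
        }
        return ops
    }

    func main() {
        desired := map[string]bool{"relation-joined": true, "config-changed": true}
        actual := map[string]bool{"config-changed": true, "leftover-thing": true}
        fmt.Println(reconcile(desired, actual)) // e.g. [start relation-joined stop leftover-thing]
    }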

(7) I'm not saying it'd be *easy* to extract it, but failing to do so is
now costing us more than it's gaining us. I'd do it myself in my Copious
Free Time, but, lol. I *will* however take as much time as necessary to
fully support anyone who *can* take this on; and I've done quite a lot of
the preliminaries recently anyway, so I think I can point in useful
directions.

In short: yes, let's unit test the uniter properly. To do that we need to
make its core logic test*able*, and we need someone to devote time to
making it so. I consider preventative maintenance to be part of the cost of
writing code -- we should be extracting logic and paying down these debts
as we go, precisely so we don't end up in this situation -- and so ofc I'd
prefer not to schedule time for this explicitly; but if we *don't*, we
incentivise every team to play chicken with each other by building extra
floors on the skyscraper and hoping theirs isn't the one that causes the
foundations to shift.

Cheers
William

PS If we did this we'd *also* get someone else with a reasonably deep
understanding of the uniter, which would help our bus number metrics no end.

> [1] We should seriously start thinking about how to gate landings on the
> unit tests passing on amd64, ppc, and windows.
>

+1.