Hi,
I read the entire thread and it's clear that the frustration is
palpable. Let's face it: the system is broken, and trying to fix it by
forcing rules and processes on people won't work. The objective of the
automated test system is to prevent the product from becoming too
unstable during the course of development and to promote a smooth,
regular, incremental development flow. If the system creates a backlog
of changes while we wait for the tests to go green, it is acting
against its own stated objective, which is really dysfunctional. It's
like a traffic light system that creates traffic jams instead of
keeping the traffic flowing...
Several things strike me as core to the issue:
- The functional test system is itself unstable: we need to fix issues
such as dependencies between tests, so that we avoid avalanche effects
and the point of failure shifting from run to run (see the first sketch
after this list).
- The functional test system is network dependent: this leads to
intermittent failures that render the current policy ineffective,
casting suspicion on commits that have nothing to do with the source of
the bug, so we end up getting the wrong people working on the wrong
issue (at worst) or getting people to ignore the nagging mail (at
best). If we want a strict "green/red" policy, we need tests that are
100% context free and deterministically reproducible on the same box.
We should have a way to test Chandler that does not depend on network
activity, e.g. have a Cosmo instance running on the test machine and
used for the sharing tests (I'm hand waving heavily here as to whether
this is at all possible), stub the network API (see the second sketch
after this list), etc.
- Tests for network activity: we should of course test that our
network functionality does work, but that should be a separate set of
tests with its own policy (TBD); the third sketch after this list shows
one way they could be split out.
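To make the first point concrete, here is a minimal sketch of the kind
of per-test isolation I mean. The class and method names are made up
(this is not our actual test code): each test builds its own fixture in
setUp instead of relying on state left behind by an earlier test, so
one failure cannot knock over the tests that follow it.

    import unittest

    class FakeRepository(object):
        """Stand-in for whatever shared state the real tests lean on."""
        def __init__(self):
            self.items = []
        def add(self, title):
            self.items.append(title)
            return title

    class EventTest(unittest.TestCase):
        # Each test gets its own repository in setUp, so a failure in
        # one test cannot leave state behind that breaks the next one.
        def setUp(self):
            self.repo = FakeRepository()

        def testStartsEmpty(self):
            self.assertEqual(self.repo.items, [])

        def testAddOne(self):
            self.assertEqual(self.repo.add("standup"), "standup")
            self.assertEqual(len(self.repo.items), 1)

    if __name__ == "__main__":
        unittest.main()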
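On stubbing the network API, here is a rough sketch of what I have in
mind; StubShareServer and publish() are invented for illustration, not
our real sharing code. The idea is that the code under test only needs
an object with put()/get(), so the functional test hands it an
in-memory stub and stays fully deterministic on one box, while the real
Cosmo-backed implementation would only be exercised by the separate
network suite.

    import unittest

    class StubShareServer(object):
        """In-memory stand-in for a Cosmo/WebDAV server; no network."""
        def __init__(self):
            self.collections = {}
        def put(self, name, data):
            self.collections[name] = data
        def get(self, name):
            return self.collections.get(name)

    def publish(server, name, data):
        # Hypothetical sharing routine under test: it only needs
        # something that answers put()/get(), whether that is a stub
        # or a real server.
        server.put(name, data)
        return server.get(name) == data

    class SharingTest(unittest.TestCase):
        def testPublishRoundTrip(self):
            server = StubShareServer()
            self.assertTrue(publish(server, "work-calendar", "ics-data"))

    if __name__ == "__main__":
        unittest.main()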
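And for splitting out the network tests, one way to do it is to gate
them behind an environment variable, so the default (check-in) run
stays deterministic and the network suite only runs under its own
policy. CHANDLER_NETWORK_TESTS is an invented name, and this assumes a
unittest with skip support; a separate suite file would work just as
well.

    import os
    import unittest

    RUN_NETWORK_TESTS = os.environ.get("CHANDLER_NETWORK_TESTS") == "1"

    class CosmoSharingTest(unittest.TestCase):
        # Network-dependent: skipped unless the network-test run asks
        # for it, so the default tinderbox cycle never touches the net.
        @unittest.skipUnless(RUN_NETWORK_TESTS,
                             "network tests disabled by default")
        def testPublishToRealCosmo(self):
            self.assertTrue(True)  # would talk to a real Cosmo here

    class LocalSharingTest(unittest.TestCase):
        # Deterministic; always runs, safe to gate check-ins on.
        def testPublishAgainstStub(self):
            self.assertTrue(True)

    if __name__ == "__main__":
        unittest.main()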
I propose that we hold a meeting on this, because I feel we need a
solution soon and an email discussion will just take too long.
Would Thursday afternoon work for most?
Cheers,
- Philippe
Heikki Toivonen wrote:
Andi Vajda wrote:
Even by 'strictly' following the rules, when a failure is intermittent,
you easily get into the situation of a bunch of check-ins having
happened since the possibly bad one. I think Bryan's alternative is an
improvement.
I was just thinking about 100% reproducible cases, or close to 100%.
It can be really hard to figure out which checkin caused a rare
intermittent bug. Reasonably reliable intermittent bugs should be dealt
with like 100% reproducible cases. The rare cases we have dealt with by
filing bugs and proceeding otherwise normally.
I don't think it would be a good idea to turn off intermittent tests.
First, when they succeed, they still provide information that new
code hasn't made those tests fail 100% of the time. And it is pretty easy
to check the new logs to see if it is a known intermittent failure.
If you really want to go the way of disabling all intermittent tests
then I am afraid that we'll have to turn off the whole functional test
suite right now, because there are at least two intermittent bugs that
manifest as test timeout and crash.
I have a sort of related question regarding test failures. Should we
stop further tests as soon as we see the first failure? This would
shorten Tinderbox cycle time when there was a problem. What we currently
do is that we run all unit tests, and if those passed, we run all
functional tests (and if those passed, perf boxes run all perf tests).