Hi,

I read the entire thread and the frustration is palpable. Let's face it: the system is broken, and trying to fix it by forcing rules and processes on people won't work. The objective of the automated test system is to prevent the product from becoming too unstable during development and to promote a smooth, regular, incremental development flow. If the system creates a backlog of changes while everyone waits for the green to come up, it is working against its own stated objective, which is deeply dysfunctional. It's like a traffic light system that creates traffic jams instead of keeping traffic fluid...

Several things strike me as core to the issue:
- The functional test system is itself unstable: we need to fix issues such as dependencies between tests, so that we avoid avalanche effects and point-of-failure shifting.
- The functional test system is network dependent: this leads to intermittent failures that render the current policy ineffective, in effect casting suspicion on commits that have nothing to do with the source of the bug, so at worst the wrong people end up working on the wrong issue, and at best people learn to ignore the nagging mail. If we want a strict "green/red" policy, we need tests that are 100% context free and deterministically reproducible on the same box. We should have a way to test Chandler that does not depend on network activity, e.g. run a Cosmo instance on the test machine for the sharing tests (I'm hand waving heavily here about whether this is at all possible), stub the network APIs, etc. (a rough sketch of the stubbing idea follows below).
- Tests for network activity: we should of course test that our network functionality works, but that should be a separate set of tests with its own policy (TBD).
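To make the stubbing idea a bit more concrete, here is a minimal sketch of
what a network-free sharing test could look like. The names (FakeCosmoServer,
put, get) are made up for illustration and are not actual Chandler or Cosmo
APIs; the point is only that the test is deterministic and never opens a
socket:

    import unittest

    class FakeCosmoServer:
        """In-memory stand-in for a Cosmo instance; no sockets, no network."""
        def __init__(self):
            self.collections = {}

        def put(self, path, payload):
            self.collections[path] = payload

        def get(self, path):
            return self.collections[path]

    class SharingRoundTripTest(unittest.TestCase):
        """Deterministic sharing test: publish then fetch against the fake."""

        def setUp(self):
            self.server = FakeCosmoServer()

        def test_publish_then_fetch(self):
            self.server.put("/collection/work",
                            {"title": "Work", "items": ["task-1"]})
            fetched = self.server.get("/collection/work")
            self.assertEqual(fetched["items"], ["task-1"])

    if __name__ == "__main__":
        unittest.main()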

I propose that we hold a meeting on this, because I feel we need a solution soon and the email discussion will just take too long.

Would Thursday afternoon work for most?

Cheers,
- Philippe


Heikki Toivonen wrote:
Andi Vajda wrote:
Even by 'strictly' following the rules, when a failure is intermittent,
you easily get into the situation of a bunch of check-ins having
happened since the possibly bad one. I think Bryan's alternative is an
improvement.

I was just thinking about 100% reproducible cases, or close to 100%.

It can be really hard to figure out which checkin caused a rare
intermittent bug. Reasonably reliable intermittent bugs should be dealt
with like 100% reproducible cases. The rare cases we have dealt with by
filing bugs and proceeding otherwise normally.

I don't think it would be a good idea to turn off intermittent tests.
First, when they succeed, they still provide information that new
code hasn't made those tests fail 100% of the time. And it is pretty easy
to check the new logs to see if it is a known intermittent failure.
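As a purely hypothetical sketch of how known intermittent failures could be
tagged so they stand out in the logs, something like the decorator below
would do. The known_intermittent name and the bug number are made up; this
is not part of the current harness:

    import functools
    import logging

    log = logging.getLogger("functional-tests")

    def known_intermittent(bug_id):
        """Mark a test whose intermittent failures are already filed; a
        failure is logged with the bug number instead of looking like
        brand-new breakage, then re-raised so it is still reported."""
        def decorator(test_func):
            @functools.wraps(test_func)
            def wrapper(*args, **kwargs):
                try:
                    return test_func(*args, **kwargs)
                except Exception:
                    log.warning("known intermittent failure (bug %s) in %s",
                                bug_id, test_func.__name__)
                    raise
            return wrapper
        return decorator

    @known_intermittent(bug_id=1234)
    def test_share_over_network():
        ...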

If you really want to go the way of disabling all intermittent tests
then I am afraid that we'll have to turn off the whole functional test
suite right now, because there are at least two intermittent bugs that
manifest as test timeout and crash.


I have a sort of related question regarding test failures. Should we
stop further tests as soon as we see the first failure? This would
shorten the Tinderbox cycle time when there is a problem. What we
currently do is run all unit tests, and if those pass, we run all
functional tests (and if those pass, the perf boxes run all perf tests).
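As a hypothetical sketch of what "stop at the first failure" could mean for
the Tinderbox driver, assuming made-up script names for the three suites:

    import subprocess
    import sys

    # Suites in the order the tinderbox runs them today; bail out at the
    # first stage that fails instead of always running everything.
    SUITES = [
        ["python", "run_unit_tests.py"],
        ["python", "run_functional_tests.py"],
        ["python", "run_perf_tests.py"],
    ]

    def main():
        for cmd in SUITES:
            status = subprocess.call(cmd)
            if status != 0:
                print("stopping: %r failed with exit code %d" % (cmd, status))
                sys.exit(status)
        print("all suites passed")

    if __name__ == "__main__":
        main()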

------------------------------------------------------------------------

_______________________________________________

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev