Comments inline.
(Sorry John, I missed your original email, which I fear I might have deleted as part of purging spam email over the weekend.)
On Feb 13, 2006, at 3:51 PM, Heikki Toivonen wrote:
John sent this privately first, but since this seems useful to general
discussion I am posting all of his email (along with my comments) with
his permission.
John Anderson wrote:
The other day I wrote a new functional test. When I added it to the rest of the tests and ran them, mine passed but the NewCollection test failed. Even when I ran without my test, NewCollection failed. I mentioned this to Heikki, who said that functional tests were passing on tinderbox. He asked me if I could look into why it was failing on my box.

I spent most of yesterday looking into this, but didn't find the bug. It turned out to be very complicated. The test randomly fails in different ways and sometimes doesn't fail at all. However, I learned a lot which I'd like to share with you, some of which might explain why tinderbox sometimes doesn't report failures.
Dan noticed my new test and pointed out a mistake I made. I needed to
catch exceptions if I wanted them to be reported as errors. I had
incorrectly assumed that exceptions would be caught automatically and
logged as failures.
Yeah, this is bad, and the current way we do it is more like a
workaround. Something better would be nice.
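For what it's worth, here is a minimal sketch of the kind of wrapper Dan added, just to illustrate the point; the testBody callable and the logger's reportFailure/reportPass methods are hypothetical stand-ins, not the actual framework API:

    import traceback

    def runFunctionalTest(testBody, logger):
        # testBody is a callable containing the test steps; logger is a
        # hypothetical stand-in for the framework's result logger.
        try:
            testBody()
        except Exception:
            # Without this, an uncaught exception escapes the test and is
            # never recorded as a failure -- exactly the mistake described above.
            logger.reportFailure("exception in test:\n" + traceback.format_exc())
        else:
            logger.reportPass("test completed")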
This got me thinking. How do tinderbox and the functional test framework handle Chandler crashes, python exceptions that happen before or after the functional tests, C exceptions, or hangs?
Not sure.
We used to have problems on Windows with unit tests that would cause a python crash, which in turn would pop up a dialog that would require a user to OK before the tests would continue. I think nowadays the dialogs may come up but the tests still continue.
John, the functional test framework does catch python exceptions
thrown by the test cases and report failures. We haven't figured out
a way to handle crashes and C exceptions.
As it turns out my new functional test runs the skins menu, which has been broken for months. When I recently hooked it up, it caused Chandler to crash on the Mac (widgets accessed a deallocated pointer). This menu has been one of the best ways of testing the framework on which Chandler is built, and often finds newly introduced bugs that would otherwise be very hard to find. So catching a Python exception in my functional test wouldn't have caught the Mac bug.
Whoever launches Chandler (tinderbox or some test harness) is going to have to detect seg faults, etc. Is this happening today?
I am 80% sure we detect test failure, but I am not completely sure. Since the test case will not have run to completion, CATS will report the test case as a failure. We are not explicitly catching the seg fault, though.
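To make that explicit, here is a sketch of what the launching side could check (the command line below is a made-up placeholder, not the actual harness invocation): on Unix, subprocess reports a process killed by a signal as a negative return code, so a seg fault can be reported directly instead of only showing up as an incomplete test case.

    import signal
    import subprocess

    # Placeholder command line -- substitute however the harness starts Chandler.
    proc = subprocess.run(["./release/RunChandler",
                           "--scriptFile", "FunctionalTestSuite.py"])

    if proc.returncode < 0:
        # A negative return code means the process died from a signal,
        # e.g. -11 for SIGSEGV on Linux.
        sig = -proc.returncode
        print("Chandler was killed by signal %d (%s)" % (sig, signal.Signals(sig).name))
    elif proc.returncode != 0:
        print("Chandler exited with error code %d" % proc.returncode)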
Chandler catches uncaught python exceptions, which get lost in release versions and, in the debug version of Chandler, are displayed in a dialog along with anything else written to stderr. This probably isn't what you want when running functional tests. I think the best solution for this problem is to have whoever runs Chandler for the functional tests include the --nocatch and --stderr arguments and log the stderr output, which, if not empty, causes the functional test to fail. Is this happening today?
Not as far as I know.
No, this is not happening today.
If we did catch exceptions in this way we wouldn't need to add try/except blocks like Dan did for my functional test.
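As a sketch of what that could look like -- --nocatch and --stderr are the arguments mentioned above, while the rest of the command line and the log file name are made-up placeholders -- the harness would capture stderr, save it, and treat any output (or a non-zero exit) as a failed run:

    import subprocess

    proc = subprocess.run(
        ["./release/RunChandler", "--nocatch", "--stderr",
         "--scriptFile", "FunctionalTestSuite.py"],
        capture_output=True, text=True)

    # Keep the full stderr output around for debugging.
    with open("functional_stderr.log", "w") as log:
        log.write(proc.stderr)

    # Any stderr output (tracebacks, wx warnings, etc.) or a non-zero exit
    # marks the functional test run as failed.
    if proc.stderr.strip() or proc.returncode != 0:
        raise SystemExit("functional test run FAILED; see functional_stderr.log")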
It's very important to run the functional tests on both release and
debug versions of Chandler since the debug versions contain lots of
extra testing code. Are we doing this?
Release versions only at the moment. The functional tests are run by the perf tinderboxes, which only run tests in release mode.
I also noticed some time ago that the TestLaunchChandler test fails on Windows (it hangs when trying to quit). Could this be the same problem as bug #4773 <https://bugzilla.osafoundation.org/show_bug.cgi?id=4773>, which we can't reproduce? I looked into it and it turns out to be a C thread deadlock in repository quit, which is consistent with bug #4773. This would be a really great bug to fix. Does tinderbox (or the test framework) detect functional tests that fail because they hang?
If a test hangs, there is no automatic recovery or reporting. You can see on the Tinderbox page when a machine has stopped sending reports, which is likely an indication of a hang. We have a bug filed about recovering from hangs, but at the moment even the only theoretical approach I know of does not work on Windows. See
https://bugzilla.osafoundation.org/show_bug.cgi?id=5152
Hmm, possibly the script that starts Chandler itself could also kill a
stuck Chandler process.
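Something along these lines, for example; the timeout value and the command line are assumptions, but the idea is that the launching script waits with a deadline, then kills and reports a hung Chandler rather than letting the tinderbox client just go silent:

    import subprocess

    HANG_TIMEOUT = 30 * 60  # assumed upper bound for a full functional run, in seconds

    proc = subprocess.Popen(["./release/RunChandler",
                             "--scriptFile", "FunctionalTestSuite.py"])
    try:
        proc.wait(timeout=HANG_TIMEOUT)
    except subprocess.TimeoutExpired:
        # Treat the run as hung: kill the process and report a failure.
        proc.kill()
        proc.wait()
        raise SystemExit("functional tests HUNG after %d seconds" % HANG_TIMEOUT)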
I asked Bear why TestLaunchChandler wasn't failing on tinderbox and he said it wasn't being run. I don't remember the exact reason he gave for why it couldn't be run, but I think it was something about the tinderbox machine requiring a display and adding a display being a security problem. Since TestLaunchChandler just launches Chandler and the functional tests also need to launch Chandler, why can we run functional tests but not TestLaunchChandler?
AFAIK it wasn't a security problem, we just couldn't get it to work. We run the performance boxes differently, which is why it works there. Regular boxes can be controlled simply via ssh access; the perf boxes need physical presence or VNC to control the tests.
It seems we should be able to switch to VNC control with the regular Tbox clients as well. The most important ones would be the quick build boxes. If that works, we can make those run functional tests in both debug and release mode, which will also allow us to stop running them on the perf Tboxes.
TestLaunchChandler was supposed to be a smoke test of Chandler -- to make sure it doesn't crash. However, it doesn't run the launchers, so it won't detect any problems with them. We should switch over to running all our functional tests with the launchers.
Yes, we should. I'll file a bug on that.
Finally, I noticed that TestNewCollection, when run alone, fails by hanging forever, but when run in the order of the functional tests, fails with an attribute error. I was surprised to learn that each functional test isn't run from the same known starting point, i.e. they are all run one after another in the same execution of Chandler. For someone like me, who just wants to add a new functional test, it would be convenient if my test results didn't depend on the random state of Chandler left over after the tests that ran before it -- especially when some of the tests access network resources.
We haven't seen too many problems with running all the tests together
in the test suite so far. More often than not it has discovered real
problems. We do have the flexibility of running the tests
individually on a new repo if we have to.
But I am open to the idea of having a functional test suite that runs each test on a fresh copy of the repo, if that helps developers debug their tests.
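A rough sketch of what that suite runner could do -- the paths, test names, and command-line options here are all hypothetical -- is to copy a pristine repository into a scratch profile directory and launch each test against its own copy:

    import shutil
    import subprocess
    import tempfile

    PRISTINE_REPO = "test_profile/__repository__"          # hypothetical path
    TESTS = ["TestNewCollection", "TestSwitchingViews"]     # illustrative names

    for test in TESTS:
        # Every test starts from the same known repository state instead of
        # whatever the previous test left behind.
        profileDir = tempfile.mkdtemp(prefix=test + "_")
        shutil.copytree(PRISTINE_REPO, profileDir + "/__repository__")
        subprocess.run(["./release/RunChandler",
                        "--profileDir", profileDir,
                        "--scriptFile", "tests/functional/%s.py" % test])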
So that's my brain dump after a day of tracking down functional test failures. Let me know if I misunderstand any of the issues. I'm hoping that we can get the functional tests working in tinderbox soon. I'm not sure how best to proceed and I'm open to suggestions. Perhaps Aparna or Heikki could file and assign some bugs, if they don't already exist. If anybody needs my help fixing any of these problems, let me know.
--
Heikki Toivonen
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev