Comments inline.
(Sorry John, I missed your original email, which I fear I might have deleted as part of purging spam email over the weekend.)
On Feb 13, 2006, at 3:51 PM, Heikki Toivonen wrote:
John sent this privately first, but since this seems useful to general
discussion I am posting all of his email (along with my comments) with
his permission.
John Anderson wrote:
The other day I wrote a new functional test. When I added it to the rest of the tests and ran them, mine passed but the NewCollection test failed. Even when I ran without my test, NewCollection failed. I mentioned this to Heikki, who said that functional tests were passing on tinderbox. He asked me if I could look into why it was failing on my box.

I spent most of yesterday looking into this, but didn't find the bug. It turned out to be very complicated. The test randomly fails in different ways and sometimes doesn't fail at all. However, I learned a lot which I'd like to share with you, some of which might explain why tinderbox sometimes doesn't report failures.
Dan noticed my new test and pointed out a mistake I made. I needed to
catch exceptions if I wanted them to be reported as errors. I had
incorrectly assumed that exceptions would be caught automatically and
logged as failures.
Yeah, this is bad, and the current way we do it is more like a
workaround. Something better would be nice.
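For what it's worth, here is a minimal sketch of the kind of wrapper Dan added, just to illustrate the point; the testBody callable and the logger's reportFailure/reportPass methods are hypothetical stand-ins, not the actual framework API:

    import traceback

    def runFunctionalTest(testBody, logger):
        # testBody is a callable containing the test steps; logger is a
        # hypothetical stand-in for the framework's result logger.
        try:
            testBody()
        except Exception:
            # Without this, an uncaught exception escapes the test and is
            # never recorded as a failure -- exactly the mistake described above.
            logger.reportFailure("exception in test:\n" + traceback.format_exc())
        else:
            logger.reportPass("test completed")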
This got me thinking. How do tinderbox and the functional test framework handle Chandler crashes, python exceptions that happen before or after the functional tests, C exceptions, or hangs?
Not sure.
We used to have problems on Windows with unit tests that would cause a python crash, which in turn would pop up a dialog that would require a user to OK before the tests would continue. I think nowadays the dialogs may come up but the tests still continue.
John, the functional test framework does catch python exceptions
thrown by the test cases and report failures. We haven't figured out
a way to handle crashes and C exceptions.
As it turns out my new functional test runs the skins menu, which has been broken for months. When I recently hooked it up, it caused Chandler to crash on the Mac (widgets accessed a deallocated pointer). This menu has been one of the best ways of testing the framework on which Chandler is built, and often finds newly introduced bugs that would otherwise be very hard to find. So catching a Python exception in my functional test wouldn't have caught the Mac bug.
Whoever launches Chandler (tinderbox or some test harness) is going to have to detect seg faults, etc. Is this happening today?
I am 80% sure we detect test failure, but I am not completely sure. Since the test case will not have run to completion, CATS will report the test case as a failure. We are not explicitly catching the seg fault, though.
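To make that explicit, here is a sketch of what the launching side could check (the command line below is a made-up placeholder, not the actual harness invocation): on Unix, subprocess reports a process killed by a signal as a negative return code, so a seg fault can be reported directly instead of only showing up as an incomplete test case.

    import signal
    import subprocess

    # Placeholder command line -- substitute however the harness starts Chandler.
    proc = subprocess.run(["./release/RunChandler",
                           "--scriptFile", "FunctionalTestSuite.py"])

    if proc.returncode < 0:
        # A negative return code means the process died from a signal,
        # e.g. -11 for SIGSEGV on Linux.
        sig = -proc.returncode
        print("Chandler was killed by signal %d (%s)" % (sig, signal.Signals(sig).name))
    elif proc.returncode != 0:
        print("Chandler exited with error code %d" % proc.returncode)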
Chandler catches uncaught python exceptions, which get lost in release versions and, in the debug version of Chandler, are displayed in a dialog along with anything else written to stderr. This probably isn't what you want when running functional tests. I think the best solution for this problem is to have whoever runs Chandler for the functional tests include the --nocatch and --stderr arguments and log the stderr output, which, if not empty, causes the functional test to fail. Is this happening today?
Not as far as I know.
No, this is not happening today.
If we did catch exceptions in this way we wouldn't need to add try/except blocks like Dan did for my functional test.
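As a sketch of what that could look like -- --nocatch and --stderr are the arguments mentioned above, while the rest of the command line and the log file name are made-up placeholders -- the harness would capture stderr, save it, and treat any output (or a non-zero exit) as a failed run:

    import subprocess

    proc = subprocess.run(
        ["./release/RunChandler", "--nocatch", "--stderr",
         "--scriptFile", "FunctionalTestSuite.py"],
        capture_output=True, text=True)

    # Keep the full stderr output around for debugging.
    with open("functional_stderr.log", "w") as log:
        log.write(proc.stderr)

    # Any stderr output (tracebacks, wx warnings, etc.) or a non-zero exit
    # marks the functional test run as failed.
    if proc.stderr.strip() or proc.returncode != 0:
        raise SystemExit("functional test run FAILED; see functional_stderr.log")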
It's very important to run the functional tests on both release and
debug versions of Chandler since the debug versions contain lots of
extra testing code. Are we doing this?
Release versions only at the moment. The functional tests are run by the perf tinderboxes, which only run tests in release mode.
I also noticed some time ago that the TestLaunchChandler test fails on Windows (it hangs when trying to quit). Could this be the same problem as bug #4773 <https://bugzilla.osafoundation.org/show_bug.cgi?id=4773>, which we can't reproduce? I looked into it and it turns out to be a C thread deadlock in repository quit, which is consistent with bug #4773. This would be a really great bug to fix. Does tinderbox (or the test framework) detect functional tests that fail because they hang?
If a test hangs, there is no automatic recovery or reporting. You can see on the Tinderbox page when a machine has stopped sending reports, which is likely an indication of a hang. We have a bug filed about recovering from hangs, but at the moment even the only theoretical approach I know of does not work on Windows. See
https://bugzilla.osafoundation.org/show_bug.cgi?id=5152
Hmm, possibly the script that starts Chandler itself could also kill a
stuck Chandler process.
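Something along these lines, for example; the timeout value and the command line are assumptions, but the idea is that the launching script waits with a deadline, then kills and reports a hung Chandler rather than letting the tinderbox client just go silent:

    import subprocess

    HANG_TIMEOUT = 30 * 60  # assumed upper bound for a full functional run, in seconds

    proc = subprocess.Popen(["./release/RunChandler",
                             "--scriptFile", "FunctionalTestSuite.py"])
    try:
        proc.wait(timeout=HANG_TIMEOUT)
    except subprocess.TimeoutExpired:
        # Treat the run as hung: kill the process and report a failure.
        proc.kill()
        proc.wait()
        raise SystemExit("functional tests HUNG after %d seconds" % HANG_TIMEOUT)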
I asked Bear why TestLaunchChandler wasn't failing on tinderbox and he said it wasn't being run. I don't remember the exact reason he gave for why it couldn't be run, but I think it was something about the tinderbox machine requiring a display and adding a display being a security problem. Since TestLaunchChandler just launches Chandler and the functional tests also need to launch Chandler, why can we run functional tests but not TestLaunchChandler?
AFAIK it wasn't a security problem, we just couldn't get it to work. We run the performance boxes differently, which is why it works there. Regular boxes can be controlled simply via ssh access; the perf boxes need physical presence or VNC to control the tests.
It seems we should be able to switch to VNC control with the regular Tbox clients as well. The most important ones would be the quick build boxes. If that works, we can make those run functional tests in both debug and release mode, which will also allow us to stop running them on the perf Tboxes.
TestLaunchChandler was supposed to be a smoke test of Chandler -- to make sure it doesn't crash. However, it doesn't run the launchers, so it won't detect any problems with them. We should switch over to running all our functional tests with the launchers.
Yes, we should. I'll file a bug on that.
Finally, I noticed that TestNewCollection, when run alone, fails by hanging forever, but when run in the order of the functional tests, fails with an attribute error. I was surprised to learn that each functional test isn't run from the same known starting point, i.e. they are all run one after another in the same execution of Chandler. For someone like me, who just wants to add a new functional test, it would be convenient if my test results didn't depend on the random state of Chandler left over after the tests that ran before it -- especially when some of the tests access network resources.
We haven't seen too many problems with running all the tests together
in the test suite so far. More often than not it has discovered real
problems. We do have the flexibility of running the tests
individually on a new repo if we have to.
But I am open to the idea of having a functional test suite that runs each test on a fresh copy of the repo, if that helps developers debug their tests.
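A rough sketch of what that suite runner could do -- the paths, test names, and command-line options here are all hypothetical -- is to copy a pristine repository into a scratch profile directory and launch each test against its own copy:

    import shutil
    import subprocess
    import tempfile

    PRISTINE_REPO = "test_profile/__repository__"          # hypothetical path
    TESTS = ["TestNewCollection", "TestSwitchingViews"]     # illustrative names

    for test in TESTS:
        # Every test starts from the same known repository state instead of
        # whatever the previous test left behind.
        profileDir = tempfile.mkdtemp(prefix=test + "_")
        shutil.copytree(PRISTINE_REPO, profileDir + "/__repository__")
        subprocess.run(["./release/RunChandler",
                        "--profileDir", profileDir,
                        "--scriptFile", "tests/functional/%s.py" % test])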
So that's my brain dump after a day of tracking down functional test failures. Let me know if I misunderstand any of the issues. I'm hoping that we can get the functional tests working in tinderbox soon. I'm not sure how best to proceed and I'm open to suggestions. Perhaps Aparna or Heikki could file and assign some bugs, if they don't already exist. If anybody needs my help fixing any of these problems, let me know.
--
Heikki Toivonen
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev