I'm including at the top concrete tasks people can take on to help identify
and reduce flakiness. Read below for more details.

   1. Mark slow tests as SLOW and reduce the timeout on the bots to 2
   seconds.
   2. Look into the cause of the timeouts on HTTP tests, especially on
   Mac/Windows.
   3. Look at the actual results from the bots for the non-timeout flaky
   failures and identify the cause of the flakiness (likely the test itself).
   4. Make test_expectations.txt match what's actually happening on the bots
   (see the flakiness dashboard for tests with incorrect expectations).
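
For task 4, the check boils down to comparing what the expectations file
says against what the bots actually report. Here is a rough sketch of that
comparison. The function name and the dict-of-sets layout are made up for
illustration; this is not the dashboard's or run_webkit_tests' real code or
data format.

```python
def mismatched_expectations(expected, observed):
    """Return tests whose observed outcomes differ from the expected ones.

    Both arguments map a test name to a set of outcome strings such as
    {"PASS", "TIMEOUT"}. A test missing from `expected` is assumed to be
    expected to pass (an assumption of this sketch).
    """
    mismatches = {}
    for test, seen in observed.items():
        want = expected.get(test, {"PASS"})
        if seen != want:
            mismatches[test] = {
                # Outcomes the bots produced that the file does not list.
                "unexpected": seen - want,
                # Outcomes the file lists that the bots never produced.
                "never_seen": want - seen,
            }
    return mismatches
```

A test listed as flaky that only ever passes on the bots shows up with a
non-empty "never_seen" set, which is the kind of stale entry worth removing.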

All the data I use below is from:
http://src.chromium.org/viewvc/chrome/trunk/src/webkit/tools/layout_tests/flakiness_dashboard.html

On Tue, Sep 8, 2009 at 5:52 PM, David Levin <[email protected]> wrote:

> I agree that the chromium buildbot seems to have more flakiness on layout
> tests that webkit buildbots.


While there is definitely more flakiness, I'm not sure how much more. I
think the Chromium bots are primarily more flaky on the HTTP tests. What
flakiness there is gets less noticed on the webkit buildbots since they
don't close the tree.


> Here's two things that may help us to understand this:
> 1. It would be nice to save crash logs from OSX into the zip file. For
> example, this run
>
> http://build.chromium.org/buildbot/waterfall/builders/Webkit%20Mac10.5%20(dbg)(2)/builds/3323/steps/webkit_tests/logs/stdio
> had a crash and likely generated a crash log at
> ~/Library/Logs/CrashReporter/TestShell*.crash which would help point to a
> culprit.
>

+1 This would be very useful. That said, it won't help much with decreasing
flakiness. Very few of the flaky tests are flaky crashers. They're
almost entirely flaky timeouts or failures, even on the debug builders.

> 2. If we suspect that tests may pass if given more time, then increase the
> timeout and see if more tests pass but exceed this old timeout (log
> something when this happens so we can validate that it is working).
>

-1 The test dashboard prints out the amount of time a test takes to run
if it takes >1 second. I don't think the tests that time out would pass if
we just gave them more time. Specifically, there are tests that always time
out and there are flaky timeout tests. The flaky timeout tests, when they do
pass, consistently take less than 10 seconds to run, and most of them take
less than 1 second.
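
To make the always-timeout versus flaky-timeout split concrete, here is a
minimal sketch of how one could bucket tests from per-run results. The `Run`
record and both function names are hypothetical; the dashboard stores its
data differently.

```python
from collections import namedtuple

# Hypothetical record of one run of one test on a bot.
Run = namedtuple("Run", ["outcome", "seconds"])


def classify_timeouts(history):
    """Split tests into always-timeout and flaky-timeout lists.

    `history` maps a test name to a list of Run records.
    """
    always, flaky = [], []
    for test, runs in sorted(history.items()):
        timeouts = [r for r in runs if r.outcome == "TIMEOUT"]
        if not timeouts:
            continue  # never timed out, not interesting here
        if len(timeouts) == len(runs):
            always.append(test)
        else:
            flaky.append(test)
    return always, flaky


def slowest_pass(runs):
    """Longest passing run time in seconds, or None if the test never passed."""
    times = [r.seconds for r in runs if r.outcome == "PASS"]
    return max(times) if times else None
```

If `slowest_pass` for a flaky-timeout test is well under the current timeout,
a longer timeout is unlikely to be what that test needs.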

Increasing the test timeout also *considerably* increases how long it takes
for the bots to cycle. In fact, I think we should be *decreasing* it to
something like 2 seconds. This would actually shave minutes off of the
current bot cycle times.

We have ~100 tests that are slow, many of which time out at 20 seconds. We
should mark all the slow but passing tests as SLOW in the test expectations
file. This will give them more time to run than the other tests. Then we
should bring the timeout down to something like 2 seconds. This will make
the bots run a lot faster and distinguish the tests that time out from the
tests that just take a long time to pass.
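
Roughly what I have in mind, as a sketch only: the names are hypothetical,
and keeping the current 20 seconds for SLOW tests is my assumption, not
something the test harness does today.

```python
# 2 seconds is the proposed default; 20 seconds is the current timeout, kept
# here for tests marked SLOW (an assumption of this sketch).
DEFAULT_TIMEOUT_SECONDS = 2
SLOW_TIMEOUT_SECONDS = 20


def timeout_for(test, slow_tests):
    """Pick the timeout for a test based on whether it is marked SLOW."""
    if test in slow_tests:
        return SLOW_TIMEOUT_SECONDS
    return DEFAULT_TIMEOUT_SECONDS


def seconds_spent_waiting(timing_out_tests, slow_tests):
    """Total time spent waiting on tests that hit their timeout in one run."""
    return sum(timeout_for(t, slow_tests) for t in timing_out_tests)
```

With a made-up figure of 30 non-SLOW tests hitting the timeout in a run, the
time spent waiting on them drops from 600 seconds at a 20 second timeout to
60 seconds at a 2 second timeout, summed across TestShell instances.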


> On Tue, Sep 8, 2009 at 5:41 PM, Dirk Pranke <[email protected]> wrote:
>
>> From what I've poked around at, many of the LayoutTest flaky failures
>> are timeout-related.
>
>
While more than half of the flaky tests on Windows and Mac are timeouts,
many of them are crashes or failures. You can see this pretty clearly on the
layout test dashboard. I'll note that on Linux, a very small percentage of
the flakiness is timeouts. Almost all of these timeouts on Windows/Mac are
HTTP tests. There are likely only one or two causes for all the flakiness
with the HTTP tests.

>> There's something in the test harness and web
>> server configurations that cause tests to be unpredictably slower. I
>> don't think Apple has this problem, and I think that's because they
>> use the built in apache instance in OS X,
>
>
We switched away from Apache to lighttpd because of flakiness Apache was
causing on Cygwin (Cygwin and Apache don't play well together). Maybe it
makes sense to use lighttpd on Windows and Apache on Mac? I think we should
first identify the cause of the flakiness on Windows. Fixing that might fix
the flakiness on Mac as well, and we wouldn't need to support two HTTP
servers.


>> and also because they have a
>> very different model for test execution (how we run tests in
>> parallel).
>
>
Running tests in parallel did seem to make things a bit more flaky, but not
much. I haven't verified this, but I think it probably just magnified
existing flakiness by putting higher load on the machine. Linux, the least
flaky bot, is the only bot that has 4 cores instead of just 2, which means
it runs more TestShell instances in parallel.

