I'm including concrete tasks at the top that people can take on to help identify and reduce flakiness. Read below for more details.

1. Mark slow tests as SLOW and reduce the timeout on the bots to 2 seconds.
2. Look into the cause of the timeouts on HTTP tests, especially on Mac/Windows.
3. Look at the actual results off the bots for the non-timeout flaky failures and identify the cause of the flakiness (likely the test itself).
4. Make test_expectations.txt match what's actually happening on the bots (see the flakiness dashboard for tests with incorrect expectations).

All the data I use below is from:
http://src.chromium.org/viewvc/chrome/trunk/src/webkit/tools/layout_tests/flakiness_dashboard.html

On Tue, Sep 8, 2009 at 5:52 PM, David Levin <[email protected]> wrote:

> I agree that the chromium buildbot seems to have more flakiness on layout
> tests than webkit buildbots.

While there is definitely more flakiness, I'm not sure how much more. I think the Chromium bots are primarily more flaky on the HTTP tests. What flakiness there is gets less noticed on the webkit buildbots since they don't close the tree.

> Here's two things that may help us to understand this:
> 1. It would be nice to save crash logs from OSX into the zip file. For
> example, this run
> http://build.chromium.org/buildbot/waterfall/builders/Webkit%20Mac10.5%20(dbg)(2)/builds/3323/steps/webkit_tests/logs/stdio
> had a crash and likely generated a crash log at
> ~/Library/Logs/CrashReporter/TestShell*.crash which would help point to a
> culprit.

+1 This would be very useful. That said, it won't help much with decreasing flakiness. Very few of the flaky tests are flaky crashers. They're almost entirely flaky timeouts or failures, even in debug builders.

> 2. If we suspect that tests may pass if given more time, then increase the
> timeout and see if more tests pass but exceed this old timeout (log
> something when this happens so we can validate that it is working).

-1 The test dashboard prints out the amount of time a test takes to run if it takes >1 second.
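
To make task 1 from the list at the top a bit more concrete, here is a minimal Python sketch of how recorded run times could be used to separate slow-but-passing tests from tests that time out. It is an illustration only, not code from the layout test harness: the function name, the (outcome, seconds) tuples, the sample data, and the bucket labels are all made up for this example, and the labels are not test_expectations.txt syntax. The 2 second default is the value proposed in task 1.

```python
# Hypothetical sketch: names and labels are invented for illustration and do
# not come from the real layout test harness or test_expectations.txt.

DEFAULT_TIMEOUT_SECS = 2.0  # the reduced default timeout proposed in task 1


def suggest_bucket(runs, default_timeout=DEFAULT_TIMEOUT_SECS):
    """Suggest a bucket for one test from its recorded runs.

    runs is a list of (outcome, seconds) tuples, where outcome is a string
    such as "PASS" or "TIMEOUT" (an assumed, simplified result format).
    """
    passing_times = [secs for outcome, secs in runs if outcome == "PASS"]
    timed_out = any(outcome == "TIMEOUT" for outcome, _ in runs)

    if not passing_times:
        # No passing run on record, so there is no evidence extra time helps.
        return "ALWAYS_TIMEOUT" if timed_out else "NEVER_PASSES"
    if max(passing_times) > default_timeout:
        # Passes, but needs more than the default budget: candidate for SLOW.
        return "SLOW"
    # Passes within the default budget; any timeouts seen are flakiness.
    return "FLAKY_TIMEOUT" if timed_out else "OK"


if __name__ == "__main__":
    # Made-up sample data, not real bot results.
    sample = {
        "http/tests/a.html": [("PASS", 0.4), ("TIMEOUT", 20.0)],
        "fast/b.html": [("PASS", 8.5), ("PASS", 9.1)],
        "fast/c.html": [("TIMEOUT", 20.0), ("TIMEOUT", 20.0)],
    }
    for name, runs in sorted(sample.items()):
        print(name, suggest_bucket(runs))
```

Only passing run times are used to decide on SLOW, so a run that hit the timeout does not make a normally fast test look slow.
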
I don't think the tests that time out would pass if we just gave them more time. Specifically, there are tests that always time out and there are flaky timeout tests. The flaky timeout tests, when they do pass, consistently take less than 10 seconds to run, and most of them take less than 1 second.

Increasing the test timeout also *considerably* increases how long it takes for the bots to cycle. In fact, I think we should be *decreasing* it to something like 2 seconds. This would actually shave minutes off of the current bot cycle times.

We have ~100 tests that are slow, many of which time out at 20 seconds. We should mark all the slow but passing tests as SLOW in the test expectations file. This will give them more time to run than the other tests. Then we should bring the timeout down to something like 2 seconds. This will make the bots run a lot faster and distinguish between the tests that time out and the tests that just take a long time to pass.

> On Tue, Sep 8, 2009 at 5:41 PM, Dirk Pranke <[email protected]> wrote:
>
>> From what I've poked around at, many of the LayoutTest flaky failures
>> are timeout-related.

While more than half of the flaky tests on Windows and Mac are timeouts, many of them are crashes or failures. You can see this pretty clearly on the layout test dashboard. I'll note that on Linux, a very small percentage of the flakiness is timeouts.

Almost all of these timeouts on Windows/Mac are HTTP tests. There are likely one or two causes for all the flakiness with the HTTP tests.

>> There's something in the test harness and web
>> server configurations that cause tests to be unpredictably slower. I
>> don't think Apple has this problem, and I think that's because they
>> use the built in apache instance in OS X,

We switched away from Apache to lighttpd because of flakiness it was causing on cygwin (cygwin and Apache don't play well together). Maybe it makes sense to use lighttpd on Windows and Apache on Mac?
I think we should identify the cause of the flakiness on Windows. Fixing that might fix the flakiness on Mac as well, and we wouldn't need to support two HTTP servers.

>> and also because they have a
>> very different model for test execution (how we run tests in
>> parallel).

Running tests in parallel did seem to make things a bit more flaky, but not much. I haven't verified this, but I think it probably just magnified existing flakiness by putting higher load on the machine. Linux, the least flaky bot, is the only bot that has 4 cores instead of just 2, which means it runs more TestShell instances in parallel.

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: [email protected]
View archives, change email options, or unsubscribe:
http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---
