Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-07 Thread Balazs Kelemen

On 07/06/2011 07:24 PM, Eric Seidel wrote:

On Wed, Jul 6, 2011 at 10:18 AM, Xan Lopez x...@gnome.org wrote:

On Wed, Jul 6, 2011 at 6:29 PM, Eric Seidel e...@webkit.org wrote:

NRWT uses both!  It will read in all the port's Skipped files, convert
them to SKIP test_expectations, and add them to your test_expectations
file.
http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/webkit.py#L309

For better or worse, NRWT will error out if you have duplicates in
your test_expectations file, including duplicates between your
test_expectations file and your Skipped lists.
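
As a rough illustration of the behaviour described above (not the actual
webkitpy code), the sketch below turns Skipped entries into SKIP expectation
lines and reports tests that are listed more than once. The function names
and the "BUG_SKIPPED SKIP : <test> = FAIL" line format are assumptions made
for illustration.

```python
def skipped_to_expectations(skipped_text):
    """Turn the contents of a Skipped file into SKIP expectation lines."""
    lines = []
    for raw in skipped_text.splitlines():
        # Drop '#' comments and surrounding whitespace; ignore blank lines.
        entry = raw.split("#", 1)[0].strip()
        if entry:
            # Assumed line format, modelled on the expectation syntax
            # quoted elsewhere in this thread.
            lines.append("BUG_SKIPPED SKIP : %s = FAIL" % entry)
    return lines


def find_duplicates(expectation_lines):
    """Return test paths that appear on more than one expectation line."""
    seen = set()
    duplicates = []
    for line in expectation_lines:
        if ":" not in line or "=" not in line:
            continue  # not an expectation line (blank line or comment)
        # The text between ':' and '=' is the test path.
        path = line.split(":", 1)[1].split("=", 1)[0].strip()
        if path in seen and path not in duplicates:
            duplicates.append(path)
        seen.add(path)
    return duplicates
```

Running find_duplicates over the converted Skipped lines plus the port's own
expectation lines gives the list of tests a runner could refuse to start on.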

Right, this is what I meant in another email when I said you are not
supposed to use both. I cannot really see a sane use case for this, to be
honest. When I transitioned I basically converted Skipped locally to
the new format, got tons of duplicate errors, figured out what was
going on, and then deleted Skipped. Maybe this is done so that
you can leave Skipped as it is and start gradually adding stuff to the
new file?

This was done to make it possible to bring up NRWT on Mac over a year
ago. :)  I'm happy to look at moving to a different configuration now
that the project has (mostly) moved to NRWT.

So long term, the best option is to move from Skipped to test_expectations.
But I worry about the lack of cascading logic. At some point we decided
that we needed it in the old system. Why do we think that we won't need it
with NRWT? I think cascading reduces the cost of maintaining the
skipped lists. WebKit2 is the best example. We have a common skip list
that lists all the tests that fail for a common WebKit2-specific
reason. That way, I can skip tests that start failing while I am
working and the Apple folks are asleep, so they don't need to worry about
them, and the same is true in the reverse direction.

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-07 Thread Eric Seidel
I do not know the history as to why Chromium removed support for
test_expectations cascading.

Ideally we would have fewer test expectations, not more in the future. :)



Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-07 Thread Tony Chang
One difference with the chromium port is that we try to use a single
test_expectations.txt that covers all platforms and OS versions (win xp,
vista, 7, mac leopard, snow leopard, linux 32, 64, GPU vs CPU, Debug vs
Release).  The tokens to the left of the test name specify what
configuration the expectation applies to.  Because of that, there hasn't
been much need for multiple test_expectations.txt files.
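
To make the role of the left-hand tokens concrete, here is a minimal sketch
of how a line could be split and matched against one configuration. It is
not the NRWT parser: the CATEGORIES table, the function names, and the rule
that tokens within a category are alternatives are simplifying assumptions.

```python
# Hypothetical, simplified table of configuration tokens.
CATEGORIES = {
    "platform": {"win", "mac", "linux"},
    "build": {"debug", "release"},
    "graphics": {"cpu", "gpu"},
}


def parse_line(line):
    """Split 'MODIFIERS : test = OUTCOMES' into its three parts."""
    left, outcomes = line.split("=", 1)
    modifiers, test = left.split(":", 1)
    return modifiers.split(), test.strip(), outcomes.split()


def applies_to(modifiers, config):
    """Return True if the line's configuration tokens match config.

    config is a set of lowercase tokens such as {"mac", "release", "cpu"}.
    Tokens outside CATEGORIES (for example bug identifiers) are ignored, and
    a category with no tokens on the line matches every configuration.
    """
    tokens = {m.lower() for m in modifiers}
    for allowed in CATEGORIES.values():
        wanted = tokens & allowed
        if wanted and not (wanted & config):
            return False
    return True
```

With this sketch, a line starting "BUG123 WIN LINUX DEBUG" applies to a
Linux debug run but not to a Mac debug run or a Windows release run.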

There is some code already in NRWT for cascading test_expectations.txt.
Currently, it's specific to the chromium port where we merge
the test_expectations.txt in the webkit repo with a test_expectations.txt
file in the chromium repo (it just concatenates them together).  It would be
pretty straightforward to make this code generic for all ports.
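
A generic version of the concatenation approach might look roughly like the
sketch below. The function name is hypothetical, and the caller is assumed
to supply the list of candidate files for its port, ordered from most
generic to most specific.

```python
import os


def merged_expectations(paths):
    """Concatenate the expectations files that exist, in the order given."""
    chunks = []
    for path in paths:
        # A port does not need a file at every level; skip missing ones.
        if os.path.exists(path):
            with open(path) as handle:
                chunks.append(handle.read().rstrip("\n"))
    return "\n".join(chunks) + "\n" if chunks else ""
```

Because the result is one flat text, any duplicate checking would still see
every line from every level.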

It seems like we have a few options.  We could have a separate
test_expectations.txt per layout test platform directory and have cascade
logic hard coded into NRWT or with an #include directive.  At the other
extreme, we could have a single monolithic test_expectations.txt file that
knows about all platforms.  Or something in the middle: have a
test_expectations.txt for mac, mac-leopard, mac-snowleopard, one for qt*,
one for all the WebKit2 ports, etc.  I suspect we'll want to go with
something in the middle.

On Thu, Jul 7, 2011 at 10:06 AM, Maciej Stachowiak m...@apple.com wrote:


 On Jul 7, 2011, at 10:03 AM, Eric Seidel wrote:

  I do not know the history as to why Chromium removed support for
  test_expectations cascading.
 
  Ideally we would have fewer test expectations, not more in the future. :)

 The cascading is really really useful for supporting multiple different Mac
 OS X versions with different results, and WebKit2 as an orthogonal
 dimension. Perhaps one possibility is to have something like an include
 directive in the expectations file, so the cascading can be defined by the
 expectations files themselves, rather than hardcoded in scripts.

 Regards,
 Maciej



Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-07 Thread Dirk Pranke
On Thu, Jul 7, 2011 at 10:27 AM, Tony Chang t...@chromium.org wrote:
 One difference with the chromium port is that we try to use a single
 test_expectations.txt that covers all platforms and OS versions (win xp,
 vista, 7, mac leopard, snow leopard, linux 32, 64, GPU vs CPU, Debug vs
 Release).  The tokens to the left of the test name specify what
 configuration the expectation applies to.  Because of that, there hasn't
 been much need for multiple test_expectations.txt files.

 There is some code already in NRWT for cascading test_expectations.txt.
 Currently, it's specific to the chromium port where we merge
 the test_expectations.txt in the webkit repo with a test_expectations.txt
 file in the chromium repo (it just concatenates them together).  It would be
 pretty straightforward to make this code generic for all ports.

 It seems like we have a few options.  We could have a separate
 test_expectations.txt per layout test platform directory and have cascade
 logic hard coded into NRWT or with an #include directive.  At the other
 extreme, we could have a single monolithic test_expectations.txt file that
 knows about all platforms.  Or something in the middle: have a
 test_expectations.txt for mac, mac-leopard, mac-snowleopard, one for qt*,
 one for all the WebKit2 ports, etc.  I suspect we'll want to go with
 something in the middle.

Tony's description is spot-on. The only reason we don't support
cascading expectations files is that it wasn't clear to me how we
would want things to work (i.e., which of the choices above), and I
wasn't able to get much input a few months ago.

If there is a consensus, it will be easy to implement, so how do we
actually want this to work?

-- Dirk




Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-07 Thread Maciej Stachowiak

On Jul 7, 2011, at 10:39 AM, Dirk Pranke wrote:

 On Thu, Jul 7, 2011 at 10:27 AM, Tony Chang t...@chromium.org wrote:
 One difference with the chromium port is that we try to use a single
 test_expectations.txt that covers all platforms and OS versions (win xp,
 vista, 7, mac leopard, snow leopard, linux 32, 64, GPU vs CPU, Debug vs
 Release).  The tokens to the left of the test name specify what
 configuration the expectation applies to.  Because of that, there hasn't
 been much need for multiple test_expectations.txt files.

 There is some code already in NRWT for cascading test_expectations.txt.
 Currently, it's specific to the chromium port where we merge
 the test_expectations.txt in the webkit repo with a test_expectations.txt
 file in the chromium repo (it just concatenates them together).  It would be
 pretty straightforward to make this code generic for all ports.

 It seems like we have a few options.  We could have a separate
 test_expectations.txt per layout test platform directory and have cascade
 logic hard coded into NRWT or with an #include directive.  At the other
 extreme, we could have a single monolithic test_expectations.txt file that
 knows about all platforms.  Or something in the middle: have a
 test_expectations.txt for mac, mac-leopard, mac-snowleopard, one for qt*,
 one for all the WebKit2 ports, etc.  I suspect we'll want to go with
 something in the middle.
 
 Tony's description is spot-on. The only reason we don't support
 cascading expectations files is because it wasn't clear to me how we
 would want things to work (i.e., which of the choices above) and I
 wasn't able to get much input a few months ago.
 
 If there is a consensus, it will be easy to implement, so how do we
 actually want this to work?

Out of the different options raised so far, I like the idea of having an 
include directive. Then ports can decide for themselves how much factoring is 
appropriate.

I think one giant expectations file for all ports is probably too complicated 
to be manageable, but an include-based setup would let specific ports share one 
expectations file for many configurations if they wish.
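
One way an include-based setup could work is sketched below. Nothing like
this is confirmed to exist in NRWT: the "#include <relative path>" syntax,
the function name, and the cycle check are all assumptions for illustration.

```python
import os


def expand_includes(path, _chain=None):
    """Return the lines of an expectations file with includes expanded."""
    chain = _chain if _chain is not None else set()
    real = os.path.abspath(path)
    if real in chain:
        raise ValueError("circular include: %s" % path)
    chain.add(real)
    lines = []
    with open(real) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("#include "):
                target = line[len("#include "):].strip()
                # Included paths are resolved relative to the including file.
                nested = os.path.join(os.path.dirname(real), target)
                lines.extend(expand_includes(nested, chain))
            else:
                lines.append(line)
    # Only the current include chain counts when checking for cycles.
    chain.discard(real)
    return lines
```

Under this scheme a port's file could begin with a line that includes a
shared WebKit2 file and then list only its own port-specific expectations.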

(Also, incidentally, I'd suggest renaming the file to TestExpectations or
TestExpectations.txt to better match WebKit naming style, but this is a much
more trivial issue.)

Regards,
Maciej



[webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-06 Thread Adam Barth
I'm not sure we've quite figured that out yet.  NRWT supports both
Skipped lists and test_expectations.txt, which is a more expressive
(but also more complex) version of Skipped lists.  IMHO, we should
wait for the dust to settle on the transition before changing our
practices.

Adam


On Wed, Jul 6, 2011 at 8:53 AM, Adam Roben aro...@apple.com wrote:
 Now that more and more ports are switching to NRWT, it would be great for 
 someone to explain what the best practices are for dealing with failing and 
 flaky tests.

 -Adam


Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-06 Thread Adam Roben
On Jul 6, 2011, at 11:58 AM, Adam Barth wrote:

 I'm not sure we've quite figured that out yet.  NRWT supports both
 Skipped lists and test_expectations.txt, which is a more expressive
 (but also more complex) version of Skipped lists.  IMHO, we should
 wait for the dust to settle on the transition before changing our
 practices.

OK. Then I have another question:

What should I do to make the Leopard and SnowLeopard bots green, now that they 
have switched to NRWT?

-Adam



Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-06 Thread Ryosuke Niwa
On Wed, Jul 6, 2011 at 9:00 AM, Adam Roben aro...@apple.com wrote:

 OK. Then I have another question:

 What should I do to make the Leopard and SnowLeopard bots green, now that
 they have switched to NRWT?


Looking at
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r90458%20(31133)/results.html

You can see that http/tests/cookies/third-party-cookie-relaxing.html
and storage/domstorage/localstorage/storagetracker/storage-tracker-7-usage.html
are real failures, so you can skip those, rebaseline, or revert the changes
that caused the failures.

For flaky tests, you can add
BUG# : http/tests/misc/favicon-loads-with-icon-loading-override.html = TEXT PASS
to the mac or mac-leopard test_expectations.txt.

Although flaky tests only make the bots orange.
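
As a small sketch of what listing two outcomes means, the code below treats
any line with more than one expected outcome as flaky and checks an actual
result against it. The function names and the returned labels are
hypothetical, not NRWT's.

```python
def expected_outcomes(line):
    """Return the set of outcomes listed to the right of '='."""
    return set(line.split("=", 1)[1].split())


def classify(line, actual):
    """Classify one actual result against an expectation line."""
    outcomes = expected_outcomes(line)
    if actual not in outcomes:
        return "unexpected"
    # More than one listed outcome marks the test as flaky.
    return "expected (flaky)" if len(outcomes) > 1 else "expected"
```

For the line above, both a TEXT failure and a PASS count as expected, while
any other result, such as a crash, would still be reported as unexpected.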

- Ryosuke


Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-06 Thread Adam Barth
On Wed, Jul 6, 2011 at 9:00 AM, Adam Roben aro...@apple.com wrote:
 On Jul 6, 2011, at 11:58 AM, Adam Barth wrote:
 I'm not sure we've quite figured that out yet.  NRWT supports both
 Skipped lists and test_expectations.txt, which is a more expressive
 (but also more complex) version of Skipped lists.  IMHO, we should
 wait for the dust to settle on the transition before changing our
 practices.

 OK. Then I have another question:

 What should I do to make the Leopard and SnowLeopard bots green, now that 
 they have switched to NRWT?

Looking at Leopard, there's one jscore-test issue and two
run-webkit-test issues:

http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r90460%20(33799)/results.html

One of the run-webkit-test issues seems to relate to -0.  It's
possible that's a test harness issue, but it seems more likely that
it's a regression in JavaScriptCore.  The other is some SVG foreign
object issue, which I have less insight into.  We can either fix the
regressions, update the expected.txt files, or add the test to the
Skipped list, as before.

On SnowLeopard, there are also two failing tests, but I believe these
tests are related to the NRWT transition:

http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r90458%20(31133)/results.html

The third-party-cookie-relaxing.html test probably needs to be changed
because it depends on the persistent state of the cookie jar.  Either
Eric or I will dig into the test to figure out how to make it more
robust.

The storagetracker tests also have similar issues.  They appear to be
flaky with NRWT, which is
https://bugs.webkit.org/show_bug.cgi?id=57799.  These tests are also
flaky under ORWT (but less so):

https://bugs.webkit.org/show_bug.cgi?id=58835
https://bugs.webkit.org/show_bug.cgi?id=58836

We need to fix both issues, but they didn't seem like issues that
should block the change to NRWT.

That's a somewhat roundabout way of not answering your question.  :)

Adam


Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-06 Thread Xan Lopez
On Wed, Jul 6, 2011 at 6:29 PM, Eric Seidel e...@webkit.org wrote:
 NRWT uses both!  It will read in all the port's Skipped files, convert
 them to SKIP test_expectations, and add them to your test_expectations
 file.
 http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/webkit.py#L309

 For better or worse, NRWT will error out if you have duplicates in
 your test_expectations file, including duplicates between your
 test_expectations file and your Skipped lists.

Right, this is what I meant in another email when I said you are not
supposed to use both. I cannot really see a sane use case for this, to be
honest. When I transitioned I basically converted Skipped locally to
the new format, got tons of duplicate errors, figured out what was
going on, and then deleted Skipped. Maybe this is done so that
you can leave Skipped as it is and start gradually adding stuff to the
new file?

Xan


Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)

2011-07-06 Thread Eric Seidel
On Wed, Jul 6, 2011 at 10:18 AM, Xan Lopez x...@gnome.org wrote:
 On Wed, Jul 6, 2011 at 6:29 PM, Eric Seidel e...@webkit.org wrote:
 NRWT uses both!  It will read in all the port's Skipped files, convert
 them to SKIP test_expectations, and add them to your test_expectations
 file.
 http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/webkit.py#L309

 For better or worse, NRWT will error out if you have duplicates in
 your test_expectations file, including duplicates between your
 test_expectations file and your Skipped lists.

 Right, this is what I meant in another email when I said you are not
 supposed to use both. I cannot really see a sane use case for this, to be
 honest. When I transitioned I basically converted Skipped locally to
 the new format, got tons of duplicate errors, figured out what was
 going on, and then deleted Skipped. Maybe this is done so that
 you can leave Skipped as it is and start gradually adding stuff to the
 new file?

This was done to make it possible to bring up NRWT on Mac over a year
ago. :)  I'm happy to look at moving to a different configuration now
that the project has (mostly) moved to NRWT.