Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
On 07/06/2011 07:24 PM, Eric Seidel wrote:
> On Wed, Jul 6, 2011 at 10:18 AM, Xan Lopez x...@gnome.org wrote:
>> On Wed, Jul 6, 2011 at 6:29 PM, Eric Seidel e...@webkit.org wrote:
>>> NRWT uses both! It will read in all the port's Skipped files, convert them to SKIP test_expectations, and add them to your test_expectations file.
>>> http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/webkit.py#L309
>>> For better or worse, NRWT will error out if you have duplicates in your test_expectations file, including duplicates between your test_expectations file and your Skipped lists.
>>
>> Right, this is what I meant in another email when I said you are not supposed to use both. I cannot really see a sane use case for this, to be honest. When I transitioned, I basically converted Skipped locally to the new format, got tons of duplicate errors, figured out what was going on, and then deleted Skipped. Maybe this is done so that you can leave Skipped as it is and start gradually adding stuff to the new file?
>
> This was done to make it possible to bring up NRWT on Mac over a year ago. :) I'm happy to look at moving to a different configuration now that the project has (mostly) moved to NRWT.

So long term, the best option is to move from Skipped to test_expectations. But I worry about the lack of cascading logic. At some point we decided that we needed it in the old system. Why do we think that we won't need it with NRWT? I think cascading reduces the cost of maintaining the Skipped lists. WebKit2 is the best example: we have a common skip list that lists all the tests that are failing for a common WebKit2-specific reason. That way, I can skip tests that start failing while I am working and the Apple folks are asleep, without them needing to worry about it, and the same is true in the reverse direction.

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
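To make the cascading idea concrete, here is a minimal Python sketch of a skip list assembled from several directories. It is only an illustration, not the old runner's code: the function name `cascaded_skips`, the directory names, and the test paths are all assumptions chosen for the example.

```python
def cascaded_skips(search_path, skipped_by_dir):
    """Union the Skipped entries of every directory on the search path.

    search_path    -- hypothetical list of platform directories, most
                      specific first (e.g. a port's WebKit2 directory,
                      then the shared WebKit2 one, then the port's own)
    skipped_by_dir -- dict mapping a directory name to its Skipped entries
    """
    skipped = set()
    for directory in search_path:
        # A directory without a Skipped list simply contributes nothing.
        skipped.update(skipped_by_dir.get(directory, ()))
    return skipped


# Hypothetical data: one shared WebKit2 list plus per-port lists.
skips = {
    "wk2": ["fast/a.html"],
    "qt": ["fast/b.html"],
    "mac": ["fast/c.html"],
}
qt_wk2 = cascaded_skips(["qt-wk2", "wk2", "qt"], skips)
mac_wk2 = cascaded_skips(["mac-wk2", "wk2", "mac"], skips)
```

Both ports pick up the entry from the shared list, so a test skipped there by one team is skipped for the other without any further edits.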
Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
I do not know the history of why Chromium removed support for test_expectations cascading. Ideally we would have fewer test expectations, not more, in the future. :)

On Thu, Jul 7, 2011 at 3:16 AM, Balazs Kelemen kbal...@webkit.org wrote:
> On 07/06/2011 07:24 PM, Eric Seidel wrote:
>> On Wed, Jul 6, 2011 at 10:18 AM, Xan Lopez x...@gnome.org wrote:
>>> On Wed, Jul 6, 2011 at 6:29 PM, Eric Seidel e...@webkit.org wrote:
>>>> NRWT uses both! It will read in all the port's Skipped files, convert them to SKIP test_expectations, and add them to your test_expectations file.
>>>> http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/webkit.py#L309
>>>> For better or worse, NRWT will error out if you have duplicates in your test_expectations file, including duplicates between your test_expectations file and your Skipped lists.
>>>
>>> Right, this is what I meant in another email when I said you are not supposed to use both. I cannot really see a sane use case for this, to be honest. When I transitioned, I basically converted Skipped locally to the new format, got tons of duplicate errors, figured out what was going on, and then deleted Skipped. Maybe this is done so that you can leave Skipped as it is and start gradually adding stuff to the new file?
>>
>> This was done to make it possible to bring up NRWT on Mac over a year ago. :) I'm happy to look at moving to a different configuration now that the project has (mostly) moved to NRWT.
>
> So long term, the best option is to move from Skipped to test_expectations. But I worry about the lack of cascading logic. At some point we decided that we needed it in the old system. Why do we think that we won't need it with NRWT? I think cascading reduces the cost of maintaining the Skipped lists. WebKit2 is the best example: we have a common skip list that lists all the tests that are failing for a common WebKit2-specific reason. That way, I can skip tests that start failing while I am working and the Apple folks are asleep, without them needing to worry about it, and the same is true in the reverse direction.
Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
One difference with the chromium port is that we try to use a single test_expectations.txt that covers all platforms and OS versions (win xp, vista, 7, mac leopard, snow leopard, linux 32, 64, GPU vs CPU, Debug vs Release). The tokens to the left of the test name specify what configuration the expectation applies to. Because of that, there hasn't been much need for multiple test_expectations.txt files.

There is some code already in NRWT for cascading test_expectations.txt. Currently, it's specific to the chromium port, where we merge the test_expectations.txt in the webkit repo with a test_expectations.txt file in the chromium repo (it just concatenates them together). It would be pretty straightforward to make this code generic for all ports.

It seems like we have a few options. We could have a separate test_expectations.txt per layout test platform directory and have the cascade logic either hard-coded into NRWT or expressed with an #include directive. At the other extreme, we could have a single monolithic test_expectations.txt file that knows about all platforms. Or something in the middle: have a test_expectations.txt for mac, mac-leopard, mac-snowleopard, one for qt*, one for all the WebKit2 ports, etc. I suspect we'll want to go with something in the middle.

On Thu, Jul 7, 2011 at 10:06 AM, Maciej Stachowiak m...@apple.com wrote:
> On Jul 7, 2011, at 10:03 AM, Eric Seidel wrote:
>> I do not know the history of why Chromium removed support for test_expectations cascading. Ideally we would have fewer test expectations, not more, in the future. :)
>
> The cascading is really, really useful for supporting multiple different Mac OS X versions with different results, and WebKit2 as an orthogonal dimension. Perhaps one possibility is to have something like an include directive in the expectations file, so the cascading can be defined by the expectations files themselves, rather than hard-coded in scripts.
>
> Regards,
> Maciej
Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
On Thu, Jul 7, 2011 at 10:27 AM, Tony Chang t...@chromium.org wrote:
> One difference with the chromium port is that we try to use a single test_expectations.txt that covers all platforms and OS versions (win xp, vista, 7, mac leopard, snow leopard, linux 32, 64, GPU vs CPU, Debug vs Release). The tokens to the left of the test name specify what configuration the expectation applies to. Because of that, there hasn't been much need for multiple test_expectations.txt files.
>
> There is some code already in NRWT for cascading test_expectations.txt. Currently, it's specific to the chromium port, where we merge the test_expectations.txt in the webkit repo with a test_expectations.txt file in the chromium repo (it just concatenates them together). It would be pretty straightforward to make this code generic for all ports.
>
> It seems like we have a few options. We could have a separate test_expectations.txt per layout test platform directory and have the cascade logic either hard-coded into NRWT or expressed with an #include directive. At the other extreme, we could have a single monolithic test_expectations.txt file that knows about all platforms. Or something in the middle: have a test_expectations.txt for mac, mac-leopard, mac-snowleopard, one for qt*, one for all the WebKit2 ports, etc. I suspect we'll want to go with something in the middle.

Tony's description is spot-on. The only reason we don't support cascading expectations files is that it wasn't clear to me how we would want things to work (i.e., which of the choices above), and I wasn't able to get much input a few months ago. If there is a consensus, it will be easy to implement, so how do we actually want this to work?

-- Dirk

> On Thu, Jul 7, 2011 at 10:06 AM, Maciej Stachowiak m...@apple.com wrote:
>> On Jul 7, 2011, at 10:03 AM, Eric Seidel wrote:
>>> I do not know the history of why Chromium removed support for test_expectations cascading. Ideally we would have fewer test expectations, not more, in the future. :)
>>
>> The cascading is really, really useful for supporting multiple different Mac OS X versions with different results, and WebKit2 as an orthogonal dimension. Perhaps one possibility is to have something like an include directive in the expectations file, so the cascading can be defined by the expectations files themselves, rather than hard-coded in scripts.
>>
>> Regards,
>> Maciej
Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
On Jul 7, 2011, at 10:39 AM, Dirk Pranke wrote:
> On Thu, Jul 7, 2011 at 10:27 AM, Tony Chang t...@chromium.org wrote:
>> One difference with the chromium port is that we try to use a single test_expectations.txt that covers all platforms and OS versions (win xp, vista, 7, mac leopard, snow leopard, linux 32, 64, GPU vs CPU, Debug vs Release). The tokens to the left of the test name specify what configuration the expectation applies to. Because of that, there hasn't been much need for multiple test_expectations.txt files.
>>
>> There is some code already in NRWT for cascading test_expectations.txt. Currently, it's specific to the chromium port, where we merge the test_expectations.txt in the webkit repo with a test_expectations.txt file in the chromium repo (it just concatenates them together). It would be pretty straightforward to make this code generic for all ports.
>>
>> It seems like we have a few options. We could have a separate test_expectations.txt per layout test platform directory and have the cascade logic either hard-coded into NRWT or expressed with an #include directive. At the other extreme, we could have a single monolithic test_expectations.txt file that knows about all platforms. Or something in the middle: have a test_expectations.txt for mac, mac-leopard, mac-snowleopard, one for qt*, one for all the WebKit2 ports, etc. I suspect we'll want to go with something in the middle.
>
> Tony's description is spot-on. The only reason we don't support cascading expectations files is that it wasn't clear to me how we would want things to work (i.e., which of the choices above), and I wasn't able to get much input a few months ago. If there is a consensus, it will be easy to implement, so how do we actually want this to work?

Out of the different options raised so far, I like the idea of having an include directive. Then ports can decide for themselves how much factoring is appropriate. I think one giant expectations file for all ports is probably too complicated to be manageable, but an include-based setup would let specific ports share one expectations file for many configurations if they wish.

(Incidentally, I'd also suggest renaming the file to TestExpectations or TestExpectations.txt to better match WebKit naming style, but this is a much more trivial issue.)

Regards,
Maciej
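For illustration only, an include directive of the kind discussed above could be resolved roughly as in the Python sketch below. Nothing here is NRWT code: the `#include <relative path>` syntax, the function name `load_expectations`, and the `read_file` callback are assumptions made for the example, and the thread does not settle on any of them.

```python
import posixpath


def load_expectations(path, read_file, _stack=()):
    """Return the lines of `path` with '#include <file>' lines expanded.

    read_file is a hypothetical callback that maps a path to the file's
    text; _stack tracks the files currently being expanded so that an
    include cycle is reported instead of recursing forever.
    """
    if path in _stack:
        raise ValueError("include cycle at %s" % path)
    lines = []
    for line in read_file(path).splitlines():
        if line.startswith("#include "):
            target = line[len("#include "):].strip()
            # Includes are resolved relative to the including file.
            included = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), target))
            lines.extend(load_expectations(included, read_file, _stack + (path,)))
        else:
            lines.append(line)
    return lines


# Hypothetical files: mac-leopard pulls in the shared mac expectations.
files = {
    "mac/test_expectations.txt": "BUG1 : a.html = TEXT\n",
    "mac-leopard/test_expectations.txt":
        "#include ../mac/test_expectations.txt\nBUG2 : b.html = TEXT PASS\n",
}
merged = load_expectations("mac-leopard/test_expectations.txt", files.__getitem__)
```

With this shape each port decides its own factoring by editing its expectations file, which is the property Maciej argues for, rather than by changing the runner.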
[webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
I'm not sure we've quite figured that out yet. NRWT supports both Skipped lists and test_expectations.txt, which is a more expressive (but also more complex) version of Skipped lists. IMHO, we should wait for the dust to settle on the transition before changing our practices.

Adam

On Wed, Jul 6, 2011 at 8:53 AM, Adam Roben aro...@apple.com wrote:
> Now that more and more ports are switching to NRWT, it would be great for someone to explain what the best practices are for dealing with failing and flaky tests.
>
> -Adam
Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
On Jul 6, 2011, at 11:58 AM, Adam Barth wrote:
> I'm not sure we've quite figured that out yet. NRWT supports both Skipped lists and test_expectations.txt, which is a more expressive (but also more complex) version of Skipped lists. IMHO, we should wait for the dust to settle on the transition before changing our practices.

OK. Then I have another question: What should I do to make the Leopard and SnowLeopard bots green, now that they have switched to NRWT?

-Adam
Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
On Wed, Jul 6, 2011 at 9:00 AM, Adam Roben aro...@apple.com wrote:
> OK. Then I have another question: What should I do to make the Leopard and SnowLeopard bots green, now that they have switched to NRWT?

Looking at
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r90458%20(31133)/results.html
http/tests/cookies/third-party-cookie-relaxing.html and storage/domstorage/localstorage/storagetracker/storage-tracker-7-usage.html are real failures, so you can skip them, rebaseline them, or revert the changes that caused the failures.

For flaky tests, you can add

BUG# : http/tests/misc/favicon-loads-with-icon-loading-override.html = TEXT PASS

to the mac or mac-leopard test_expectations.txt, although flaky tests only make the bots orange.

- Ryosuke
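The expectation line Ryosuke quotes has the shape `MODIFIERS : test = OUTCOMES`. As a rough sketch of how such a line breaks down, assuming only that shape, the Python below splits it into its parts. It is not the NRWT parser; the names `Expectation`, `parse_expectation`, and `is_flaky`, and the placeholder bug token `BUG1234`, are hypothetical.

```python
from collections import namedtuple

# Hypothetical container for the three parts of an expectation line.
Expectation = namedtuple("Expectation", "modifiers test outcomes")


def parse_expectation(line):
    """Split 'MODIFIERS : test = OUTCOMES' into its three parts."""
    left, _, outcomes = line.partition("=")
    modifiers, _, test = left.partition(":")
    return Expectation(modifiers.split(), test.strip(), outcomes.split())


def is_flaky(expectation):
    """Treat a test as flaky when PASS is one of several allowed outcomes."""
    return "PASS" in expectation.outcomes and len(expectation.outcomes) > 1


# BUG1234 is a placeholder; the thread leaves the bug number as 'BUG#'.
entry = parse_expectation(
    "BUG1234 : http/tests/misc/favicon-loads-with-icon-loading-override.html = TEXT PASS")
```

Listing `TEXT PASS` says that either a text mismatch or a pass is acceptable, which is how the line expresses flakiness rather than a consistent failure.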
Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
On Wed, Jul 6, 2011 at 9:00 AM, Adam Roben aro...@apple.com wrote:
> On Jul 6, 2011, at 11:58 AM, Adam Barth wrote:
>> I'm not sure we've quite figured that out yet. NRWT supports both Skipped lists and test_expectations.txt, which is a more expressive (but also more complex) version of Skipped lists. IMHO, we should wait for the dust to settle on the transition before changing our practices.
>
> OK. Then I have another question: What should I do to make the Leopard and SnowLeopard bots green, now that they have switched to NRWT?

Looking at Leopard, there's one jscore-test issue and two run-webkit-tests issues:

http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r90460%20(33799)/results.html

One of the run-webkit-tests issues seems to relate to -0. It's possible that's a test harness issue, but it seems more likely that it's a regression in JavaScriptCore. The other is some SVG foreign object issue, which I have less insight into. We can either fix the regressions, update the expected.txt files, or add the tests to the Skipped list, as before.

On SnowLeopard, there are also two failing tests, but I believe these failures are related to the NRWT transition:

http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r90458%20(31133)/results.html

The third-party-cookie-relaxing.html test probably needs to be changed because it depends on the persistent state of the cookie jar. Either Eric or I will dig into the test to figure out how to make it more robust.

The storagetracker tests have similar issues. They appear to be flaky with NRWT, which is https://bugs.webkit.org/show_bug.cgi?id=57799. These tests are also flaky under ORWT (but less so):

https://bugs.webkit.org/show_bug.cgi?id=58835
https://bugs.webkit.org/show_bug.cgi?id=58836

We need to fix both issues, but they didn't seem like issues that should block the change to NRWT.

That's a somewhat roundabout way of not answering your question. :)

Adam
Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
On Wed, Jul 6, 2011 at 6:29 PM, Eric Seidel e...@webkit.org wrote:
> NRWT uses both! It will read in all the port's Skipped files, convert them to SKIP test_expectations, and add them to your test_expectations file.
> http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/webkit.py#L309
> For better or worse, NRWT will error out if you have duplicates in your test_expectations file, including duplicates between your test_expectations file and your Skipped lists.

Right, this is what I meant in another email when I said you are not supposed to use both. I cannot really see a sane use case for this, to be honest. When I transitioned, I basically converted Skipped locally to the new format, got tons of duplicate errors, figured out what was going on, and then deleted Skipped. Maybe this is done so that you can leave Skipped as it is and start gradually adding stuff to the new file?

Xan
Re: [webkit-dev] Best practices for failing a flaky tests (was Re: Switching to new-run-webkit-tests)
On Wed, Jul 6, 2011 at 10:18 AM, Xan Lopez x...@gnome.org wrote:
> On Wed, Jul 6, 2011 at 6:29 PM, Eric Seidel e...@webkit.org wrote:
>> NRWT uses both! It will read in all the port's Skipped files, convert them to SKIP test_expectations, and add them to your test_expectations file.
>> http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/webkit.py#L309
>> For better or worse, NRWT will error out if you have duplicates in your test_expectations file, including duplicates between your test_expectations file and your Skipped lists.
>
> Right, this is what I meant in another email when I said you are not supposed to use both. I cannot really see a sane use case for this, to be honest. When I transitioned, I basically converted Skipped locally to the new format, got tons of duplicate errors, figured out what was going on, and then deleted Skipped. Maybe this is done so that you can leave Skipped as it is and start gradually adding stuff to the new file?

This was done to make it possible to bring up NRWT on Mac over a year ago. :) I'm happy to look at moving to a different configuration now that the project has (mostly) moved to NRWT.
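The behaviour described in this exchange (Skipped entries turned into SKIP expectations, then an error on duplicates) can be sketched in Python as below. This is a simplified illustration and not the code at the trac link: the function names, the `BUGSKIPPED` token, and the exact text of the generated line are assumptions.

```python
def skipped_to_expectations(skipped_lines):
    """Turn Skipped-list entries into SKIP expectation lines."""
    lines = []
    for raw in skipped_lines:
        # Drop '#' comments and surrounding whitespace; skip blank lines.
        entry = raw.split("#", 1)[0].strip()
        if entry:
            # Hypothetical line format; the real tool's output may differ.
            lines.append("BUGSKIPPED SKIP : %s = FAIL" % entry)
    return lines


def find_duplicates(expectation_lines):
    """Return test paths that appear on more than one expectation line."""
    seen, dupes = set(), []
    for line in expectation_lines:
        test = line.split(":", 1)[1].split("=", 1)[0].strip()
        if test in seen and test not in dupes:
            dupes.append(test)
        seen.add(test)
    return dupes


# Hypothetical Skipped list and one hand-written expectation for the same test.
skipped = ["# flaky", "fast/a.html", "", "fast/b.html # bug 1"]
converted = skipped_to_expectations(skipped)
clashes = find_duplicates(converted + ["BUG1 : fast/a.html = TEXT PASS"])
```

A test listed in both places shows up in `clashes`, which mirrors the duplicate errors Xan hit after converting Skipped locally while the original Skipped file was still being read.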