Many of our tests have inconsistent results.
Generally speaking, developers only run a narrow subset of tests locally
and defer running the full suite of tests to our automation infrastructure.
Within that infrastructure, we have a database of known "intermittent"
failures and tools that attempt to auto-classify them. People catch the
intermittent failures that fall through the cracks.
In some cases, we outright disable tests that aren't reliable enough. In
other cases, we set thresholds on their expected failure rate.
Unfortunately, we don't have a good answer for reproducing this
classification infrastructure outside of Mozilla :/
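The threshold idea above can be sketched as a tiny classifier over a test's
recent run history. This is purely illustrative (not our actual tooling), and
the 5% default threshold is an arbitrary assumption:

```python
def classify(results, max_failure_rate=0.05):
    """Classify a test from its recent run history.

    results: list of booleans, True = pass, oldest to newest.
    Returns "ok", "intermittent" (fails, but within the expected
    failure-rate threshold), or "broken".
    """
    failures = results.count(False)
    rate = failures / len(results)
    if failures == 0:
        return "ok"
    if rate <= max_failure_rate:
        return "intermittent"
    return "broken"
```

In practice the real pipeline is much fuzzier: it matches failure messages
against a database of known intermittents rather than looking at rates alone.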
Furthermore, there are some tests that are so wonky that reproducing
behavior outside of our automation environment is difficult or impossible.
It is not uncommon to have to "check out" a machine from our automation
environment in order to debug a failure.
If your goal is to qualify pre-release compiler changes against Firefox,
I'd start by focusing on tests that run reliably. Generally speaking,
"headless" tests (tests not rendering a Firefox window) are more reliable.
This includes gtest, xpcshell, and many JS engine tests. Reftests,
crashtests, and web platform tests (WPT) are also pretty isolated and are
generally more consistent. Mochitests tend to be the most inconsistent in
my experience.
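As a rough illustration, a qualification run that sticks to the headless
suites could be driven by a small wrapper like this. The suite list is an
assumption based on the suites named above; it would need to run from a
mozilla-central checkout:

```python
import subprocess

# Suites that tend to be reliable outside Mozilla automation
# (assumed names; adjust to your checkout).
HEADLESS_SUITES = [
    ["./mach", "gtest"],
    ["./mach", "xpcshell-test"],
    ["./mach", "jsapi-tests"],
]

def run_reliable_suites(dry_run=True):
    """With dry_run, just return the commands that would run;
    otherwise execute each suite in order and collect exit codes."""
    if dry_run:
        return list(HEADLESS_SUITES)
    return {" ".join(cmd): subprocess.call(cmd) for cmd in HEADLESS_SUITES}
```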
This general problem of inconsistent test execution is a complex topic and
has consumed thousands of person-hours at Mozilla. If you're intent on
running large parts of Firefox automation, it's probably worth a meeting or
video conference with some of us to go over in more detail.
On Fri, Nov 25, 2016 at 3:36 AM, Gratian Lup <lgrat...@gmail.com> wrote:
> Thanks a lot for your help! I think doing the testing on our machines is a
> better approach overall, even from an engineering perspective - in case of
> a failure you can just grab the machine and start debugging, recompiling,
> etc. The legal work, if possible at all, would be quite complicated too.
> This new testing approach benefits both us and the tested projects.
> What we are doing:
> - If the failure is in the compiler frontend, it is either a compiler bug
> or a problem in the code itself. There were a few cases where the code had
> problems and we informed those projects and helped them fix it - this would
> also be the case with Firefox.
> - If the failure is in the compiler backend/optimizer, fix the bug. This
> is pretty much always a real bug, unless the source code triggers some
> cases of undefined behavior - we haven't found such a case yet.
> I tried running the tests on Windows 7 this time, with the AV and firewall
> disabled, just to be sure they don't interfere. When using the mach
> commands, there are still quite a lot of failures, so I assume that
> something is still not configured right. I was curious if it's different on
> Linux, but tests are also failing there (I used Xubuntu 16.10 with KDE).
> Most failures are in the mochitest, reftest and wpt suites. If it's not
> possible to make all the tests pass, a good enough approach is to establish
> a baseline of tests that are known to fail using a debug/non-optimized
> build and consider a real failure only a failed test not in that set. We
> already do this for a few other large projects and it seems like a good
> approach.
> Is there a way to skip over the tests that are known to fail? With Gtest
> this is easy using the --gtest_filter flag. If it's not possible, then a
> script is needed to parse the results and ignore those known failures.
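The baseline-and-filter approach described above could be sketched like this.
The `TEST-UNEXPECTED-FAIL | path | message` log shape is a simplified
assumption about what the harnesses emit:

```python
def parse_failures(log_text):
    """Collect test ids from lines shaped like
    'TEST-UNEXPECTED-FAIL | path | message' (simplified log format)."""
    failures = set()
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 2 and parts[0].startswith("TEST-UNEXPECTED"):
            failures.add(parts[1])
    return failures

def new_failures(baseline_log, candidate_log):
    """Failures in the candidate run that were not already failing
    in the known-bad baseline run."""
    return parse_failures(candidate_log) - parse_failures(baseline_log)
```

A failure that also appears in the debug-build baseline would then be ignored;
anything else is treated as a real, compiler-induced regression.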
> Here are some results from the tests running on Windows 7. You can find
> some log files here: https://1drv.ms/f/s!AmGHUgGfTN19hjuFKlOeZ7VtpUsd
> *o Tests without failures:*
> mach gtest
> mach marionette-test
> mach firefox-ui-functional
> mach jsapi-tests
> mach jsbrowsertest
> *o Spidermonkey tests*
> mach check-spidermonkey doesn't seem to work because js.exe is not built
> by default. To test it I built the js folder and followed the Spidermonkey
> test instructions to run the JS and JIT tests - both pass without failures.
> *o Tests with failures*
> *- mach mochitest*
> Part 1
> 1 INFO Passed: 675002
> 2 INFO Failed: 23
> 3 INFO Todo: 1586
> 4 INFO Mode: e10s
> Part 2
> 1 INFO Passed: 676509
> 2 INFO Failed: 16
> 3 INFO Todo: 1592
> 4 INFO Mode: e10s
> Example of failed test:
> | Test timed out.
> *- mach reftest*
> REFTEST INFO | Successful: 13436 (13416 pass, 20 load only)
> REFTEST INFO | Unexpected: 114 (113 unexpected fail, 0 unexpected pass, 0
> unexpected asserts, 1 failed load, 0 exception)
> REFTEST INFO | Known problems: 700 (285 known fail, 0 known asserts, 69
> random, 346 skipped, 0 slow)
> Example of failed test:
> == file:///c:/firefox/mozilla-central/layout/reftests/bugs/
> 321402-4-ref.xul | image comparison, max difference: 32, number of
> differing pixels: 1
> The images look identical to me, but 1 pixel is supposedly different - I'm
> wondering why the test fails if the max. acceptable diff is 32. Some other
> cases I looked into also had "identical" images.
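For what it's worth, the reftest log reports the observed maximum per-channel
difference, and by default any differing pixel at all is a failure unless the
manifest carries a fuzzy annotation permitting some slack. A rough model of
the comparison (not the actual harness; the fuzzy parameters here are
illustrative):

```python
def compare_images(img_a, img_b):
    """img_a, img_b: equal-length lists of (r, g, b) tuples.
    Returns (max per-channel difference, number of differing pixels),
    mirroring the two numbers the reftest log reports."""
    max_diff = 0
    differing = 0
    for pa, pb in zip(img_a, img_b):
        d = max(abs(a - b) for a, b in zip(pa, pb))
        if d > 0:
            differing += 1
            max_diff = max(max_diff, d)
    return max_diff, differing

def reftest_passes(img_a, img_b, fuzzy_max=0, fuzzy_pixels=0):
    """By default any differing pixel fails; a fuzzy annotation
    (modeled here as two thresholds) relaxes that."""
    max_diff, differing = compare_images(img_a, img_b)
    return max_diff <= fuzzy_max and differing <= fuzzy_pixels
```

Under this model, "max difference: 32, number of differing pixels: 1" means
one pixel differed by 32 levels in some channel, which fails unless the test
is annotated to tolerate it.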
> *- mach web-platform-test*
> *- mach crash-test*
> On Mon, Nov 21, 2016 at 9:54 AM, Gregory Szorc <g...@mozilla.com> wrote:
>> On Wed, Nov 16, 2016 at 12:25 PM, Gratian Lup <lgrat...@gmail.com> wrote:
>>> On Wednesday, November 16, 2016 at 5:23:58 AM UTC-8, Ted Mielczarek wrote:
>>> > Gratian,
>>> > One of my coworkers reminded me of something that might be an option for
>>> > you--we have scripts that would allow you to provide a Firefox build
>>> > that you generated (at a publicly accessible URL) and trigger test jobs
>>> > on that build in our CI infrastructure. If that's something that sounds
>>> > useful to you we can definitely make that happen.
>>> > You'd have to produce a Firefox build, run the `mach package` and `mach
>>> > package-tests` targets, upload a few zip files from the $objdir/dist
>>> > directory to somewhere accessible via public HTTP and then run a Python
>>> > script to schedule test jobs against those files.
>>> > -Ted
>>> Hi Ted,
>>> Thanks a lot for your help!
>>> Using Windows 7 or 8 to do the tests should be fine - I actually don't
>>> see any reason it shouldn't be OK to also do other builds and tests on them
>>> when not used for Firefox.
>>> The idea of testing on your infrastructure is tempting, but probably
>>> would consume too many resources, since this new testing system is intended
>>> to be used both overnight against the latest good compiler build and by
>>> every developer on their own while working on new features - this would be
>>> quite a lot of people. Doing a test build now would still be a good idea,
>>> though, at least to see if everything passes in the right environment.
>> Something Mozilla has been wanting to do for a while is stand up
>> "nightly" builds using the latest, unreleased toolchain from upstream. This
>> would help us identify compiler compatibility issues early. And it may even
>> help compiler toolchain developers find unwanted regressions.
>> In theory, Microsoft developers could gain access to our "Try" server,
>> which allows you to push changes made to Firefox in version control
>> (including its automation config) and run the results on our
>> infrastructure. We have a file in the repository referencing a zip file of
>> the Visual Studio toolchain to use. So, it would be possible for someone at
>> Microsoft to create a zip file of an unreleased VC++ toolchain and have our
>> automation run against that. Of course, there are scaling and privacy
>> issues to work out. (I'm guessing Microsoft LCA will have an opinion on
>> mostly-in-the-public-domain Mozilla infrastructure accessing pre-release
>> Microsoft software.)
>>> I have a few more questions about running the tests:
>>> 1. How exactly should the build artifacts be copied to the test machine?
>>> Something like ./mach package? After copying over, running the tests with
>>> ./mach is going to pick up the binaries after copying, or is some ./mach
>>> "unpack" step needed? I assume the entire mozilla-central enlistment is
>>> also needed?
>> `mach package` will get you a standalone archive or installer suitable
>> for deploying to test machines.
>> For historical reasons, not all `mach *test` commands integrate really
>> well with pre-built archives/installers. There is a whole other layer that
>> deals with running tests in automation. That being said, some `mach`
>> commands do accept an argument with an alternate path to a binary to use.
>> e.g. `mach reftest --appname <path>` and `mach mochitest --appname <path>`.
>> Running tests from arbitrary binaries is definitely something we'd like to
>> make a bit easier. So if there's a feature you'd like, don't hesitate to
>> ask.
>>> 2. Can I see on Treeherder the exact command line that was used to
>>> launch the test suite? I looked over the log files and didn't find anything
>>> like that.
>> For some tests, yes - e.g. in https://archive.mozilla.org/pu
>> tests1-windows-build17.txt.gz the main command is:
>> 'c:/mozilla-build/python27/python' '-u'
>> 'scripts/scripts/web_platform_tests.py' '--cfg'
>> 'web_platform_tests/prod_config_windows.py' '--test-type=testharness'
>> '--total-chunks' '5' '--this-chunk' '1' '--blob-upload-branch'
>> 'mozilla-central' '--download-symbols' 'ondemand'
>> This is invoking a tool called "mozharness" which is our "execute in
>> automation" layer. This tool performs tons of setup work, such as obtaining
>> a version control checkout, creating Python virtualenvs, etc. Once that's
>> all in place, it executes a test runner. e.g.
>> 08:22:59 INFO - Calling
>> ['c:\\slave\\test\\build\\venv\\Scripts\\python', '-u',
>> 'c:\\slave\\test\\build\\tests\\web-platform\\runtests.py', '--log-raw=-',
>> '--test-type=testharness', '--disable-e10s', '--total-chunks=5',
>> '--certutil-binary=c:\\slave\\test\\build\\tests/bin/certutil'] with
>> output_timeout 1000
>> There is a lot of complexity here. Most of it is unique to our automation
>> infrastructure. I recommend you avoid mozharness completely and try to
>> execute things with "mach." (This is the direction we're trying to take
>> things.) Again, if you need features to make that easier, please ask and
>> we'll see what we can do.
>>> I'm going to try again with Update 3; I might have used the most recent
>>> build instead, which can indeed show some new errors - the frontend team
>>> makes a lot of changes.
>>> dev-builds mailing list