Also note this has happened before. mccr8 was looking into similar leak-checking-is-totally-busted-but-nobody-noticed issues a few years ago in https://bugzilla.mozilla.org/show_bug.cgi?id=1045316
Glad to hear you're looking into end-to-end tests!

-e

On Thu, Dec 29, 2016 at 8:37 AM, Andrew Halberstadt <[email protected]> wrote:

> Over the holidays, we noticed that leaks in mochitest and reftest were not
> turning jobs orange, and that the test harnesses had been running in that
> state for quite some time. During this time several leak related test
> failures have landed, which can be tracked with this dependency tree:
> https://bugzilla.mozilla.org/showdependencytree.cgi?id=1325148&hide_resolved=0
>
> The issue causing jobs to remain green has been fixed, however the known
> leak regressions had to be whitelisted to allow this fix to land. So while
> future leak regressions will properly fail, the existing ones (in the
> dependency tree) still need to be fixed. For mochitest, the whitelist can
> be found here:
> https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#2218
>
> Other than that, leak checking is only disabled on linux crashtests.
>
> Please take a quick look to see if there is a leak in a component for
> which you could help out. I will continue to help with triage and bisection
> for the remaining issues until they are all fixed. Also big thanks to all
> the people who are currently working on a fix or have already landed a fix.
>
> Read on only if you are interested in the details.
>
>
> *Why wasn't this caught earlier?*
> The short answer to this question is that we do not have adequate testing
> of our CI.
>
> The problem happened at the intersection between mozharness and the test
> harnesses. Basically a change in mozharness exposed a latent bug in the
> test harnesses, and was able to land because it appeared as if nothing went
> wrong. Catching errors like this is tricky because regular unit tests would
> not have detected it either. It requires integration tests of the CI system
> as a whole (spanning test harnesses, mozharness and buildbot/taskcluster).
>
> *How will we prevent this in the future?*
>
> Historically, integration testing our test harnesses has been a hard
> problem. However with recent work in taskcluster, python tests and some
> refactoring on the build frontend, I believe there is a path forward that
> will allow us to stand up this kind of test. I will commit some of my time
> to fix this and hope to have *something* running that would have caught
> this by the end of Q1.
>
> I would also like to stand up a test harness designed to test command line
> applications in CI, which would provide another avenue for writing test
> harness unit and integration tests. Bug 1311991
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1311991> will track this
> work.
>
> It is important that developers are able to trust our tests, and when bugs
> like this happen, that trust is eroded. For that I'd like to apologize, and
> express my hope that this will be the last time a major test result bug
> like this happens again. At the very least, we need to have the capability
> of adding a regression test when a bug like this happens in the future.
>
> Thanks for your help and understanding.
> - Andrew
>
> _______________________________________________
> firefox-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/firefox-dev
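
To make the "test the CI as a whole" idea from the quoted message a bit more concrete, here is a minimal sketch of an end-to-end regression test for a command-line harness. Everything in it is hypothetical and only for illustration: the fake harness script, the `run_job` helper, and the rule that a non-zero exit code means "orange" are assumptions, not how mozharness or the real harnesses work. The point is only that the test drives the whole chain through a subprocess and checks the final job status, not just the harness's internal leak detection.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for a test harness: it "detects" a leak, logs it,
# and signals failure through its exit code.
FAKE_HARNESS = textwrap.dedent("""
    import sys
    leaked_bytes = int(sys.argv[1])
    if leaked_bytes > 0:
        print("TEST-UNEXPECTED-FAIL | leakcheck | %d bytes leaked" % leaked_bytes)
        sys.exit(1)
    print("TEST-PASS | leakcheck | no leaks detected")
""")


def run_job(leaked_bytes):
    """Run the fake harness end to end and return (status, output)."""
    proc = subprocess.run(
        [sys.executable, "-c", FAKE_HARNESS, str(leaked_bytes)],
        capture_output=True,
        text=True,
    )
    # Assumption: the outer layer (a stand-in for the CI wrapper) derives
    # the job colour from the exit code alone.
    status = "green" if proc.returncode == 0 else "orange"
    return status, proc.stdout


def test_leak_turns_job_orange():
    # A run with a known leak must be reported as a failed job.
    status, output = run_job(leaked_bytes=128)
    assert "TEST-UNEXPECTED-FAIL" in output
    assert status == "orange"


def test_clean_run_stays_green():
    # A leak-free run must not be reported as a failure.
    status, output = run_job(leaked_bytes=0)
    assert "TEST-PASS" in output
    assert status == "green"
```

A harness-only unit test could pass the first assertion in `test_leak_turns_job_orange` while the job still ends up green; the second assertion is the part that needs the full chain.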

