Several leak failures have slipped past continuous integration

Andrew Halberstadt Thu, 29 Dec 2016 08:38:00 -0800

Over the holidays, we noticed that leaks in mochitest and reftest were not turning jobs orange, and that the test harnesses had been running in that state for quite some time. During this time, several leak-related test failures have landed. They can be tracked with this dependency tree:

https://bugzilla.mozilla.org/showdependencytree.cgi?id=1325148&hide_resolved=0

The issue causing jobs to remain green has been fixed. However, the known leak regressions had to be whitelisted to allow this fix to land. So while future leak regressions will properly fail, the existing ones (in the dependency tree) still need to be fixed. For mochitest, the whitelist can be found here:

https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#2218

Other than that whitelist, leak checking is disabled only on Linux crashtests.

Please take a quick look to see if there is a leak in a component for which you could help out. I will continue to help with triage and bisection for the remaining issues until they are all fixed. Also, big thanks to all the people who are currently working on a fix or have already landed one.


Read on only if you are interested in the details.


_Why wasn't this caught earlier?_

The short answer is that we do not have adequate testing of our CI.

The problem happened at the intersection of mozharness and the test harnesses. Basically, a change in mozharness exposed a latent bug in the test harnesses, and the change was able to land because it appeared as if nothing had gone wrong. Catching errors like this is tricky because regular unit tests would not have detected it either. It requires integration tests of the CI system as a whole, spanning the test harnesses, mozharness, and buildbot/taskcluster.



_How will we prevent this in the future?_

Historically, integration testing our test harnesses has been a hard problem. However, with recent work in taskcluster, Python tests, and some refactoring of the build frontend, I believe there is a path forward that will allow us to stand up this kind of test. I will commit some of my time to fixing this and hope to have /something/ running that would have caught this by the end of Q1.

I would also like to stand up a test harness designed to test command-line applications in CI, which would provide another avenue for writing test harness unit and integration tests. Bug 1311991 <https://bugzilla.mozilla.org/show_bug.cgi?id=1311991> will track this work.

It is important that developers are able to trust our tests, and when bugs like this happen, that trust is eroded. For that I'd like to apologize, and express my hope that this will be the last major test-result bug of this kind. At the very least, we need to have the capability of adding a regression test when a bug like this happens in the future.


Thanks for your help and understanding.
- Andrew
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform