Hi all,
There have been a couple of threads related to test automation in B2G,
asking why we haven't caught some especially egregious regressions; the
kind that basically "break the phone".
To answer that, I'd like to describe how our on-device automation
currently works, and what we're doing to expand it so we can more
effectively address these concerns.
We currently have a smallish number of real devices, managed by WebQA,
hooked up to on-device automation. They run a bank of tests against a
number of branches several times a day. The devices are time-consuming
to manage, since they occasionally get wedged during flashing,
rebooting, or other operations, and require manual intervention to fix.
For this reason, and because it's been very difficult to obtain
significant numbers of devices for automation, we haven't been able to
run any tests frequently enough to provide per-commit coverage.
When tests fail, WebQA engages in a fairly time-consuming process of
investigation and bisection. In the case of the homescreen breakage
(caused by https://bugzilla.mozilla.org/show_bug.cgi?id=957086), our
on-device tests did fail, and the team was investigating those
failures, which is a necessary step before we can file specific,
actionable bugs.
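(For the curious: the core of that bisection is just a binary search
over builds. Here's a minimal sketch in Python, assuming a hypothetical
run_smoke_test() helper that flashes a device with a given build and
reports pass/fail; the real process is manual and much slower, since
every step means reflashing a phone.)

    def bisect_regression(builds, run_smoke_test):
        """Find the first bad build in a list ordered oldest to newest.

        builds[0] must be known-good and builds[-1] known-bad.
        """
        good, bad = 0, len(builds) - 1
        while bad - good > 1:
            mid = (good + bad) // 2
            if run_smoke_test(builds[mid]):
                good = mid  # still passing: the regression landed later
            else:
                bad = mid   # already failing: regression is here or earlier
        return builds[bad]  # the first build where the smoke test fails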
Clearly, what we really want is to be able to run at least a small set
of tests per-commit, so that when things break, we don't need to spend
lots of time investigating: we will already know which commit caused
the problem, and can promptly back it out or otherwise address it.
That's exactly what we are planning for Q3, thanks to the Flame device.
Jonathan Hylands has developed a power harness for the Flame that
allows us to remotely restart phones, which addresses some of the
device management concerns. The A*Team, WebQA, and jhylands are working
together to get 30 Flames in automation, and to reduce their management
costs. This is enough to allow us to run a small set of functional and
performance tests per-commit, which should be enough to catch most
"break the phone" problems.
Another issue we've had with device testing is test result visibility:
currently, results are only available on a Jenkins instance that
requires VPN access, which is awkward for anyone not closely involved
in maintaining and running the tests.
Next quarter, we will be improving this as well. Jonathan Eads on the
A*Team is deploying Treeherder, a successor to TBPL. Unlike TBPL,
Treeherder is not tightly coupled to buildbot, and can display test
results from arbitrary data sources.
As our bank of 30 Flames becomes available, we will start publishing
on-device test results to Treeherder, in the same UI that will be used
to display the per-commit tests being run in buildbot. This will give
people a "one-stop shop" for seeing test results for B2G, regardless of
whether they're run on devices or in VMs managed by buildbot.
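Since Treeherder just ingests structured result data, a device harness
only needs to build a small job blob and submit it per run. As a rough
illustration (the endpoint and payload shape below are placeholders,
not Treeherder's actual submission API):

    import json
    import urllib2

    # Placeholder endpoint and payload shape -- illustrative only, not
    # Treeherder's real submission API.
    TREEHERDER_URL = "https://treeherder.example.com/api/jobs/"

    def submit_result(revision, job_name, result, log_url):
        payload = {
            "revision": revision,  # the B2G commit the device was flashed with
            "job_name": job_name,  # e.g. "flame-smoketest"
            "result": result,      # e.g. "success", "testfailed", "busted"
            "log_url": log_url,    # where the full device log lives
        }
        req = urllib2.Request(TREEHERDER_URL, json.dumps(payload),
                              {"Content-Type": "application/json"})
        urllib2.urlopen(req)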
Together, these two pieces will let us manage some on-device tests much
as we currently handle desktop and emulator tests in TBPL: especially
bad commits should break tests, the breakage should be visible in
Treeherder, and the sheriffs will back out the offending commits.
We won't have enough device capacity to run all device tests
per-commit, at least at first, so we'll have to carefully select a
small set of tests that guard against the worst kinds of breakage.
Whether we can scale beyond 30 devices will depend on how stable they
prove and what their management costs turn out to be, which we'll be
evaluating over the next few months.
Regards,
Jonathan