Welcome to the April and May edition of the Engineering Effectiveness
Newsletter! The Engineering Effectiveness org makes it easy to develop,
test and release Mozilla software at scale. See below for some highlights,
then read on for more detailed info!

Highlights

   -

   The Select Translations MVP
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1855907> has landed in
   Nightly, and the feature is scheduled to ride the trains to ship in Firefox
   128 <https://whattrainisitnow.com/release/?version=128>. This allows
   users to translate selected text via the context menu.
   -

   A bug in Identical Code Folding detection was fixed for Firefox Desktop
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1896118> and Android
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1899972> builds. This
   leads to a 20MB reduction on Firefox Desktop build size and a 2MB
   reduction on Android!
   -

   We published Linux ARM64 Nightlies
   
<https://blog.nightly.mozilla.org/2024/04/19/firefox-nightly-now-available-for-linux-on-arm64/>,
   which have seen a steady increase in DAU/MAU since launch. The deb package
   already represents 45% of ARM64 MAU.
   -

   Mozillians across many teams (both within EE and without) successfully
   rotated the Certification Authority we use to sign Firefox plugins and
   addons
   
<https://docs.google.com/document/d/1c2lvs7vbkfnCQ0lNJAkM7GxTL0OKEl6Y_lD-JoZ3aaQ/edit>!
   This prevented a third “Armag-addon” (only this one would have been much
   worse).
   -

   We kicked off our first big parallel translations training run
   
<https://gregtatum.github.io/taskcluster-tools/training.html?taskGroupIds=UVnPuJpwTqiOzTMBJ5RjDw%2CVVbnYO-YTsu-rkRS4qMH5Q%2CMFEN6PmrR323tnrPN8y1qQ&mergeChunks=true&showAll=false&taskGroupNames=[%22Main+training+branch%22%2C%22Older+stability+group%22%2C%22sacrebleu+fix%22]&hidden=UgjZbCiEQ4-6Gwv5uF5bcA%2CLphXRhZoScGYSNAPKCiVfQ%2CBFgoyXuvRHe0kK4J9d-Lgw%2CXFbconJ5Tzq-hXAJyZdlcA%2CKJ2zRcmVQBWoNnhxGvTReg>!
   This follows a long effort to stabilize the incredibly complex pipeline
   such that it can run hundreds of training tasks in parallel.
   -

   We added support for running tests on try matching tags
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1898051> in the manifest.
   Now you can do ./mach try fuzzy --tag <tag> and only tests annotated with
   that tag will be selected (WPT and Reftest based suites are not yet
   supported).
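
To picture what tag-based selection means, here is a minimal Python sketch.
It is not mach's actual implementation: the select_by_tag helper, the test
dictionaries, and the assumption that tags are stored as one
whitespace-separated string are all hypothetical and exist only for
illustration.

```python
def select_by_tag(tests, tag):
    """Return only the tests whose 'tags' annotation contains `tag`."""
    selected = []
    for test in tests:
        # Assumption: tags live in a single whitespace-separated string.
        # Tests without a 'tags' annotation are treated as having no tags.
        test_tags = test.get("tags", "").split()
        if tag in test_tags:
            selected.append(test)
    return selected


# Hypothetical manifest entries, made up for this example.
tests = [
    {"path": "dom/test_a.html", "tags": "webrtc media"},
    {"path": "dom/test_b.html", "tags": "media"},
    {"path": "dom/test_c.html"},
]

selected_paths = [t["path"] for t in select_by_tag(tests, "webrtc")]
print(selected_paths)
```

Matching on whole tags rather than substrings keeps a tag such as "media"
from accidentally selecting tests annotated only with a longer tag that
happens to contain it.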

Detailed Project Updates

Bugzilla and Bugbug

   -

   Benjamin Mah built a new ML model to classify Fenix bugs
   <https://github.com/mozilla/bugbug/pull/4173> into suitable components.
   This model will work in coordination with the general component model to
   improve bug classification.
   -

   Benjamin Mah implemented an enhancement for BugBot’s triage rotations
   feature to notify involved triage owners
   <https://github.com/mozilla/bugbot/pull/2389> when performing a rotation.
   -

   Benjamin Mah implemented an improvement for BugBot to automatically
   clear its needinfo requests <https://github.com/mozilla/bugbot/pull/2397>
   when closing variant expiration bugs.
   -

   A new component was created for Release Engineering's packaging
   
<https://bugzilla.mozilla.org/buglist.cgi?product=Release%20Engineering&component=Release%20Automation%3A%20Packaging&resolution=---&list_id=17060189>.

Build System and Mach Environment

   -

   Serge Guelton fixed a bug in Identical Code Folding (ICF) detection for
   Firefox desktop <https://bugzilla.mozilla.org/show_bug.cgi?id=1896118>
   and Android <https://bugzilla.mozilla.org/show_bug.cgi?id=1899972>
   builds. This leads to a 20MB reduction on Firefox desktop build size and a
   2MB reduction on Android!
   -

   Serge Guelton reduced the execution time of mach configure + mach export
   by ~25%, mostly through parallelisation of various operations
   
<https://docs.google.com/document/d/1bRjwxx8IZBrVZuk2qURz5PvUcie0FezJySZapgrucSs/>.

CI and Treeherder

   -

   Alphare implemented batch Taskcluster APIs
   
<https://github.com/taskcluster/taskcluster/commit/26600f1a106b49f1728183305277f6dc177ef957>
   and updated Taskgraph to use them
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1840830#c3>, resulting in
   a 20% performance improvement in Gecko Decision tasks.
   -

   Ben Hearsum discovered and debugged
   <https://github.com/taskcluster/taskcluster/pull/6925> problems
   <https://github.com/taskcluster/taskcluster/pull/6977> with our GCP spot
   termination logic. Tasks can now upload artifacts even after being
   terminated, opening the door for long running tasks to “pick up where
   they left off
   <https://github.com/mozilla/firefox-translations-training/issues/270>”.
   -

   Ben Hearsum helped enable GCP resource usage collection on our workers
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1890834>, giving us much
   more detailed insight
   <https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent>
   into how workers are (or aren’t) being utilized.
   -

   Andrew Halberstadt upgraded Gecko to Taskgraph 7.x
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1868440>, which contains
   several incompatible changes
   
<https://taskcluster-taskgraph.readthedocs.io/en/latest/reference/migrations.html#x-7-x>,
   including a simpler and more intuitive /taskcluster layout.
   -

   Sebastian Hengst and contractors from Teklia migrated Treeherder’s
   database from MySQL to PostgreSQL. This unblocks further updates to modern
   versions of tooling like Django and better analytics of CI data.
   -

   Eva Bardou upgraded Treeherder’s frontend
   
<https://github.com/mozilla/treeherder/commit/e7dc8f12bdd55bf4de892784b519e914bdba812f>
   to Django 4.2.
   -

   Sebastian Hengst enabled local development of Treeherder with a remote
   PostgreSQL database instance behind Google Cloud SQL Proxy.
   -

   Joel Maher added support for running tests on try matching tags
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1898051> in the manifest.
   Now you can do ./mach try fuzzy --tag <tag> and only tests annotated with
   that tag will be selected. This does not yet work for web-platform-tests
   <https://phabricator.services.mozilla.com/D211485>, reftest or crashtest.
   -

   Joel Maher fixed a couple of Reftest issues, ensuring that Linux tests
   have a valid theme <https://bugzilla.mozilla.org/show_bug.cgi?id=1895092>,
   and Windows GPUs have the correct device driver
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1899536>. These fixes
   should help reduce intermittents.

Crash Management

   -

   Suhaib Mujahid and Marco Castelluccio published a paper
   <https://arxiv.org/pdf/2401.13667> titled “Predicting the Impact of
   Crashes Across Release Channels” at the MSR conference
   <https://conf.researchr.org/home/msr-2024>. The paper was published in
   collaboration with Diego Elias Costa from Concordia University.
   -

   Many issues were fixed in the new crash reporter client, including:
   improved localization, better Thunderbird support and superior backwards
   compatibility with the old client.
   -

   Gabriele Svelto ensured crash reports intercepted by the Windows Error
   Reporting runtime exception module now always contain an install time
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1893406>.
   -

   The Linux symbol scrapers have been expanded to cover more packages and
   not fail when presented with huge amounts of debug information in a single
   pass.
   -

   Crash Pings submitted over Glean on desktop now contain the full
   telemetry environment and the crash stack.

Lint, Static Analysis and Code Coverage

   -

   Marco Castelluccio, Christian Holler and Jason Kratzer published a study
   <https://azaidman.github.io/publications/brandtICSE2024.pdf> about code
   coverage gaps and automatic generation of tests, titled “Mind the Gap: What
   Working With Developers on Fuzz Tests Taught Us About Coverage Gaps”. This
   study was published at the ICSE conference
   <https://conf.researchr.org/home/icse-2024> in collaboration with
   Carolin Brandt and Andy Zaidman from Delft University of Technology and
   with Alberto Bacchelli from the University of Zurich.

OS Integration and Security

   -

   QA has begun testing our integration of the DLP (data loss prevention
   <https://en.wikipedia.org/wiki/Data_loss_prevention_software>) SDK
   <https://github.com/chromium/content_analysis_sdk> support in Nightly.
   This is an enterprise feature allowing data loss prevention vendors such as
   Broadcom and Trellix to integrate with Firefox in a more reliable and
   stable manner.

PDF.js

   -

   Calixte replaced the jpeg2000 decoder with the OpenJPEG one using WASM.
   This fixes various rendering issues and improves the overall performance of
   jpeg2000 decoding.
   -

   Nicolò Ribaudo implemented fixes for text selection flickering
   <https://github.com/mozilla/pdf.js/pull/17923> on touch screen devices.
   -

   Aditi fixed a discrepancy <https://github.com/mozilla/pdf.js/pull/17770>
   between the lang tag of the PDF viewer and of the canvas, which led to
   misaligned text selection.
   -

   We have started experimenting with alt text generation
   
<https://hacks.mozilla.org/2024/05/experimenting-with-local-alt-text-generation-in-firefox-nightly/>
   using local AI models in the feature for adding images to PDFs.

Firefox Translations

   -

   We kicked off our first big parallel training run! This follows a long
   effort to stabilize the incredibly complex pipeline such that it can run
   hundreds of training tasks in parallel.
   -

   Greg Tatum created a dashboard
   
<https://gregtatum.github.io/taskcluster-tools/training.html?taskGroupIds=UVnPuJpwTqiOzTMBJ5RjDw%2CVVbnYO-YTsu-rkRS4qMH5Q%2CMFEN6PmrR323tnrPN8y1qQ&mergeChunks=true&showAll=false&taskGroupNames=[%22Main+training+branch%22%2C%22Older+stability+group%22%2C%22sacrebleu+fix%22]&hidden=UgjZbCiEQ4-6Gwv5uF5bcA%2CLphXRhZoScGYSNAPKCiVfQ%2CBFgoyXuvRHe0kK4J9d-Lgw%2CXFbconJ5Tzq-hXAJyZdlcA%2CKJ2zRcmVQBWoNnhxGvTReg>
   that shows the current training run’s progress. Updates are also manually
   tracked in this spreadsheet
   
<https://docs.google.com/spreadsheets/d/1iyKKWVYXV68sAKwzU9Y6f6KNRnvQQ7HJ7431TSfr5BU/edit#gid=639593490>
   (which also contains a link to the most recent dashboard).
   -

   We will train the first half of the model pipeline (up until a single
   teacher training) and look at the initial evaluation results. If the models
   are good enough to continue, we'll trigger the rest of the training,
   through to the final production-ready models.
   -

   The first wave will be the models going into English, because there is a
   lot of English monolingual data available. After the first wave, we'll
   continue with a second wave going from English. We can bootstrap this
   second wave with the xx-en models we trained in the first wave.
   -

   A full training run for a single language direction takes about 3-4
   weeks. The first stage, where we're stopping, takes about 1 week of
   training. Both figures depend on data size and will vary.
   -

   Evgeny Pavlov has been leading a big part of the work on developing
   our training recipe, and coordinating with Teklia contractors to get our
   experiment tracking integration with Weights and Biases set up.
   -

   Ben Hearsum has done significant work to ensure that we can train new
   language pairs on preemptible GCP instances, which will greatly lower the
   financial cost of training them.
   -

   Erik Nordin has nearly completed the implementation of the Select
   Translations MVP <https://bugzilla.mozilla.org/show_bug.cgi?id=1855907>,
   and the feature is scheduled to ride the trains to ship in Firefox 128
   <https://whattrainisitnow.com/release/?version=128>.
   <https://gregtatum.github.io/are-we-translations-yet/>

Phabricator, moz-phab, and Lando

   -

   Connor Sheehan completed a migration of the Treestatus tool from a
   standalone service owned by RelEng into a feature of Lando. The new Lando
   Treestatus has a proper test suite and the UI is implemented in
   technologies familiar to our engineering teams.
   -

   Connor Sheehan implemented several of the hook checks
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1863629> on hg.mozilla.org
   as checks within Lando, which is required for the hg->git migration.
   -

   Connor Sheehan added support for the cypress project branch to
   Lando/Phabricator.

Release Engineering and Management

   -

   Release Management shipped two new Firefox releases and a number of
   follow-up dot releases to address quality issues found post-release.
   -

   Gabriel Bustamante published a Linux ARM64 Nightly
   
<https://blog.nightly.mozilla.org/2024/04/19/firefox-nightly-now-available-for-linux-on-arm64/>.
   Since then, DAU and MAU have increased linearly, with no sign of a
   decrease in sight. Interesting fact: the .deb package represents 45% of
   ARM64 MAU.
   -

   Julien Cristau, Ben Hearsum, and members of the Desktop Integration team
   prevented a situation in which Windows users would have been unable to
   update or reinstall Firefox
   
<https://docs.google.com/document/d/1e1D9YfryZaomIIzamWi-AAv-CltonS0pL02VGNx62kw/edit>.
   The risk came from a certificate expiring in mid-June whose replacement
   came with new constraints.
   -

   Ben Hearsum and several Mozillians from many teams successfully rotated
   the Certification Authority we use to sign Firefox plugins and addons
   
<https://docs.google.com/document/d/1c2lvs7vbkfnCQ0lNJAkM7GxTL0OKEl6Y_lD-JoZ3aaQ/edit>.
   This prevented another “Armag-addon” like the one that occurred in May 2019.
   -

   Heitor Neiva optimized how macOS builds are notarized
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1889223>. The average
   task duration went from ~30 minutes down to ~12 minutes
   <https://sql.telemetry.mozilla.org/queries/99724#245894>. Overall, this
   represented 860+ hours of compute in April and now we’re down to ~450h
   <https://sql.telemetry.mozilla.org/queries/99741>.
   -

   Andrew Halberstadt converted Firefox iOS to use the new Bitrise
   scriptworker <https://github.com/mozilla-mobile/firefox-ios/pull/19604>.
   This allows Firefox iOS to securely trigger Bitrise workflows from
   Taskcluster, allowing these workflows to plug into standard Taskcluster
   release pipelines.
   -

   Geoff Brown, Julien Cristau, Johan Lorenzo and members of the Android
   teams wrapped up the Android repository migration
   
<https://docs.google.com/document/d/1W7lm5yCI7miRpPtgNhsvGpgp9oT17xOuW-Vjh5gZS3M/edit>.
   Now all Fenix and Focus releases happen on hg.mozilla.org.
   -

   Julien Cristau updated beetmover to stop archiving test packages
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1827513>, saving storage
   costs.
   -

   Sylvestre audited the artifacts stored on archive.m.o to remove a lot of
   old and unused files.
   -

   Ben Hearsum has been working on migrating l10n strings to GitHub
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1877097> with the l10n
   team. We expect to cut over to the new repository
   <https://github.com/mozilla-l10n/firefox-l10n> in early June.
   -

   Ryan VanderMeulen has been working with Mike Kaply and Release
   Engineering to mitigate slow Google Play review times impacting our ability
   to ship timely Android releases.
   -

   Donal Meehan created a Release Delay Runbook
   
<https://mozilla-hub.atlassian.net/wiki/spaces/RELMAN/pages/707297310/Release+Delay+Runbook>
   documenting the steps to take when a release needs to be delayed in
   response to a recent incident.
   -

   Multiple tooling improvements were made for the release management team
   (on https://whattrainisitnow.com/, https://trainqueries.herokuapp.com/,
   and https://bugimpact.herokuapp.com).

Version Control

   -

   Van Le, Greg Cox and Connor Sheehan worked to increase the amount of RAM
   on hg.mozilla.org, eliminating many OOM issues and making the service
   more stable.
   -

   Connor Sheehan added
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1892039> a pushchangedfiles
   endpoint to hg.mozilla.org, which is a minimal and more performant
   version of the json-automationrelevance endpoint used by various tasks
   in CI, and Andrew Halberstadt updated CI to use it
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1891768>.

Other

   -

   Sylvestre upgraded Sphinx to 7.2.6 and all other dependencies for
   https://firefox-source-docs.mozilla.org.


Thanks for reading and see you next time!
