Hi everyone,

A while ago a number of engineers, including myself, started to look into a
performance project that turned into Quantum Flow.  The focus of the
project is finding and prioritising performance issues across the browser,
so we will need help from many of you to get them fixed.  I’m planning to write
regular updates about the project and highlight the focus areas and the
ongoing work.  In this first email I’m going to start by giving some
background about how we started and where we are now.

Quantum Flow is a performance task force focusing on eliminating
performance cliffs in the browser that aren’t part of other Quantum
projects.  Project Quantum’s overall focus is to deliver a high
performance browser engine, and we are making some great progress on the
four main sub-projects that are attacking large portions of the rendering
pipeline.  That still leaves us with various performance issues elsewhere
in the browser which users may hit, and we have to fix all such issues
to ensure that the ultimate result is a next generation browser (and
browser engine) we can all be proud of.

A good way to think about how Quantum Flow fits with the rest of the
Quantum projects is to imagine it as the foundation that the other projects
need to build upon.  For example, if a bad bug somewhere in the browser
causes jank for a few hundred milliseconds, then all of the benefit that we
obtain from cooperatively scheduling JS on the page with Quantum DOM,
resolving the styles in parallel on all of your CPU cores with Quantum CSS,
and rasterizing the page directly on the GPU with Quantum Render will
still result in a janky experience[0].  So we want to ensure that we remove
these types of roadblocks that would prevent the rest of Quantum from
shining.

The above description may feel a little bit vague, and a little bit too
broad, so let me try to explain how the need for Quantum Flow became
apparent.  Around the beginning of this year a number of us gathered for a
work week in Taipei with the goal of measuring and improving the
performance of Firefox on a few large websites that we knew we had
performance problems on.  Initially we were only focusing on Google
Suite[1], and we started by profiling some of the test cases run by the
Hasal framework[2].

We had a bit of a difficult time finding actionable issues that we could
improve since these websites are massive and it can be extremely difficult
to find out why the overall time of some particular interaction is
different when comparing different browsers head to head.  Also, we started
seeing some performance issues on those websites that were coming from
parts of the browser that were a bit surprising.  For example, Chris Pearce
found out that on Google Docs, the content process can be blocked on the
parent process for a synchronous IPC message to initialize spell
checking[3] even though Google Docs doesn’t use the browser’s spell
checking facilities!

Following the breadcrumbs, we started to wonder what else we could learn
if we profiled more usage scenarios in the browser.  As you may
expect, we have found a fair number of performance issues in various parts
of the browser.  That’s hardly a surprise given the size and complexity of
the code base, but we have also learned a lot about the adverse impact of
some of these issues.  These findings have uncovered larger
problem areas that we decided we need to address as part of an initiative
that we call Quantum Flow.

I’m planning to focus on one important class of performance issues in this
first email, mostly because it’s probably the most prevalent of the issues
we have been looking at so far: synchronous IPC messages from the content
process to the parent process.  We currently have a high number[4] of these
types of messages.  But of course not all of these messages are equal, so
we have gathered telemetry[5] on them.  We have a tracker bug[6] to track
fixing them all.

Some people here may remember the impact of synchronous I/O on the
performance of Firefox a few years ago, or you may have had to deal with
such performance issues in other applications.  Based on my experience
measuring synchronous IPC, I now sometimes miss synchronous I/O.  :-)  I
have seen synchronous IPC calls that amount to *seconds* of pause time
on the content process’s main thread.  To some extent, with e10s, we hide
some of the pauses that happen on the content process.  For example, APZ[7]
allows you to scroll even when the content process main thread is busy and
we can force-paint on tab switch when the content process is busy running
JS[8], but eventually some user out there is going to want to interact with
the page, and that’s when the input events are going to be handled with a
noticeable lag, and the browser is going to feel sluggish.
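
To make the input lag concrete, here is a minimal sketch of a single main
thread that runs its tasks in arrival order.  It is a toy model and not
Firefox code: the function name, the task names and every number in it are
assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Toy model, not Firefox code.  Each task is (name, arrival_ms, duration_ms),
# and one thread runs the tasks strictly in arrival order.
Task = Tuple[str, float, float]


def event_latencies(tasks: List[Task]) -> Dict[str, float]:
    """Return how long each task waits between arriving and starting to run."""
    clock = 0.0
    waits: Dict[str, float] = {}
    for name, arrival, duration in tasks:
        start = max(clock, arrival)   # cannot start before the thread is free
        waits[name] = start - arrival
        clock = start + duration
    return waits


tasks = [
    ("sync_ipc_wait", 0.0, 1500.0),  # hypothetical 1.5 s wait on the parent
    ("key_press", 100.0, 2.0),       # input arrives while the thread is blocked
    ("mouse_click", 600.0, 2.0),
]
print(event_latencies(tasks))
# {'sync_ipc_wait': 0.0, 'key_press': 1400.0, 'mouse_click': 902.0}
```

In this model both input events are cheap to handle on their own, but they
still wait around a second because they are queued behind the blocked task.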

To give a couple of examples of the really painful performance penalty that
we are paying as a result of these synchronous IPC messages, consider the
document.cookie API[9] and the window.screen API[10].  Both of these are
pretty old APIs in the Web platform which are used by millions of web
pages, and we implement them by pausing the content process’s main thread
and sending a request to the parent process before we return to the JS code
running on the page.  This means that a loop somewhere on a page that
accesses document.cookie, for example, can potentially run for several
seconds, even as the page is sitting in the background.  In one
exceptionally bad scenario that I personally hit on Nightly, an ad iframe
in a background page was querying the cookies with a high frequency and the
workload coming from just that one page was effectively making one of my
two content processes unusable as the main thread was almost always busy
waiting on the parent process.
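
As a rough back-of-the-envelope sketch of why such a loop hurts, the
arithmetic below multiplies a call rate by a per-call round trip time.  The
helper names, the 4 ms round trip and the call counts are all hypothetical
values picked for illustration; they are not measurements from Firefox.

```python
# Toy arithmetic, not Firefox code: every number here is a made-up assumption.


def blocked_fraction(calls_per_second: float, round_trip_ms: float) -> float:
    """Fraction of each second the main thread spends waiting on sync calls."""
    busy_ms = calls_per_second * round_trip_ms
    return min(busy_ms / 1000.0, 1.0)  # a thread cannot be more than 100% busy


def total_blocked_ms(num_calls: int, round_trip_ms: float) -> float:
    """Total pause caused by a loop that makes num_calls synchronous requests."""
    return num_calls * round_trip_ms


# A script reading cookies 200 times per second at a hypothetical 4 ms each:
print(blocked_fraction(200, 4.0))   # 0.8
# A loop of 1000 reads at the same hypothetical cost:
print(total_blocked_ms(1000, 4.0))  # 4000.0
```

Under these assumed numbers a single background script would keep the main
thread waiting 80% of the time, and one loop of 1000 reads would pause it
for four seconds.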

I’ll stop here, but I’m planning to send these updates regularly, about
once a week or so.  In each newsletter, I’ll tell a short story about a
performance aspect of the browser that we have been looking at as part of
the Quantum Flow project, will talk a little about the current focus areas,
and will also include a short section to appreciate the help of all of the
engineers who contributed to the Quantum Flow project in the week since the
last newsletter.  The performance story in next week’s issue will be
about page navigations.

Please let me know if you have any ideas for making the format more useful
(preferably off-list).  Making a fast browser is a really important goal
for us this year, and I hope you find this newsletter informative and
helpful for that goal.  We can always use your help in this project.
Please get in touch with us on #flow on IRC or on Bugzilla[11] if you’re
wondering how you can help!

Cheers,
Ehsan

[0] Of course, I’m only focusing on performance here.  Quantum Compositor’s
benefits are obviously orthogonal to other performance pitfalls.

[1] That is, Google Docs, Google Sheets and Google Slides.

[2] https://github.com/Mozilla-TWQA/Hasal

[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1330912

[4] The full list is available here:
https://hg.mozilla.org/mozilla-central/file/tip/ipc/ipdl/sync-messages.ini

[5] The current probe is called IPC_SYNC_LATENCY_MS, but it may soon be
renamed in bug 1337073.

[6] https://bugzilla.mozilla.org/show_bug.cgi?id=SyncIPC

[7] Asynchronous panning and zooming, which basically means not blocking on
the content process when scrolling, and “checkerboarding” (showing a blank
area temporarily) if the content process can’t paint quickly enough.

[8] https://bugzilla.mozilla.org/show_bug.cgi?id=1279086

[9] https://bugzilla.mozilla.org/show_bug.cgi?id=1331680

[10] https://bugzilla.mozilla.org/show_bug.cgi?id=1194751

[11] https://bugzilla.mozilla.org/show_bug.cgi?id=QuantumFlow
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
