Hi everyone,

It's been 10 weeks since I started writing these newsletters (the
number in the title isn't an off-by-one error, there was a one week hiatus
due to a work week!).  We still have quite a bit of work ahead of us, but
we have also accomplished a good amount.  Finding a good metric for
progress is hard, but we live and breathe in Bugzilla, so we use a bug-based
burn-down chart
<https://people-mozilla.org/%7Ecpeterson/bb/burndown.html?whiteboard=[qf:p1]&Since=2017-01-01>.
As you can see, we are starting to see a decrease in the number of open
bugs, and this is even while we are actively adding tens of new bugs to the
pool in the weekly triage meetings.

The other thing that this burn-down chart shows is that we need help!  Very
recently Kan-Ru came up with the great idea of creating the
qf-bugs-upforgrabs
<https://bugzilla.mozilla.org/show_bug.cgi?id=qf-bugs-upforgrabs> tracker
bug.  These are reasonably self-contained bugs that require less specific
domain knowledge and can be worked on by anyone in a reasonable time
frame.  Please consider taking a look at the dependency list of that bug to
see if something interests you! (The similarity of this tracker bug to
photon-perf-upforgrabs
<https://bugzilla.mozilla.org/show_bug.cgi?id=photon-perf-upforgrabs> isn't
an accident!)

On the telemetry hang reports data collection, the new data from hangs of
128ms or longer has been coming in, but there have been some wrinkles in
actually receiving this data, and also in receiving the hang data
correlated to user interactivity
<https://bugzilla.mozilla.org/show_bug.cgi?id=1353440>.  Michael Layzell
has been tirelessly at work on the BHR backend to make it suit our needs,
and has been discovering the edges of computation limits in order to
symbolicate the BHR reports on people.mozilla.org (now moved to AWS
<https://s3-us-west-2.amazonaws.com/bhr-data/index.html>!).

I realized we haven't had a performance mini-story for a while -- I sort of
dropped the ball on that.  Running over this bug
<https://bugzilla.mozilla.org/show_bug.cgi?id=1362886> made me want to talk
about a pretty well known source of slowness in C++ code: virtual functions.
The cost of virtual functions comes from several different aspects.  Firstly,
they effectively prevent the compiler from inlining the function.  Inlining
enables a host of compiler optimizations, essentially by allowing the
compiler to see more of the code and optimize more effectively based on
that.  Secondly, there is the runtime cost of the function, which mostly
comes from the indirect call.  The majority of the performance penalty here
on modern hardware is due to branch mispredictions when different
implementations of a virtual function get called at a call site.

You should remember that on modern <http://www.7-cpu.com/cpu/Haswell.html>
desktop <http://www.7-cpu.com/cpu/SandyBridge.html> processors
<http://www.7-cpu.com/cpu/Skylake.html>, the cost of a branch misprediction
can be around 15-20 cycles (depending on the processor), so if what your
function does is very trivial and it has many overrides that can be called
in hot code, chances are that you are spending a considerable amount of time
waiting for the instruction cache misses on the calls to the virtual
function in question.

Of course, finding which virtual functions in your program are the
expensive ones requires profiling the workloads you care about improving,
but always keep an eye out for this problem, as unfortunately the
object-oriented programming model in C++ really encourages writing code
like this.  This is the kind of issue that a native profiler is probably
more suitable for discovering.  For example, if you are using a simple
native sampling profiler, these issues typically show up as a large amount
of time being spent on the first instruction of the virtual function being
called (which is typically an inexpensive instruction otherwise).
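
To make the pattern a bit more concrete, here is a minimal C++ sketch.  The
Shape, Triangle and Square classes and both functions are hypothetical names
made up for illustration; none of this comes from the Gecko code base or from
the bug linked above.  The first function makes an indirect call through a
base class pointer on every iteration, and the second shows one possible way
to give the compiler a chance to devirtualize and inline the calls.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical hierarchy, for illustration only.
struct Shape {
  virtual ~Shape() = default;
  // A trivial virtual function: the work done is tiny compared to the
  // potential cost of a mispredicted indirect call.
  virtual std::int64_t Sides() const = 0;
};

struct Triangle final : Shape {
  std::int64_t Sides() const override { return 3; }
};

struct Square final : Shape {
  std::int64_t Sides() const override { return 4; }
};

// Every iteration makes an indirect call through the vtable.  When the
// dynamic types stored in `shapes` are mixed, the call target keeps
// changing at this one call site.
std::int64_t SumSidesVirtual(
    const std::vector<std::unique_ptr<Shape>>& shapes) {
  std::int64_t total = 0;
  for (const auto& shape : shapes) {
    assert(shape != nullptr);
    total += shape->Sides();
  }
  return total;
}

// One possible alternative: keep objects in per-type containers.  The
// static type is known in each loop and the classes are `final`, so the
// compiler is allowed to call Sides() directly and may inline it.
std::int64_t SumSidesByType(const std::vector<Triangle>& triangles,
                            const std::vector<Square>& squares) {
  std::int64_t total = 0;
  for (const Triangle& triangle : triangles) total += triangle.Sides();
  for (const Square& square : squares) total += square.Sides();
  return total;
}
```

Both functions compute the same result.  Whether the second one is actually
faster depends on the compiler, the processor and the workload, so it is
something to confirm with a profiler rather than assume.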

Now it's time to acknowledge the work of all of you who have helped in
improving the performance of the browser in the last week.  As always, I
hope I'm not forgetting anyone:


   -

   Doug Thayer ported the Gecko Profiler add-on to be a WebExtension
   <https://github.com/devtools-html/Gecko-Profiler-Addon/pull/44>!  One
   important impact of this work is that this makes it possible to profile
   Firefox using this add-on without incurring the performance impact of
   having an extension using the add-on SDK installed.
   -

   Kris Maglione added support for pre-loading scripts during startup on a
   background thread <https://bugzilla.mozilla.org/show_bug.cgi?id=1359653>.
   This helps improve startup performance for the parent process.
   -

   David Anderson made us composite asynchronously on Windows when resizing
   a widget <https://bugzilla.mozilla.org/show_bug.cgi?id=1361257>.  This
   can reduce main thread jank, for example when opening a window.  He also
   made PLayerTransaction’s constructor async
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1350634>, removing a
   synchronous IPC message that we used to incur when opening a new window.
   -

   David Baron ensured that PLDHashTable’s second hash doesn't have padding
   with 0 bits for tables with capacity larger than 2^16
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1352889>.  This hopefully
   reduces the risk of encountering long chains in large hash tables, which
   could help with some of the hash table performance issues we have seen
   come up in profiles.
   -

   Cameron McCormack made dom::FontFace cache its gfxCharacterMap instead
   of rebuilding it every time
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1352531>.
   -

   William Chen made us reuse StackNodes in HTML parser TreeBuilder in
   order to avoid malloc overhead
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1355441>.
   -

   Gabor Krizsanits enabled preallocating content processes by default
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1341008>, which should
   give us perceived performance wins on new tab and window opens.
   -

   Nathan Froyd made it possible to profile Stylo Rayon threads using the
   Gecko profiler <https://bugzilla.mozilla.org/show_bug.cgi?id=1322656>.
   -

   Bas Schouten moved pointers to DisplayDataItems directly on nsIFrame
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1331718>.  This will allow
   more efficient access to them by avoiding a lot of hashtable lookups, and
   providing better data locality.
   -

   Michael Layzell made us avoid checking for permissions that almost never
   exist, unless they do exist, to bypass the overhead of nsContentBlocker
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1363243>.
   -

   William Chen flattened attribute storage in the HTML parser
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1355479> in order to avoid
   the cost of dynamic memory allocation.
   -

   Thinker Li added a shortcut to nsFrame::BuildDisplayListForChild()
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1342009> in order to
   improve display list construction speeds by remembering the results of the
   previous rounds of computation.
   -

   Tim Taubert imposed a 2KB limit on the amount of session storage data
   preserved by session restore
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1362058>.
   -

   Jan de Mooij optimized Array.prototype.shift to have O(1) rather than
   O(n) behavior <https://bugzilla.mozilla.org/show_bug.cgi?id=1348772>.
   This is especially nice considering JS libraries using arrays as queues,
   which tend to call shift() inside a loop, which would cause us to have
   quadratic behavior before this change.
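
As a rough illustration of the quadratic behavior mentioned in the last item,
here is a small C++ sketch of a queue being drained from the front.  It is an
analogy in C++ rather than JavaScript, and it is not meant to describe how
SpiderMonkey implements the optimization.  NaiveShift, OffsetQueue and the two
Drain functions are hypothetical names, and keeping a start offset is just one
common way to get O(1) removal from the front.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Removes the first element by erasing it, which moves every remaining
// element down by one slot.  Returns how many elements were moved.
std::size_t NaiveShift(std::vector<int>& items) {
  assert(!items.empty());
  items.erase(items.begin());  // O(n) in the number of remaining elements
  return items.size();
}

// A queue that only advances a start offset on removal, so no element is
// ever moved.
class OffsetQueue {
 public:
  explicit OffsetQueue(std::vector<int> items) : items_(std::move(items)) {}

  bool Empty() const { return start_ == items_.size(); }

  int Shift() {
    assert(!Empty());
    return items_[start_++];  // O(1)
  }

 private:
  std::vector<int> items_;
  std::size_t start_ = 0;
};

// Drains n items with NaiveShift and returns the total number of element
// moves, which adds up to n * (n - 1) / 2.
std::size_t DrainNaive(std::size_t n) {
  std::vector<int> items(n, 0);
  std::size_t moves = 0;
  while (!items.empty()) moves += NaiveShift(items);
  return moves;
}

// Drains n items from an OffsetQueue and returns the number of Shift()
// calls, which is exactly n.
std::size_t DrainWithOffset(std::size_t n) {
  OffsetQueue queue{std::vector<int>(n, 0)};
  std::size_t steps = 0;
  while (!queue.Empty()) {
    queue.Shift();
    ++steps;
  }
  return steps;
}
```

Draining 1000 items this way costs 499500 element moves with the naive
version and 1000 constant-time steps with the offset version, which is the
difference between quadratic and linear work for the whole loop.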


Until next week, happy hacking!

Cheers,
-- 
Ehsan
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
