Hi everyone,

Another week full of performance-related updates quickly went by, and I'd
like to share a few of them.

We're almost mid-April, about 3 weeks after I shared my first update
<https://ehsanakhgari.org/blog/2017-03-23/quantum-flow-engineering-newsletter-3>
on our progress battling our sync IPC issues.  I have prepared a second Sync
IPC Report for 2017-04-13
<https://docs.google.com/spreadsheets/d/1x_BWVlnQPg0DHbsrvPFX7g89lnFGa3lAIHWD_pLa_dE/edit#gid=997870905&fvid=1346771801>.
For those who looked at the previous report, this is in the same
spreadsheet, and the data is next to the previous report, for easy
comparison.  We have made a lot of great progress fixing some of the really
bad synchronous IPC issues in the past few weeks, and even though
telemetry data is laggy, we are starting to see this reflected in the data
coming in through telemetry!  Here is a human-readable summary of where we
are now:

   - PCookieService::Msg_GetCookieString
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1331680> is still at the
   top of the list, now taking a whopping 45% piece of the pie chart!  I don't
   think there is any reason to believe that this has gotten particularly
   worse, it's just that we're starting to get better at not doing synchronous
   IPC, so this is standing out even more now!  But its days are numbered.  :-)
   - PContent::Msg_RpcMessage and PBrowser::Msg_RpcMessage at 19%.  We
   still need to get better data about the sync IPC triggered from JS, which
   shows up in this data under one of these buckets.
   - PJavaScript::Msg_Get at 5% (CPOW overhead) could be caused by add-ons
   that aren't e10s compatible.
   - PAPZCTreeManager::Msg_ReceiveMouseInputEvent.  This one (and a few
   other smaller APZ related ones) tends to have really low mean values, but
   super high count values which is why they tend to show high on this list,
   but they aren't necessarily too terrible compared to the rest of our sync
   IPC issues.
   - PVRManager::Msg_GetSensorState
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1346927> also has relatively
   low mean values, but could be slightly worse.
   - PJavaScript::Msg_CallOrConstruct, more CPOW overhead.
   - PContent::Msg_SyncMessage, more JS triggered sync IPC.
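
To make the count-versus-mean point about the APZ messages concrete, here is
a minimal sketch in Python.  It assumes the ranking is driven by something
like total time (count multiplied by mean), which is my reading of the list
and not a description of how the spreadsheet is actually computed.  The
message names and all of the numbers are made up for illustration and do not
come from telemetry.

```python
from dataclasses import dataclass


@dataclass
class SyncMessage:
    name: str
    count: int      # how many times the message was sent (hypothetical)
    mean_ms: float  # mean blocking time per message in ms (hypothetical)

    @property
    def total_ms(self) -> float:
        # Total time spent blocked on this message type.
        return self.count * self.mean_ms


def shares(messages):
    """Return each message's fraction of the total blocked time."""
    grand_total = sum(m.total_ms for m in messages)
    return {m.name: m.total_ms / grand_total for m in messages}


# Two made-up message types: one cheap but very frequent, one slow but rare.
messages = [
    SyncMessage("frequent_but_cheap", count=1_000_000, mean_ms=0.05),
    SyncMessage("rare_but_slow", count=1_000, mean_ms=20.0),
]

by_total = sorted(messages, key=lambda m: m.total_ms, reverse=True)
by_mean = sorted(messages, key=lambda m: m.mean_ms, reverse=True)

print("ranked by total time:", [m.name for m in by_total])
print("ranked by mean time: ", [m.name for m in by_mean])
```

With these invented numbers, the cheap message tops the list when sorting by
total time, even though each individual call is far less painful than a call
to the slow one.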

A few items further down on the list are either being worked on or recently
fixed as well.  I expect this to keep improving over the next few weeks.
It is really great to see this progress, thanks to everyone who has worked
on fixing these issues, helping with the diagnoses, code reviews, etc.

We have also been working hard at triaging performance-related bug
reports.  To keep an eye on the bug-by-bug status of the project, you
can use the Bugzilla queries on the wiki
<https://wiki.mozilla.org/Quantum/Flow#Bugzilla_Query_Lists>.  As of this
moment, we have triaged 160 bugs as [qf:p1] (which means these are the
performance-related bugs we believe should be fixed now for the Firefox 57
release).  Of these bugs, 92 are unassigned right now.  If you see a
bug on this list in your area of expertise which you think you can help
with, please consider picking it up.  We really appreciate your help.
Please remember that not every bug on this list is complicated to fix, and
there's everything from major architectural changes to simple one-liner
fixes up for grabs.  :-)

Another really nice effort that is starting to unfold and I'm super excited
about is the new Photon performance project
<https://bugzilla.mozilla.org/show_bug.cgi?id=1348289>, which is a focused
effort on front-end performance.  This includes engineering the new UI
with things like animations running on the compositor in mind from the
get-go, being laser-focused on guaranteeing good performance on key UI
interactions such as tab opening and closing, and lots of focused
measurements and fixes to the browser front-end.

The performance story of this week is about how measurement tools can
distort our vision.  This one isn't so much a story as a lesson that I
seem to keep learning over and over again these days.  You may have heard
of the measurement problem
<https://en.wikipedia.org/wiki/Measurement_problem>, which basically
amounts to the fact that you always change what you measure.  Markus and I
were recently talking about the cost of style flushes for browser.xul that
I had seen in my profiles and how they could sometimes be expensive, and
we noticed that this may be due to the profiler overhead
<https://bugzilla.mozilla.org/show_bug.cgi?id=1354255> that we incur in
order to show information about the cause of the restyle in the profile
UI.  He has since fixed the issue.  I think the reason I didn't catch this
in my own profiling was that I have gotten so used to seeing expensive
reflows and restyles that sometimes I accept that as a fact of life and
don't look under the hood closely enough.  Lesson learned!
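
As a rough illustration of how instrumentation can inflate the thing being
measured, here is a minimal sketch in Python.  It is not how the Gecko
Profiler works: restyle() is a made-up stand-in for the work, and capturing a
Python stack with traceback.extract_stack() is only an analogy for recording
the cause of a restyle.

```python
import time
import traceback


def restyle(n):
    # Hypothetical stand-in for the work we actually want to measure.
    return sum(i * i for i in range(n))


def restyle_with_cause(n, causes):
    # Instrumented variant: record where the call came from before doing the
    # same work.  Capturing the stack has a cost of its own.
    causes.append(traceback.extract_stack())
    return restyle(n)


def time_calls(fn, repeats):
    # Wall-clock time for calling fn() repeatedly.
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start


causes = []
plain = time_calls(lambda: restyle(10), 2000)
instrumented = time_calls(lambda: restyle_with_cause(10, causes), 2000)

print(f"plain:        {plain:.4f}s")
print(f"instrumented: {instrumented:.4f}s")
```

Both variants compute exactly the same result, so any difference between the
two timings is overhead introduced by the measurement itself.  The exact
numbers will vary from machine to machine.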

We have a bug <https://bugzilla.mozilla.org/show_bug.cgi?id=1329212>
tracking these types of issues, so if you know of something similar please
create a dependency.  If you also profile Firefox regularly using the Gecko
Profiler, adding yourself to the CC list of that bug may not be a bad idea.


Now it's time to acknowledge those who have helped make Firefox faster in
the past week.  I will probably forget a few people here, apologies for any
unintended omissions!


   - Tim Taubert made a couple
   <https://bugzilla.mozilla.org/show_bug.cgi?id=912717> of fixes
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1353533> to the
   performance of the SessionCookies.jsm code, which, IINM, runs periodically on
   the UI thread and during startup.  Well done!
   - Henry Chang continued
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1343425> his work on
   improving url-classifier performance.
   - David Keeler disabled an expensive telemetry probe
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1353216> which could slow
   down HTTPS page loads by up to 2 times in certain cases.
   - Jonathan Kew added some caching to
   gfxFontShaper::GetRoundOffsetsToPixels()
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1352528>.
   - Olli Pettay enabled high priority vsync events in the parent process
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1352523>.  This seems to
   have finally stuck.  Olli had been trying to land this for months, but
   our unit tests disagreed.  :-)
   - Kyle Machulis restricted the plugin finding/initialization code to
   Flash/PDF <https://bugzilla.mozilla.org/show_bug.cgi?id=1351490>,
   lowering the overhead of the expensive MIME type lookups we used to do.
   - JW Wang fixed a bug
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1354389> where we were
   blocking the main thread for file I/O done on a background thread for up to
   8 seconds according to telemetry data.  Readers of this newsletter should
   be familiar with this anti-pattern now...
   - Gerald Squelart fixed a
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1337063> graphics
   initialization synchronous IPC issue.  This could be a navigation
   performance issue with multi-e10s.
   - Evelyn Hung lowered the cost of size calculation
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1355595> for the height of
   the squiggly line we draw for misspelled words during spell checking.
   - Marco Bonardo removed a sync reflow
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1353708> which caused jank
   when opening the awesomebar panel.
   - Dão Gottwald also removed a sync reflow loop
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1354782> which caused jank
   when resizing windows with pinned tabs.
   - Masatoshi Kimura added a cache
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1353493> for the
   preference access in nsIWidget::DefaultScaleOverride().
   - Jan de Mooij further improved the external string cache
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1353758> by moving it into
   the JS engine.  He also improved our SetProp failure rate
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1336580> on sites like
   Google Spreadsheets.
   - Kartikaya Gupta removed a synchronous IPC
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1350638> from the
   compositor.
   - Makoto Kato avoided using synchronous IPC
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1330912> to set the
   spellchecker dictionary language.
   - Neil Deakin removed a sync reflow
   <https://bugzilla.mozilla.org/show_bug.cgi?id=1334635> that happened
   when closing a window.
   - Mike Conley started to create an add-on
   <https://mikeconley.github.io/ohnoreflow/> to help front-end engineers
   find sync reflow issues in the browser UI.
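
For anyone new to the anti-pattern mentioned in JW Wang's item above
(blocking the main thread on I/O that runs on a background thread), here is
a minimal sketch in Python.  It is an analogy only and not the Gecko code:
slow_read() and both helper functions are hypothetical names, and the sleep
stands in for slow file I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_read():
    # Hypothetical stand-in for slow file I/O on a background thread.
    time.sleep(0.2)
    return "file contents"


def blocking_style(pool):
    # Anti-pattern: hand the work to another thread, then wait for it anyway.
    # The caller is stuck for as long as the I/O takes.
    start = time.perf_counter()
    data = pool.submit(slow_read).result()
    return data, time.perf_counter() - start


def callback_style(pool, on_done):
    # Alternative: register a callback and return to the caller right away.
    start = time.perf_counter()
    future = pool.submit(slow_read)
    future.add_done_callback(lambda f: on_done(f.result()))
    return time.perf_counter() - start


received = []
# Leaving the with block waits for the worker thread, so the callback has
# run by the time we print.
with ThreadPoolExecutor(max_workers=1) as pool:
    blocked_data, blocked_for = blocking_style(pool)
    returned_after = callback_style(pool, received.append)

print(f"blocking caller was stuck for {blocked_for:.3f}s")
print(f"callback caller returned after {returned_after:.3f}s")
```

Note that in this sketch the callback runs on the worker thread.  A real
event-loop based design would normally dispatch the result back to the
thread that asked for it, which is left out here to keep the example short.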


Until next week, happy hacking!

-- 
Ehsan
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
