Hello #jsapi.

I wanted to report on some of the JS-centric results of the Google Suites work week in Taipei, and hopefully get more eyes on these problems.

First, some background: Sean and I met up with about two dozen other people involved in the Google Suites performance improvement effort. We had the Hasal people from Taipei, a bunch of project manager type folks, and a scattering of developers from across Gecko. The GSuites effort is considered extremely high priority, for good reasons that I won't go into here. In Taipei, we looked at lots of profiles, worked on internal tools, argued over what to test and how to measure, and ate stinky tofu.

High level findings:

- we have a cluster of test cases that run around 20% slower than Chrome, with some quite a bit worse than that (think 40-60%, with outliers at 2x)

 - most of the existing test cases appear to be JS bound

 - some other areas showed up here and there, eg layout and GC

 - most of our current test cases boil down to document load times

 - we don't have much visibility into what actual users are hitting

- we still don't have a great way to say exactly what parts of an application are slower in Firefox than in Chrome

- results derived only from the Gecko Profiler / Cleopatra are not always trustworthy or useful

- the more profilers, the better; we seem to find slightly different things in each

Details:

[most tests measure load time]

We have test cases that do things like load up a Google Slides document, go to the end, and add a slide. Almost all of the time (4-6 seconds on typical machines) is in loading the document. This is true for the majority of the test cases we are looking at. We are not yet splitting out the time for each operation, to the extent that that is even possible: GC, for example, is going to leak into later operations, and cached results "leak" time into earlier operations, because the initial operation will be expensive even though the amortized cost may be low. This is problematic because we expect that actual users' complaints are probably more about interacting with these apps, not the initial load.

There are exceptions. In bug 1269695, the Hasal test is heavily dependent on document load, but when I profile it I do not time the initial load. Once it is loaded and the spinner has stopped, I time how long it takes to Ctrl-End to the end of the document. Bug 1330539 is looking at I-cache misses when scrolling through a document. Bug 1330252 is from piling up key events when scrolling through slides via the keyboard.

[actual user problems]

Our Hasal test cases indeed show slowdowns relative to Chrome, but we don't know whether the slowdowns for those test cases are related to the actual issues people are running into and complaining about. We are sending out surveys to try to figure this out, and adding additional telemetry to infer it from the field. The most promising pointers I've come across are multiple users sharing a document, and gradual perf degradation over time. Neither scenario is in our automated tests.

[what is slower in Fx vs Chrome]

In general with these pages (apps), we only know that the whole thing takes something like 38% longer in Firefox than in Chrome. We still do not have a good way of breaking that down into eg "JS runs for 750ms more, layout is a little faster, GC is about the same". We would like to get to that level; we have a way to categorize time, and so does Chrome, but both have issues and are not directly comparable. Going further, we would like to be able to say that a specific script took X ms longer in Fx than in Chrome, but we don't have tooling for that either. We discussed using a proxy to instrument the JS for this purpose, and I may work on that. We can also look at other aggregate metrics for the whole load, eg cycles per instruction or cache miss rates for the whole test, but that'll never be more than a clue as to what specifically to work on.
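As a strawman for the proxy idea: the proxy would rewrite each JavaScript response so that the page itself records how long the top-level evaluation of each script took, and we could compare those per-URL numbers across browsers. A minimal sketch of the rewriting step, in TypeScript; the wrapScript name and the __scriptTimes global are made up for illustration, not anything that exists today.

    // Hypothetical rewrite applied by an instrumenting proxy: given the body of
    // a fetched .js response and its URL, prepend/append timing statements so
    // the page records how long top-level evaluation of that script took.
    export function wrapScript(source: string, url: string): string {
      const key = JSON.stringify(url);
      const prologue =
        `;window.__scriptTimes = window.__scriptTimes || {};` +
        `window.__scriptTimes[${key}] = { start: performance.now() };\n`;
      const epilogue =
        `\n;window.__scriptTimes[${key}].end = performance.now();`;
      // Note: prepending code defeats a top-level "use strict" directive and
      // doesn't handle module scripts; a real version would need to deal with
      // both, and would still only capture synchronous top-level evaluation.
      return prologue + source + epilogue;
    }

Serving the same rewritten scripts to both browsers would at least give directly comparable per-script numbers, which is more than we can get out of the profilers today.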

[Unreliability/features of profiler]

We identified a number of ways in which the Gecko Profiler can mislead or omit useful information:
 - symbolication is unreliable and racy
 - symbolication is buggy (meta bug 1307215 for this and the above)
 - we cannot walk the native stacks of many of the builds in users' hands
 - the sampling overhead is dependent on what is being sampled, which biases the results (eg bug 1332489, bug 1330532)
 - similarly, some activities during profiling take way too long (bug 1330576 and dependencies)
 - bailouts seem to be annotated with script names that do not match script names in the samples (bug 1329921)
 - the profiler associates some things, like scripts, using non-unique keys that conflate multiple things (eg bug 1329924)

Also, native profilers have various features that Cleopatra/the Gecko Profiler lacks, and those features have turned up different results for us. (And note that I'm not much of an expert here; I don't know enough about the various profilers on the various platforms, nor exactly what turned out to be useful for the things we looked at.)
 - perf counter data, eg cycles per retired instruction
 - data for the kernel or other processes
 - butterfly view, a la Zoom on linux (or gprof, for that matter -- it's where you see all immediate callers and all immediate callees of a given function, rather than a tree view based on one or the other)
 - charge-to-caller (sometimes profiles get obscured by having little pieces of time whittled away into leaf functions like memset or whatever. It is handy to be able to merge this time into the caller. At least, I *think* this is what I've heard a few people request.)
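For the charge-to-caller item, here is a rough sketch (in TypeScript, over a made-up stack format -- this is not how Cleopatra stores samples) of the merge being asked for: samples whose leaf frame is in an "uninteresting" set get attributed to the nearest interesting caller instead.

    // "Charge to caller": re-attribute samples whose leaf is an uninteresting
    // helper (memset, memcpy, ...) to the nearest caller outside that set.
    const uninteresting = new Set(["memset", "memcpy", "free"]);

    // Each sample is a stack from root (index 0) to leaf (last index).
    function chargeToCaller(stack: string[]): string {
      for (let i = stack.length - 1; i >= 0; i--) {
        if (!uninteresting.has(stack[i])) return stack[i];
      }
      return stack[stack.length - 1]; // the whole stack was uninteresting; keep the leaf
    }

    // Tally self time per function after the merge.
    function selfTime(samples: string[][], msPerSample: number): Map<string, number> {
      const totals = new Map<string, number>();
      for (const stack of samples) {
        const frame = chargeToCaller(stack);
        totals.set(frame, (totals.get(frame) ?? 0) + msPerSample);
      }
      return totals;
    }

That way memset's slivers of time show up against whoever called it, rather than being scattered across the profile.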

The other profilers people used, or that I've heard recent mention of people using, include:
 - perf
 - Zoom (perf frontend)
 - Instruments
 - VTune
 - Windows Concurrency Visualizer (I don't think anyone has used this seriously yet)

Oh, and for looking into Chrome performance (to compare with Firefox), I am a little suspicious of their categorizations of time because you only get those with the Timeline tool, which has an overhead of something like 5-7x on the test case I was looking at. (So a 6 second document load took longer than 30 seconds.) *Maybe* they're able to adjust for the tracing overhead.

[JS performance]

The general impression is that we are spending more time executing JS code than Chrome is. For the most part, this is not determined directly; rather, we look at tests where we are substantially slower than Chrome, then examine a profile to see where most of our time is going. Sometimes I'll look deeper at the percentage of time broken down by various categories (script execution, GC, layout, etc.) to compare the percentage of our time spent running scripts vs Chrome's, but see above for why I'm not willing to fully trust Chrome's percentage breakdowns. I have also used our and Chrome's devtools to narrow in on specific executions of scripts (just by eyeballing the basic pattern and correlating the timelines), and from that I have seen instances where we spend much more time executing a script (and everything under it) than Chrome does.

We've really only come up with two general high-level explanations for why we might be slower.

(1) We are getting bad IPC counts for at least one test. IIRC we're at something like 10, and Chrome is around 2. Except the latest info says that that's mostly due to interference from graphics code.

(2) Google suites code is the output of the Closure Compiler, which produces functions with poor type locality, if you'll forgive me for coining a term. (As in, lots of different code with lots of different types flows into the leaf functions, so compiled code for leaves ends up megamorphic, but wouldn't be if we magically inlined everything.)

I don't know how to prove or disprove either of these hypotheses. And even if they do end up explaining a lot, I'm guessing there will be a lot of smaller reasons that'll add up as well.
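For what it's worth, here is a toy TypeScript illustration of hypothesis (2) -- invented code, not Google's actual output -- showing how many call sites with different object shapes, all funneling through one shared helper, make that helper megamorphic:

    // The Closure-style pattern: lots of call sites route through a tiny shared
    // helper, so the helper's property access sees many object shapes.
    type AnyObj = { [k: string]: number };

    // Shared leaf helper, as the compiler might emit it.
    function get(o: AnyObj, k: string): number {
      return o[k]; // this one access sees every shape below -> megamorphic IC
    }

    // Many distinct shapes flow into the same helper.
    const shapes: AnyObj[] = [
      { a: 1 }, { a: 1, b: 2 }, { b: 2, a: 1 }, { a: 1, c: 3, d: 4 },
    ];
    let sum = 0;
    for (let i = 0; i < 1e6; i++) {
      sum += get(shapes[i & 3], "a");
    }
    console.log(sum);

    // If each original call site had kept its own property access (obj1.a,
    // obj2.a, ...), each site would stay monomorphic; inlining `get`
    // everywhere would recover that, which is the point above.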

Typical examples:

https://clptr.io/2jqJ0BD is a non-e10s run for the document from bug 1269695. It has 50% of the total time in stuff related to script. 19% is running ion code, 14% running baseline code, which is actually pretty good for the things I have looked at. (Contrast with an Octane run https://clptr.io/2jqRUPk where 40% of the total time is in ion, 16% in baseline, 18% in GC!)

Bug 1326346 is a gslides example that ehsan has analyzed, finding some reflow but mostly script.

https://clptr.io/2jqGWJX shows 31% of the time spent in script: 17% executing baseline, 5% executing ion, and to my eye, not very many total bailouts. (This is from loading https://docs.google.com/presentation/d/10QdoQTau98IeEp862ChQ7vG0eR3_mJZABsuduXGK1VE/edit#slide=id.g115bc93893_11_5 ).

What the JS team could provide:

- Understand what various patterns mean. What are tell-tale signs that we are doing something dumb/fixable? And which can we detect automatically? nbp has some ideas in bug 1184569. Additional perspectives would be helpful.

- In general, a better way to triage and diagnose performance problems in single page apps, a way that does not require having an analogous shell test case.

- Revive vtune integration (sstangl is working on it)

- Push column numbers through to the profiler to be able to identify specific scripts in minified code (bug 785922?)

- Annotate more things with markers, eg bug 1329923, so we can get some of the advantages of tracing with our sampling. (eg generate a timeline of samples for a given script, with all compiles/bailouts/collections annotated in the timeline)

- Produce bailout markers in a structured format (bug 1329921)

- A good description of the overall flow of the engine, as it relates to performance. Every time we go over this with someone new, the same ideas and suggestions arise.
  - compile sooner
  - compile later
  - specialize the warmup thresholds to specific URLs (or a similar key)
  - compile while idle and keep multiple differently-specialized copies
  - cache compiled code
  - cache type information
  - suppress GC
  - GC when idle
  - eliminate all or most bailouts

I can come up with reasons against every one of these, but I also feel like there's a kernel of a good idea in each. It would be nice to have ammunition in the form of a clear description of how the engine works *in practice*, and preferably the specific numbers we used to tune various things. This isn't for shooting down these ideas, but rather for making them more nuanced and increasing the chance of someone coming up with something that *would* help, without explaining the "why nots" over and over again.

Some of these things are doable, would help some, but just aren't worth the cost. Like say we saved initial type information + steady state type information separately, and unioned each with what we observe at runtime. So the first compile would take into account the first batch of type info, the second would switch to the second batch. Lots of work, more memory, more complexity -- could it be worth those costs? Seems unlikely to me, but I'm no expert.
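If it helps to make that concrete, here is roughly what I mean, sketched in TypeScript with invented names (this is not how TI is actually represented in the engine):

    // Two cached snapshots of type info per script: one from early in a previous
    // run ("startup") and one from later ("steady state"). Each compile unions
    // the appropriate snapshot with what has been observed live so far.
    type TypeSet = Set<string>; // e.g. the kinds/shapes observed at one site

    interface CachedTypes {
      startup: Map<string, TypeSet>;     // per bytecode site, first-compile snapshot
      steadyState: Map<string, TypeSet>; // per site, later-in-the-run snapshot
    }

    function typesForCompile(cache: CachedTypes,
                             observed: Map<string, TypeSet>,
                             firstCompile: boolean): Map<string, TypeSet> {
      const base = firstCompile ? cache.startup : cache.steadyState;
      const result = new Map<string, TypeSet>();
      for (const [site, types] of observed) {
        const cached = base.get(site) ?? new Set<string>();
        result.set(site, new Set([...types, ...cached])); // union cached + live
      }
      return result;
    }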

I would imagine the engine description looking something like "x% of code never even runs, so we don't want to waste time doing anything eager with it. We have to at least syntax parse before we can fetch all of the code. x% of code only runs once, and the interpreter will execute it faster than we could baseline compile & run it. Baseline is typically k times faster than the interpreter, but has to generate ICs before it gets up to that speed. It also has to see accurate enough type info before ion compiling, so that...." or whatever. Including subtleties such as startup type info vs steady state type info. Essentially, "these are the patterns of code the engine expects to see, and this is how it makes them go fast."
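And a cartoon of the decision logic such a description would pin down, with made-up thresholds (the real heuristics live in the engine and are more involved than two counters):

    // Toy tier-up model: interpret cold code, baseline-compile warm code,
    // Ion-compile hot code once type info looks stable. Numbers are invented.
    const BASELINE_WARMUP = 10;   // hypothetical interpreter call/loop count
    const ION_WARMUP = 1000;      // hypothetical baseline warmup before Ion

    type Tier = "interpreter" | "baseline" | "ion";

    function tierFor(warmupCount: number, typeInfoStable: boolean): Tier {
      if (warmupCount < BASELINE_WARMUP) return "interpreter"; // most code is cold
      if (warmupCount < ION_WARMUP || !typeInfoStable) return "baseline"; // ICs/TI warming up
      return "ion"; // hot and predictable: worth the expensive compile
    }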

I have a meta bug, bug 1299643, where I try to list the Google Suites cases that seem to be JS bound. The latest round of Hasal results is at <https://docs.google.com/spreadsheets/d/1chXshqmq2PNrGe7M6ETjLcJkkP3Myur8tPd3HmkaxYc/edit#gid=244337553>; note that I haven't gone through them to sort out the latest set of significantly slower, JS-bound cases, because they're using the old profiler and I don't have any good way of categorizing the time there.

