Hello #jsapi.
I wanted to report on some of the JS-centric results of the Google
Suites work week in Taipei, and hopefully get more eyes on these problems.
First, some background: Sean and I met up with about two dozen other
people involved in the Google Suites performance improvement effort. We
had the Hasal people from Taipei, a bunch of project manager type folks,
and a scattering of developers from across Gecko. The GSuites effort is
considered extremely high priority, for good reasons that I won't go
into here. In Taipei, we looked at lots of profiles, worked on internal
tools, argued over what to test and how to measure, and ate stinky tofu.
High level findings:
- we have a cluster of test cases that run around 20% slower than
Chrome, with some quite a bit worse than that (think 40-60%, with
outliers at 2x)
- most of the existing test cases appear to be JS bound
- some other areas showed up here and there, eg layout and GC
- most of our current test cases boil down to document load times
- we don't have much visibility into what actual users are hitting
- we still don't have a great way to say exactly what parts of an
application are slower in Firefox than in Chrome
- results derived only from the Gecko Profiler / Cleopatra are not
always trustworthy or useful
- the more profilers, the better; we seem to find slightly different
things in each
Details:
[most tests measure load time]
We have test cases that do things like load up a Google Slides document,
go to the end, and add a slide. Almost all of the time (4-6sec on
typical machines) is in loading the document. This is true for the
majority of the test cases we are looking at. We are not yet splitting
out the time for each operation (to the extent that that is even
possible; GC, for example, is going to leak into later operations, and
cached results "leak" time into earlier operations because the initial
operation will be expensive even though the amortized cost may be low).
This is problematic because we expect that actual users' complaints are
probably more about interacting with these apps, not the initial load.
There are exceptions. In bug 1269695, the Hasal test is heavily
dependent on document load, but when I profile it I do not time the
initial load. Once it is loaded and the spinner has stopped, I time how
long it takes to Ctrl-End to the end of the document. Bug 1330539 is
looking at I-cache misses when scrolling through a document. Bug 1330252
is from piling up key events when scrolling through slides via the keyboard.
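As an aside, here is a minimal sketch (in page JS, runnable from the
devtools console) of how one could bracket a single operation with the
User Timing API instead of measuring the whole load. The spinner
selector and the synthesized key event are assumptions on my part; the
real Hasal harness drives the browser from the outside.

    // Time one operation, from triggering it until the app reports it is done.
    function timeOperation(label, runOperation, isDone) {
      return new Promise(resolve => {
        performance.mark(label + "-start");
        runOperation();
        // Poll until the app says it is done (e.g. its busy spinner goes away).
        const poll = setInterval(() => {
          if (isDone()) {
            clearInterval(poll);
            performance.mark(label + "-end");
            performance.measure(label, label + "-start", label + "-end");
            resolve(performance.getEntriesByName(label, "measure")[0].duration);
          }
        }, 16);
      });
    }

    // Hypothetical usage: time Ctrl-End separately from the initial load.
    timeOperation(
      "ctrl-end",
      () => document.body.dispatchEvent(
        new KeyboardEvent("keydown", { key: "End", ctrlKey: true, bubbles: true })),
      () => !document.querySelector(".docs-busy-indicator")  // assumed spinner class
    ).then(ms => console.log("Ctrl-End took " + ms + "ms"));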
[actual user problems]
Our Hasal test cases indeed show slowdowns relative to Chrome, but we
don't know whether the slowdowns for those test cases are related to the
actual issues people are running into and complaining about. We are
sending out surveys to try to figure this out, and adding additional
telemetry to infer it from the field. The most promising pointers I've
come across are multiple users sharing a document, and gradual perf
degradation over time. Neither scenario is in our automated tests.
[what is slower in Fx vs Chrome]
In general with these pages (apps), we only know that it takes some
number like 38% longer in Firefox than in Chrome. We still do not have a
good way of breaking that down into eg "JS runs for 750ms more, layout
is a little faster, GC is about the same". We would like to get to that
level; we have a way to categorize time, and so does Chrome, but both
have issues and are not directly comparable. Going further, we would
like to be able to say that a specific script took X ms longer in Fx
than in Chrome, but we don't have tooling for that either. We discussed
using a proxy to instrument the JS for this purpose, and I may work on
that. We can also look at other aggregate metrics for the whole load, eg
cycles per instruction or cache miss rates for the whole test, but
that'll never be more than a clue as to what specifically to work on.
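To sketch the proxy idea: the proxy would rewrite each script response
so that its top-level evaluation is bracketed with User Timing marks,
which both Firefox's and Chrome's devtools will display. This is purely
illustrative (no such tooling exists yet), and it only captures the
initial evaluation of each script, not callbacks that run later.

    // What the rewriting proxy could substitute for each script body.
    function wrapScript(url, source) {
      return `
        performance.mark(${JSON.stringify(url + ":start")});
        try {
          ${source}
        } finally {
          performance.mark(${JSON.stringify(url + ":end")});
          performance.measure(${JSON.stringify(url)},
                              ${JSON.stringify(url + ":start")},
                              ${JSON.stringify(url + ":end")});
        }
      `;
    }

Note that wrapping a script body in a try block can perturb top-level
declaration semantics, so a real version would have to be more careful
than this.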
[Unreliability/features of profiler]
We identified a number of ways in which the Gecko Profiler can mislead
or omit useful information:
- symbolication is unreliable and racy
- symbolication is buggy (meta bug 1307215 for this and the above)
- we cannot walk the native stacks of many of the builds in users' hands
- the sampling overhead is dependent on what is being sampled, which
biases the results (eg bug 1332489, bug 1330532)
- similarly, some activities during profiling take way too long (bug
1330576 and dependencies)
- bailouts seem to be annotated with script names that do not match
script names in the samples (bug 1329921)
- the profiler associates some things, like scripts, using non-unique
keys that conflate multiple things (eg bug 1329924)
Also, native profilers have various features that are missing from
Cleopatra/the Gecko Profiler, and using them has revealed differing results. (And note
that I'm not much of an expert here; I don't know enough about the
various profilers on the various platforms, nor exactly what turned out
to be useful for the things we looked at.)
- perf counter data, eg cycles per retired instruction
- data for the kernel or other processes
- butterfly view, a la Zoom on linux (or gprof, for that matter --
it's where you see all immediate callers and all immediate callees of a
given function, rather than a tree view based on one or the other)
- charge-to-caller (sometimes profiles get obscured by having little
pieces of time whittled away into leaf functions like memset or
whatever. It is handy to be able to merge this time into the caller. At
least, I *think* this is what I've heard a few people request.)
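For the charge-to-caller case, if I've understood the request correctly,
it amounts to a post-processing pass over sampled stacks, roughly like
this (the sample format here is made up; real profiles would need
converting first):

    // Fold self time for "boring" leaf frames into the nearest interesting caller.
    const MERGE_INTO_CALLER = new Set(["memset", "memcpy", "malloc", "free"]);

    function selfTimeByFrame(samples) {
      const selfTime = new Map();
      for (const { stack, weight } of samples) {  // stack is outermost..leaf
        // Walk inward from the leaf until we reach a frame we want to charge.
        let i = stack.length - 1;
        while (i > 0 && MERGE_INTO_CALLER.has(stack[i])) {
          i--;
        }
        const frame = stack[i];
        selfTime.set(frame, (selfTime.get(frame) || 0) + weight);
      }
      return selfTime;
    }

    // e.g. a 1ms sample with stack ["main", "AppendElements", "memcpy"] gets
    // charged to AppendElements rather than memcpy.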
The other profilers people used, or that I've heard recent mention of
people using, include:
- perf
- Zoom (perf frontend)
- Instruments
- VTune
- Windows Concurrency Visualizer (I don't think anyone has used this
seriously yet)
Oh, and for looking into Chrome performance (to compare with Firefox), I
am a little suspicious of their categorizations of time because you only
get those with the Timeline tool, which has an overhead of something
like 5-7x on the test case I was looking at. (So a 6 second document
load took longer than 30 seconds.) *Maybe* they're able to adjust for
the tracing overhead.
[JS performance]
The general impression is that we are spending more time executing JS
code than Chrome is. For the most part, this is not determined directly,
but rather we look at tests where we are substantially slower than
Chrome, then examine a profile to see where most of our time is going.
Sometimes I'll look deeper at the percentage of time broken down by
various categories (script execution, GC, layout, etc.), to compare the
percentage of our time spent running scripts vs Chrome's, but see above
why I'm not willing to fully trust Chrome's percentage breakdowns. I
have also used our and Chrome's devtools to narrow in on specific
executions of scripts (just by eyeballing the basic pattern and
correlating the timelines), and from that I have seen instances where we
spend much more time executing a script (and everything under it) than
Chrome does.
We've really only come up with two general high-level explanations for
why we might be slower.
(1) We are getting bad IPC counts for at least one test. IIRC we're at
something like 10, and Chrome is around 2. Except the latest info says
that that's mostly due to interference from graphics code.
(2) Google Suites code is the output of the Closure Compiler, which produces
functions with poor type locality, if you'll forgive me for coining a
term. (As in, lots of different code with lots of different types flows
into the leaf functions, so compiled code for leaves ends up
megamorphic, but wouldn't be if we magically inlined everything.)
I don't know how to prove or disprove either of these hypotheses. And
even if they do end up explaining a lot, I'm guessing there will be a
lot of smaller reasons that'll add up as well.
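To make (2) a bit more concrete, here's a toy example of the pattern as
I understand it (not actual GSuites code, obviously): many object shapes
funnel into one shared leaf function, so the property-access IC in the
leaf sees dozens of shapes and goes megamorphic, even though each call
site would be monomorphic if everything were inlined.

    function getId(o) {        // shared leaf, compiled once for all callers
      return o.id;             // this access sees every shape below
    }

    // Objects that all have .id but otherwise different property sets, so each
    // gets a distinct shape; roughly what many distinct Closure-compiled record
    // types flowing into one small helper looks like.
    const objs = [];
    for (let i = 0; i < 50; i++) {
      const o = { id: i };
      o["extra" + i] = i;      // unique property name => unique shape
      objs.push(o);
    }

    let sum = 0;
    for (let i = 0; i < 1e6; i++) {
      sum += getId(objs[i % objs.length]);  // o.id stays megamorphic here
    }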
Typical examples:
https://clptr.io/2jqJ0BD is a non-e10s run for the document from bug
1269695. It has 50% of the total time in stuff related to script. 19% is
running ion code, 14% running baseline code, which is actually pretty
good for the things I have looked at. (Contrast with an Octane run
https://clptr.io/2jqRUPk where 40% of the total time is in ion, 16% in
baseline, 18% in GC!)
Bug 1326346 is a gslides example that ehsan has analyzed, finding some
reflow but mostly script.
https://clptr.io/2jqGWJX shows 31% of the time spent in script: 17%
executing baseline, 5% executing ion, and to my eye, not very many total
bailouts. (This is from loading
https://docs.google.com/presentation/d/10QdoQTau98IeEp862ChQ7vG0eR3_mJZABsuduXGK1VE/edit#slide=id.g115bc93893_11_5
).
What the JS team could provide:
- Understand what various patterns mean. What are tell-tale signs that
we are doing something dumb/fixable? And which can we detect
automatically? nbp has some ideas in bug 1184569. Additional
perspectives would be helpful.
- In general, a better way to triage and diagnose performance problems
in single page apps, a way that does not require having an analogous
shell test case.
- Revive VTune integration (sstangl is working on it)
- Push column numbers through to the profiler to be able to identify
specific scripts in minified code (bug 785922?)
- Annotate more things with markers, eg bug 1329923, so we can get some
of the advantages of tracing with our sampling. (eg generate a timeline
of samples for a given script, with all compiles/bailouts/collections
annotated in the timeline)
- Produce bailout markers in a structured format (bug 1329921)
- A good description of the overall flow of the engine, as it relates to
performance. Every time we go over this with someone new, the same ideas
and suggestions arise.
- compile sooner
- compile later
- specialize the warmup thresholds to specific URLs (or a similar key)
- compile while idle and keep multiple differently-specialized copies
- cache compiled code
- cache type information
- suppress GC
- GC when idle
- eliminate all or most bailouts
I can come up with reasons against every one of these, but I also feel
like there's a kernel of a good idea in each. It would be nice to have
ammunition in the form of a clear description of how the engine works
*in practice*, and preferably specific numbers we used to tune various
things. This isn't for shooting down these ideas, but rather for making
them more nuanced and increasing the chance of someone coming up with
something that *would* help, without having to explain the "why nots"
over and over again.
Some of these things are doable, would help some, but just aren't worth
the cost. Like say we saved initial type information + steady state type
information separately, and unioned each with what we observe at
runtime. So the first compile would take into account the first batch of
type info, the second would switch to the second batch. Lots of work,
more memory, more complexity -- could it be worth those costs? Seems
unlikely to me, but I'm no expert.
I would imagine the engine description looking something like "x% of
code never even runs, so we don't want to waste time doing anything
eager with it. We have to at least syntax parse before we can fetch all
of the code. x% of code only runs once, and the interpreter will execute
it faster than we could baseline compile & run it. Baseline is typically
some factor k faster than the interpreter, but has to generate ICs
before it gets
up to that speed. It also has to see accurate enough type info before
ion compiling, so that...." or whatever. Including subtleties such as
startup type info vs steady state type info. Essentially, "these are the
patterns of code the engine expects to see, and this is how it makes
them go fast."
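As a strawman for what I mean, here is a toy loop annotated with my
(possibly misremembered, and certainly tunable) understanding of when
each tier kicks in; the thresholds should not be taken as gospel.

    function addPoint(p, q) {
      return { x: p.x + q.x, y: p.y + q.y };
    }

    let acc = { x: 0, y: 0 };
    for (let i = 0; i < 20000; i++) {
      // First handful of calls: the interpreter runs addPoint and records type info.
      // After ~10 warm-up counts: Baseline compiles it and starts populating ICs
      //   for the property accesses and additions.
      // After ~1000 warm-up counts: Ion compiles it, specialized to the {x, y}
      //   shape and double arithmetic observed so far.
      // If a later call passes a different shape (say {x, y, z}), the Ion code
      //   bails out and the function may eventually be recompiled.
      acc = addPoint(acc, { x: i * 0.5, y: i * 0.25 });
    }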
I have a meta bug, bug 1299643, where I try to list the Google Suites
cases that seem to be JS bound. The latest round of Hasal results is at
<https://docs.google.com/spreadsheets/d/1chXshqmq2PNrGe7M6ETjLcJkkP3Myur8tPd3HmkaxYc/edit#gid=244337553>;
note that I haven't gone through them to sort out the latest set of
significantly slower, JS-bound cases, because they're using the old
profiler and I don't have any good way of categorizing the time there.