Input Delay Metric proposal

Randell Jesup Wed, 19 Sep 2018 11:45:29 -0700

Problem:
Various measures have been tried to capture user frustration with having
to wait to interact with a site they're loading (or to see the site
data).  These include:

FID - First Input Delay --
https://developers.google.com/web/updates/2018/05/first-input-delay
TTI - Time To Interactive --
https://developers.google.com/web/fundamentals/performance/user-centric-performance-metrics#time_to_interactive
related to: FCP - First Contentful Paint and FMP - First Meaningful Paint --
https://developers.google.com/web/fundamentals/performance/user-centric-performance-metrics#first_paint_and_first_contentful_paint
TTVC (Time To Visually Complete), etc.

None of these do a great job capturing the reality around pageload and
interactivity. FID is the latest suggestion, but it's very much based
on watching user actions and reporting on them, and thus depends on how
much they think the page is ready to interact with, and dozens of other
things. It's only good for field measurements in bulk of a specific
site, by the site author. In particular, FID cannot reasonably be used
in automation (or before wide deployment).

Proposal:

We should define a new measure based on FID, named MID, for Median Input
Delay, which is measurable in automation and captures the expected delay
a user experiences during a load. We can run this in automation against
a set of captured pages, while also measuring related values like FCP
and TTI, and dump this into a set of per-page graphs (perhaps on
"areweinteractiveyet.com" :-) ).

While FID depends on measuring the delay when the user *happens* to
click, MID would measure the median (etc) delay that would be
experienced at any point between (suggestion) FCP and TTI. I.e. it
would be based on "if a user input event were generated this
millisecond, how long would it be before it ran?" This would measure
delay in the input event queue (probably 0 for this case) plus the time
remaining until the currently running event on the main thread finishes.
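
A minimal sketch of that per-millisecond calculation, in Python. It
assumes we already have a sorted, non-overlapping list of main-thread
task intervals in milliseconds, and that the input event queue itself
adds no delay. The function names and the (start, end) tuple layout are
made up for illustration; nothing here is an existing API.

```python
from statistics import median


def input_delays(tasks, start_ms, end_ms):
    """Delay a hypothetical input event would see at each millisecond.

    tasks: sorted, non-overlapping (task_start_ms, task_end_ms) tuples
    (an assumed input format). Returns one delay per millisecond t in
    [start_ms, end_ms).
    """
    delays = []
    i = 0
    for t in range(start_ms, end_ms):
        # Skip tasks that had already finished by time t.
        while i < len(tasks) and tasks[i][1] <= t:
            i += 1
        if i < len(tasks) and tasks[i][0] <= t:
            # A task is running: the event waits for the rest of it.
            delays.append(tasks[i][1] - t)
        else:
            # Main thread is idle: the event would run immediately.
            delays.append(0)
    return delays


def median_input_delay(tasks, fcp_ms, tti_ms):
    """MID over the suggested FCP..TTI range."""
    return median(input_delays(tasks, fcp_ms, tti_ms))
```

For example, a single 100ms task that fills the whole measured range
gives delays of 100, 99, ..., 1 and so a median of 50.5ms.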

This inherently assumes we measure TTI and FCP (or something
approximating them). This is somewhat problematic, as TTI is very noisy.
I have a first cut at TTI measurement (fed into profiler markers) in
bug 1299118 (without the "no more than 2 connections in flight" part).

Value calculation:
Median seems to be the best measure, but once we have data we can look
at the distributions on real sites and our test harness and decide what
has the most correlation to user experience. We could also measure the
95% point, for example. In automation, there might be some advantage to
recording/reporting more data, like median and 95%, or median, average,
and 95%, and max.

Another issue with the calculation is that a single summary value won't
capture burstiness in the delays well (a full distribution would).
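
As a rough sketch of what "reporting more data" could look like, the
following Python takes a list of per-millisecond delay samples and
reports median, average, 95% point and max. The nearest-rank percentile
and the function names are my assumptions for illustration, not a
decided part of the proposal.

```python
from statistics import mean, median


def percentile(ordered, pct):
    """Nearest-rank percentile of an already sorted, non-empty list.

    pct is an integer from 1 to 100; integer arithmetic avoids float
    rounding when computing the rank.
    """
    rank = max(1, (len(ordered) * pct + 99) // 100)  # ceil(n * pct / 100)
    return ordered[rank - 1]


def summarize(delays):
    """Summary values for a non-empty list of delay samples (ms)."""
    ordered = sorted(delays)
    return {
        "median": median(ordered),
        "mean": mean(ordered),
        "p95": percentile(ordered, 95),
        "max": ordered[-1],
    }
```

With 90 samples of 0ms and 10 samples of 200ms, this gives a median of
0, an average of 20, and a 95% point and max of 200, which shows how the
median alone can hide a burst of long delays.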

Range measured over:
We could modify the starting point to be when the first object that
could be interacted with is rendered (input object, link, adding a key
event handler, etc). This would be a more-accurate measure for web
developers, and would matter only a little for our use. Note that
getting content on the screen earlier might in some cases hurt you by
starting the measurement "early" when the MainThread is presumably busy.

Likewise, there might very well be alternatives to TTI for the end-point
(and on some pages, you never get to TTI, or it's a Long Time). Using
TTI does imply we must collect data until 5 seconds after the last "Long
Task", and since some sites will never go 5 seconds without a long
task, we'll need to upper-bound it (or progressively reduce the 5
seconds over time, which may help). Alternatively, we could use a
shorter window, or put an arbitrary limit on it (5 seconds past
'loaded', or just to 'loaded'), etc.
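
One possible way to pick an upper-bounded end-point, sketched in Python.
It assumes a sorted list of main-thread task intervals in milliseconds,
treats any task of 50ms or more as a "Long Task", and assumes the
recording runs long enough to see the quiet period. The 30 second cap
and the function name are arbitrary placeholders, not proposed values.

```python
LONG_TASK_MS = 50  # "Long Task" threshold


def measurement_end(tasks, start_ms, quiet_ms=5000, cap_ms=30000):
    """End of the measurement window for a given start point.

    Returns the end of the last long task that is followed by quiet_ms
    without another long task, or start_ms + cap_ms if the page never
    goes quiet before the cap (cap_ms is an assumed upper bound).
    """
    limit = start_ms + cap_ms
    candidate = start_ms  # no long task seen yet: window could end here
    for task_start, task_end in tasks:
        if task_end - task_start < LONG_TASK_MS or task_end <= start_ms:
            continue  # not a long task, or finished before the start
        if task_start - candidate >= quiet_ms:
            break  # found a quiet period before this long task
        candidate = task_end
        if candidate >= limit:
            return limit  # still busy at the cap: stop there
    return min(candidate, limit)
```

Progressively shrinking quiet_ms over time, as suggested above, would
be a variation on the same loop.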

Issues:

Defining the start and stop point, and the details around the exact way
we calculate the result (I hand-waved about it above). Note that
"longer" endpoints will generally result in better scores, since the
measurement would probably extend over a longer tail where less is
happening (presumably). OTOH if it ends at TTI on a "Long Task" (50+ms
event), that rather implies that it was at least intermittently busy
until then.

If we want to start when something interact-able is rendered, there may
be some work to figure that out.

Note that this inherently is measuring the delay until the input event
*starts* processing, not how long it takes to process (since there is no
actual input event here).

Once we have some experience with this, we could propose it for the
Performance API WG.

--
Randell Jesup, Mozilla Corp
remove "news" for personal email
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform