[Analytics] How best to accurately record page interactions in Page Previews

Tilman Bayer Thu, 08 Feb 2018 01:52:13 -0800

Hi Leila,

On Wed, Jan 17, 2018 at 10:46 AM, Leila Zia <le...@wikimedia.org> wrote:

> Hi Sam,
>
> On Wed, Jan 17, 2018 at 1:51 AM, Sam Smith <samsm...@wikimedia.org> wrote:
>
> > IMO #1 is preferable from the operations and performance perspectives as the
> > response is always served from the edge and includes very few headers,
> > whereas the request in #2 may be served by the application servers if the
> > user is logged in (or in the mobile site's beta cohort). However, the
> > requests in #2 are already
>
> It seems the sentence above is cut, can you resend it?
>
> > We're currently considering recording page interactions when previews are
> > open for longer than 1000 ms. We estimate that this would increase overall
> > web requests by 0.3% [3].
>
> Can you say some words about how the 1000 ms threshold is chosen?

This is a good question, sorry that it got buried earlier. (It's kind of
orthogonal though to the technical instrumentation questions that have been
the focus of attention: as indicated by the capital X in Sam's initial
post, we can still decide to fine-tune that threshold right now, it's just
a parameter change.)

This kind of threshold necessarily needs to be set somewhat arbitrarily, in
the sense that there will always be either cases where some content was
already read/perceived in a preview card shown for a shorter time, or cases
where a reader needed a longer time to consume any content from the card.
We picked a time by which we can be reasonably certain that at least some
readers can consume content (read some words, perceive an image). It's not
the result of an exact calculation to find the provably best limit. But we
did have a look at the frequency of the different user actions over time
during the first seconds after they start to hover over a link. In case
you're interested, I recently updated those charts with better quality data
from our latest two tests, e.g.:
https://phabricator.wikimedia.org/F12940888
https://phabricator.wikimedia.org/F13134460 (a zoomed-in look at the same
histogram)
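
For anyone who wants to rebuild a chart like this from raw events, here is a
minimal sketch of the binning step. It assumes each logged link interaction
reduces to an (action, duration_ms) pair; that tuple layout, the function name
and the 100ms bin width are hypothetical and not taken from our actual
instrumentation. Only the action labels come from the charts.

```python
from collections import Counter

BIN_WIDTH_MS = 100  # assumed bin width, chosen only for illustration


def histogram_by_action(events, bin_width_ms=BIN_WIDTH_MS):
    """Count link interactions per (action, time bin).

    `events` is an iterable of (action, duration_ms) pairs, where
    duration_ms is the time from the start of the hover until the action.
    This record layout is a hypothetical simplification.
    """
    counts = Counter()
    for action, duration_ms in events:
        # Left edge of the bin this duration falls into.
        bin_start = (duration_ms // bin_width_ms) * bin_width_ms
        counts[(action, bin_start)] += 1
    return counts


# Made-up sample values, not real measurements.
sample = [
    ("dwelledButAbandoned", 120),
    ("dwelledButAbandoned", 180),
    ("dismissed", 1130),
]
result = histogram_by_action(sample)
```

Plotting one series per action over the bin starts gives a stacked histogram
of the same general shape as the charts linked above.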

The following is just eyeballing and thinking aloud, but one way to view
this histogram is as the sum of several distributions associated with
different user intentions:
1. Most of the time when our instrumentation registered the cursor moving
over a link, the user was just on their way to a different part of the
screen (with no intention of either clicking that link or viewing the
preview). That's mostly the huge yellow spike on the left -
"dwelledButAbandoned" meaning that the cursor left the link without either
clicking it or causing a preview to show. The feature involves a 500ms
delay before the preview card begins to display, so that we don't bother
that group too much. (Only the right tail end of that distribution, folks
moving the cursor very slowly, will be affected, where things morph from
yellow into purple.)
2. Then there are users who want to click the link without viewing the
preview, forming all of the green part left of 500ms and an unknown portion
to the right of it (after the card starts to show, some of these "open"
actions will instead happen after the user intentionally viewed the card,
case 3.).
3. And there are users who intentionally view a preview. The little bump in
the purple part ("dismissed" meaning that the preview was shown and then
closed by moving the cursor away) at about 1100ms indicates that the
distribution for that user group also peaks somewhere there, maybe a few
hundred ms to the right. That would mean that our 1000ms threshold (i.e. only
counting the part of the histogram right of 1500ms = 500ms + 1000ms as seen
previews) is actually right of that distribution's peak. In other words, the
threshold is in some sense quite conservative.
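
To make the 500ms + 1000ms arithmetic concrete, here is a minimal sketch of
how a hover duration maps to a "seen preview". The two constants are the
values discussed in this thread; the function names are hypothetical, and
treating exactly 1000ms as not yet "longer than" the threshold is my
assumption about the boundary case.

```python
PREVIEW_DELAY_MS = 500     # delay before the preview card begins to display
SEEN_THRESHOLD_MS = 1000   # card must be open for longer than this to count


def preview_open_ms(hover_ms):
    """Time the card was open, given the total hover time on the link."""
    # The card is not shown at all during the initial delay.
    return max(0, hover_ms - PREVIEW_DELAY_MS)


def counts_as_seen(hover_ms):
    """True if the preview was open for longer than the threshold."""
    # Strict comparison is an assumption about the boundary case.
    return preview_open_ms(hover_ms) > SEEN_THRESHOLD_MS
```

With these values, a hover ending at 1100ms (near the bump in the purple
part) leaves the card open for only 600ms and is not counted, while anything
past 1500ms is.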

Like I said, this is all of course still a bit handwavy; it involves some
assumptions about the form of these distributions, as well as disregarding
some other information for now that can give a fuller picture (in
particular the analogous histogram for link interaction behavior without
page previews being active, which we also have from our A/B tests).

> Is
> this based (partially) on looking at traces where a user-agent goes to
> a page and returns to the "source" article?
>
We did an analysis of that user behavior, but not regarding the timing
question; rather, it was about finding out how much of the reduction in
pageviews comes from reduced usage of the back button. I'm not sure how
directly we can compare the action of loading an entire new page and then
going back (two clicks that also involve moving the mouse cursor to an
entirely different part of the screen - the back button - in between) with
the action of hovering over a link and then moving the cursor away for a
small distance to close the preview; it seems to me that the latter
involves much less friction - which is kind of the whole point of the
previews feature ;)

As indicated, we already picked a value for the threshold that we are quite
comfortable with. But if you are still interested in this question and have
some spare time, I'm more than happy to chat about it further off-list.

> Thanks,
> Leila
>
> >
> > [0] https://lists.wikimedia.org/pipermail/analytics/2015-March/003633.html
> > [1]
> > https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/varnish/templates/vcl/wikimedia-frontend.vcl.erb;1bce79d58e03bd02888beef986c41989e8345037$269
> > [2] https://wikitech.wikimedia.org/wiki/X-Analytics
> > [3] https://phabricator.wikimedia.org/T184793#3901365
> >
> > _______________________________________________
> > Analytics mailing list
> > Analytics@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
>

-- 
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB

