Re: [Analytics] [Wikimedia-search-private] Search dashboards are now running on live data

Adam Baso Wed, 27 May 2015 12:34:06 -0700

Thought I'd step in here. People who know the mechanics of the relevant
logging are as follows:


Android: Dmitry Brant
Desktop: Bahodir (Baha) Mansurov
Mobile Web: Sam Smith & Baha

CC'ing them.

As I understand, Dan Garry's been talking with Baha already on the desktop
piece.

It looks like on Chrome/42 UAs the clickthrough rate for a given suggest
search form interaction is about 40%. [1]

A couple of patches are pending (JS for emitting event on new EL schema)
that will make it possible to figure out how often form submission within a
form interaction (ENTER/RETURN, tapping the magnifying glass) occurs as
well.

Total form interactions (keys on userSessionToken...maybe a better name
could be used) minus clickthroughs (click-result) minus form submission
(submit-form in the pending patches on the new EL schema) would be a rough
proxy of abandonment, I think.
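
To make that proxy concrete, here is a minimal sketch in Python. It assumes
the events have already been pulled out as (userSessionToken, action) pairs;
the function name, the sample tokens and the "session-start" action are
hypothetical placeholders, and "submit-form" only exists once the pending
patches land. It counts sessions with neither a click-result nor a
submit-form, which matches the subtraction above as long as no session has
both (a plain subtraction would count such a session twice).

```python
from collections import defaultdict


def abandonment_proxy(events):
    """Return (abandoned_sessions, total_sessions).

    events: iterable of (session_token, action) pairs (assumed input shape).
    A session counts as abandoned when it has neither a 'click-result'
    nor a 'submit-form' event.
    """
    actions_by_session = defaultdict(set)
    for token, action in events:
        actions_by_session[token].add(action)

    total = len(actions_by_session)
    abandoned = sum(
        1
        for actions in actions_by_session.values()
        if "click-result" not in actions and "submit-form" not in actions
    )
    return abandoned, total


# Hypothetical sample data, not taken from the real table.
sample = [
    ("a", "session-start"), ("a", "click-result"),
    ("b", "session-start"), ("b", "submit-form"),
    ("c", "session-start"),
    ("d", "session-start"), ("d", "click-result"), ("d", "submit-form"),
]
abandoned, total = abandonment_proxy(sample)
print(abandoned, total)  # 1 4
```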

I had heard that sendBeacon-capable UAs are likely to have greater success
emitting the click-result event via mw.track (i.e., when the user clicks on
a suggestion from the form on the search panel on desktop). So it may make
sense to confine queries for such analysis on desktop to known sendBeacon
browsers [2] to increase the odds of high-fidelity data, in case outlier
browsers manage to emit click-result events through means other than
sendBeacon (there seem to be some of these, assuming non-forged UAs).

-Adam

[1]

> SELECT count(*) FROM Search_11670541 WHERE timestamp >= '20150526' AND
timestamp < '20150527' AND event_action = 'click-result' AND wiki =
'enwiki' and userAgent LIKE '%Chrome/42%';
+----------+
| count(*) |
+----------+
|      112 |
+----------+
1 row in set (2.96 sec)

> SELECT count(DISTINCT event_userSessionToken) FROM Search_11670541 WHERE
timestamp >= '20150526' AND timestamp < '20150527' AND event_action =
'click-result' AND wiki = 'enwiki' and userAgent LIKE '%Chrome/42%';
+----------------------------------------+
| count(DISTINCT event_userSessionToken) |
+----------------------------------------+
|                                    112 |
+----------------------------------------+
1 row in set (38.06 sec)


> SELECT count(DISTINCT event_userSessionToken) FROM Search_11670541 WHERE
timestamp > '20150526' AND timestamp < '20150527' AND wiki = 'enwiki' and
userAgent LIKE '%Chrome/42%';
+----------------------------------------+
| count(DISTINCT event_userSessionToken) |
+----------------------------------------+
|                                    286 |
+----------------------------------------+
1 row in set (7.26 sec)
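
For reference, the "about 40%" figure is the ratio of the two distinct-token
counts above. A minimal sketch of the arithmetic in Python (the function name
is just a placeholder):

```python
def clickthrough_rate(click_sessions, all_sessions):
    """Share of sessions that produced at least one click-result event."""
    return click_sessions / all_sessions


# 112 distinct tokens with click-result, 286 distinct tokens overall.
rate = clickthrough_rate(112, 286)
print(f"{rate:.1%}")  # 39.2%
```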


[2] https://developer.mozilla.org/it/docs/Web/API/Navigator/sendBeacon

On Tue, May 26, 2015 at 4:37 PM, Oliver Keyes <oke...@wikimedia.org> wrote:

> Thanks Tomasz; great feedback! In order:
>
> * yeah, top percentiles were a heavily-requested thing so I built it
> in from the get-go. Similarly, mean/median so we have some ability to
> avoid distorting the results when the distribution changes.
> * The 3 days data thing is a known issue -
> https://phabricator.wikimedia.org/T100056 - and is next on my to-do
> list for bugfixes :).
> * Glad you like the interface! It's actually functional on mobile, too :D.
> * Sample rate is crucial, yep. I'm reaching out to the authors of the
> relevant EL schemas to find out how each was handled.
> * Sessions < results opened makes sense in the event that users didn't
> find what they wanted and went back to try again, but I'm not sure how
> "session" is calculated; this is again something we lack transparency
> around :(. Dan? You're the apps wizard.
>
> In supporting this: probably nothing at the moment although Nik/Kevin
> chipping in on the relevant phabricator ticket
> (https://phabricator.wikimedia.org/T99762 ) to validate how much of a
> PITA the idea of a unified schema and the associated implementations
> are, would be good.
>
> I'm sort of shocked to hear "we're supposed to be presenting this data
> at the next metrics meeting": in the future if there are instances
> where data is going to be up for public scrutiny, would it be possible
> to explicitly associate time for that? My goal is to get us to the
> point where our data is reliable all, or at least, most of the time,
> and for a fragment of one person's time over two weeks, I think
> progress on that is pretty fantastic. But prepping data for that kind
> of event does change the priorities and what tasks should be worked
> on.
>
> If we want to present data, generally speaking, let's discuss what we
> can show off. If we want to present the dashboards I'll put my all
> into making the data at least something where we know the
> deficiencies, if not something where we consider the deficiencies
> tolerable.
>
> On 26 May 2015 at 19:24, Tomasz Finc <tf...@wikimedia.org> wrote:
> > Thanks Oliver
> >
> > Early observations
> >
> > * Really happy to see top percentiles in load graphs
> > * Mobile Web has only three days data
> > * Interface is simple and easy to use
> > * We need to know the sample rate
> > * Apps have fewer sessions than results page opened
> >
> > Speaking over IRC it's clear that we don't have confidence in this
> > data. We need to fix this and fix it quickly so that we can accurately
> > plan our work. We're supposed to be presenting this data at the next
> > metrics meeting and we're not at a point where I feel comfortable sharing
> > our data let alone next steps.
> >
> > Oliver & Dan, what can the team do to support you guys on this? I want
> > you guys to own this and know that we're here to support you.
> >
> > Should I be adding new feature requests and bugs to
> > https://phabricator.wikimedia.org/tag/search-data-analytics/ ?
> >
> > --tomasz
> >
> > On Tue, May 26, 2015 at 11:04 AM, James Douglas <jdoug...@wikimedia.org>
> wrote:
> >> This is a very exciting preview of things to come.
> >>
> >> Where are the data coming from?  Am I just confused, or does "6 search
> >> sessions per day" seem low?
> >>
> >> On Fri, May 22, 2015 at 2:35 PM, Oliver Keyes <oke...@wikimedia.org>
> wrote:
> >>>
> >>> http://searchdata.wmflabs.org/ - boop! This was my Friday. Previously
> >>> we were playing around with them and testing what we needed with a
> >>> static snapshot; these dashboards will now update once a day with new
> >>> information.
> >>>
> >>> It has turned up some bugs ("is the mobile schema just not running?")
> >>> and there are more metrics to add. But for the time being, is progress
> >>> :)
> >>>
> >>> --
> >>> Oliver Keyes
> >>> Research Analyst
> >>> Wikimedia Foundation
> >>>
> >>> _______________________________________________
> >>> Wikimedia-search-private mailing list
> >>> wikimedia-search-priv...@lists.wikimedia.org
> >>> https://lists.wikimedia.org/mailman/listinfo/wikimedia-search-private
>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>

