In some cases we have page_id; in other cases we have nothing (like API
requests).
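
To make the two cases concrete, here is a minimal sketch of the fallback
discussed further down the thread: use the X-Analytics entry when it is
present, otherwise fall back to pattern matching. The pageview_id and
pageview_title keys come from Dan's proposal below. Everything else is an
assumption for illustration only: the semicolon-separated key=value layout
of the header, the function names, and the toy regular expression standing
in for the real pageview definition.

```python
import re
from typing import Optional

# Hypothetical stand-in for the existing pattern-based pageview definition.
LEGACY_PAGEVIEW_PATTERN = re.compile(r"^/wiki/[^?]+$")


def parse_x_analytics(header: str) -> dict:
    """Parse 'k1=v1;k2=v2' into a dict (assumed layout); skip malformed entries."""
    fields = {}
    for entry in header.split(";"):
        key, sep, value = entry.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


def classify(uri_path: str, x_analytics: str = "") -> Optional[dict]:
    """Return a pageview record, or None if the request is not a pageview."""
    fields = parse_x_analytics(x_analytics)
    # Proactively tagged requests: trust the tag and skip the regexes.
    if "pageview_id" in fields:
        return {"source": "tagged", "page_id": fields["pageview_id"]}
    if "pageview_title" in fields:
        return {"source": "tagged", "page_title": fields["pageview_title"]}
    # Untagged requests: fall back to the pattern-based definition.
    if LEGACY_PAGEVIEW_PATTERN.match(uri_path):
        return {"source": "pattern", "page_title": uri_path[len("/wiki/"):]}
    return None
```

With this shape, a tagged RESTBase request is counted without touching the
patterns, and untagged traffic keeps going through the old definition until
all clients send the tag.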

On Wed, Aug 19, 2015 at 2:13 PM, Oliver Keyes <[email protected]> wrote:

> Aren't we currently just storing pageID?
>
> On 19 August 2015 at 14:11, Dan Andreescu <[email protected]>
> wrote:
> > Oliver, the problem with "page_title OR page_id" instead of "always
> > page_title and page_id if you have it" is what Andrew was addressing above.
> > It means we have to query for page_title by id, and that means we need to
> > keep an up-to-date copy of all mediawiki databases.  And we have to be able
> > to query that copy tens of thousands of times per second, which is basically
> > not going to happen.
> >
> > We just chatted in scrum of scrums about this; it looks like Adam's going to
> > set up a meeting so we can talk more there.  I agree with Adam that we have
> > to have a short term solution for counting the new kinds of requests, a
> > medium term solution so that we don't all go insane, and something to shoot
> > for in the long term.
> >
> > On Wed, Aug 19, 2015 at 1:48 PM, Oliver Keyes <[email protected]> wrote:
> >>
> >> In the absence of all clients doing it, "if it has this x_analytics
> >> entry, don't bother with the complex regular expressions, if it
> >> doesn't, do" still works.
> >>
> >> On 19 August 2015 at 13:34, Gabriel Wicke <[email protected]> wrote:
> >> > Yeah, doing this on the client could work, but would require *all*
> >> > clients to actually do it. We also have metrics per entry point in
> >> > RESTBase, but those are behind Varnishes and will only count Varnish
> >> > cache misses. Without Varnish caching, this would be a solved problem ;)
> >> >
> >> > On Wed, Aug 19, 2015 at 7:53 AM, Dan Andreescu
> >> > <[email protected]>
> >> > wrote:
> >> >>
> >> >> This (making pageviews proactive) is a great idea, and we should follow
> >> >> through.  Here's a simple start:
> >> >>
> >> >> If your app/site/etc. is creating a request that it wants to count as a
> >> >> pageview, add an X-Analytics header with pageview_id=<page_id> or
> >> >> pageview_title=<page_title>
> >> >>
> >> >> If we can make this change uniformly, I think we'd be in a very good
> >> >> place.
> >> >>
> >> >> On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <[email protected]> wrote:
> >> >>>
> >> >>> On 19 August 2015 at 10:19, Andrew Otto <[email protected]> wrote:
> >> >>> >>  If we /do/ include RESTBase requests we will not only have to
> >> >>> >> rewrite the pageview definition for the apps to recognise the new
> >> >>> >> URL scheme
> >> >>> >
> >> >>> > I really think that apps and APIs should do something proactive to
> >> >>> > tag or log a pageview.  With more ways of viewing content, it is going
> >> >>> > to get harder and harder to maintain a pattern based definition.  A
> >> >>> > pageview should be an event that is logged, not something that is
> >> >>> > pattern matched out of a very noisy stream of data.
> >> >>> >
> >> >>> > Most mediawiki requests do this now, via the page_id field in the
> >> >>> > X-Analytics header, but we can’t use this for all pageviews because
> >> >>> > APIs are more complicated (e.g. more than one page can be served in a
> >> >>> > single request, etc.).  In the long term, there should be a pageview
> >> >>> > event stream just like rcstream! :)
> >> >>>
> >> >>> This is an excellent point. IIRC we'd been asking Apps to do this for
> >> >>> kind of a while, so...
> >> >>>
> >> >>> >
> >> >>> > -Ao
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >> On Aug 18, 2015, at 19:58, Oliver Keyes <[email protected]>
> >> >>> >> wrote:
> >> >>> >>
> >> >>> >> On 18 August 2015 at 19:11, Bernd Sitzmann <[email protected]>
> >> >>> >> wrote:
> >> >>> >>> This discussion is about needed updates of the definition and
> >> >>> >>> Analytics implementation for mobile apps page view metrics. There is
> >> >>> >>> also an associated Phab task[4]. Please add the proper Analytics
> >> >>> >>> project there.
> >> >>> >>>
> >> >>> >>> Background / Changes
> >> >>> >>>
> >> >>> >>> As you probably remember, the Android app splits a page view into
> >> >>> >>> two requests: one for the lead section and metadata, plus another
> >> >>> >>> one for the remainder.
> >> >>> >>>
> >> >>> >>> The mobile apps are going to change the way they load pages in two
> >> >>> >>> different ways:
> >> >>> >>>
> >> >>> >>> 1. We'll add a link preview when someone clicks on a link from a
> >> >>> >>>    page.
> >> >>> >>> 2. We're planning on switching over to using RESTBase for loading
> >> >>> >>>    pages and also the link preview (initially just the Android beta,
> >> >>> >>>    later more).
> >> >>> >>>
> >> >>> >>
> >> >>> >> Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful
> >> >>> >> service API?
> >> >>> >>
> >> >>> >> Last time I checked that wasn't even consumed by HDFS. Is it now
> >> >>> >> being consumed by HDFS?
> >> >>> >>
> >> >>> >> More importantly the actual URLs are going to look /totally/
> >> >>> >> different. If we do not include RESTBase requests, we will miss the
> >> >>> >> apps. If we /do/ include RESTBase requests we will not only have to
> >> >>> >> rewrite the pageview definition for the apps to recognise the new
> >> >>> >> URL scheme, we will also potentially have to rewrite every /other/
> >> >>> >> bit of the definition to /not/ incorporate those requests.
> >> >>> >>
> >> >>> >> (I use "we" in a collective sense. This isn't my baby any more,
> >> >>> >> although if Joseph et al want help with the refactor here I'm happy
> >> >>> >> to spend my volunteer time on it).
> >> >>> >>
> >> >>> >> But basically every other bit of your email is important but now
> >> >>> >> secondary: this is a potentially massive change, all on its own,
> >> >>> >> even without the link preview, even if the substance of the requests
> >> >>> >> going to RESTBase were identical.
> >> >>> >>
> >> >>> >>> This will have implications for the pageviews definition and how we
> >> >>> >>> count user engagement.
> >> >>> >>>
> >> >>> >>> The big question is
> >> >>> >>>
> >> >>> >>> Should we count link previews as a page view since it's an
> >> >>> >>> indication of user engagement? Or should there be a separate metric
> >> >>> >>> for link previews?
> >> >>> >>>
> >> >>> >>> Counting page views
> >> >>> >>>
> >> >>> >>> IIRC we currently count action=mobileview&sections=0 query
> >> >>> >>> parameters of api.php as a page view. When we publish link previews
> >> >>> >>> for all Android app users then we would either want to count also
> >> >>> >>> the calls to action=query&prop=extracts as a page view or add them
> >> >>> >>> to another metric.
> >> >>> >>>
> >> >>> >>> Once the apps use RESTBase the HTTPS requests will be very
> >> >>> >>> different:
> >> >>> >>>
> >> >>> >>> Page view: Instead of action=mobileview&sections=0, the app would
> >> >>> >>> call the RESTBase endpoint for the lead request[1] instead of the
> >> >>> >>> PHP API mentioned above. Then it would call [2].
> >> >>> >>> Link preview: Instead of action=query&prop=extracts, it would call
> >> >>> >>> the lead request[1], too, since there is a lot of overlap. At least
> >> >>> >>> that's our current plan. The advantage of that is that the client
> >> >>> >>> doesn't need to execute the lead request a second time if the user
> >> >>> >>> clicks on the link preview (either through caching or app logic).
> >> >>> >>>
> >> >>> >>> So, in the RESTBase case we either want to count the
> >> >>> >>> mobile-html-sections-lead requests or the
> >> >>> >>> mobile-html-sections-remaining requests depending on what our
> >> >>> >>> definition for page views actually is. We could also add a query
> >> >>> >>> parameter or extra HTTP header to one of the
> >> >>> >>> mobile-html-sections-lead requests if we need to distinguish
> >> >>> >>> between previews and page views.
> >> >>> >>>
> >> >>> >>> Both the current PHP API and the RESTBase based metrics would need
> >> >>> >>> to be compatible and be collected in parallel since we cannot
> >> >>> >>> control when users update their apps.
> >> >>> >>>
> >> >>> >>> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
> >> >>> >>> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
> >> >>> >>> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
> >> >>> >>> [4] https://phabricator.wikimedia.org/T109383
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> Cheers,
> >> >>> >>>
> >> >>> >>> Bernd
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> _______________________________________________
> >> >>> >>> Analytics mailing list
> >> >>> >>> [email protected]
> >> >>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >> >>> >>>
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> --
> >> >>> >> Oliver Keyes
> >> >>> >> Count Logula
> >> >>> >> Wikimedia Foundation
> >> >>> >>
> >> >>> >
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Oliver Keyes
> >> >>> Count Logula
> >> >>> Wikimedia Foundation
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Gabriel Wicke
> >> > Principal Engineer, Wikimedia Foundation
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Oliver Keyes
> >> Count Logula
> >> Wikimedia Foundation
> >>
> >
> >
> >
> >
>
>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
