It's not that challenging; Aaron and I developed a fairly robust way
of doing it that Mikhail and I are refining. It's just not easy to do
without, say, a dedicated EL schema that somebody (probably
readership?) would own and surface data from.

On 18 September 2015 at 13:14, Gabriel Wicke <[email protected]> wrote:
> This discussion also reminds me of the idea of tracking time spent on site.
> Arguably, that's a more relevant measurement for how much of our content
> people actually consume, and it also neatly side-steps issues like the
> categorization of link previews. I realize that measuring that accurately
> can be challenging, but I think it'll become more and more important as we
> venture into more dynamic content experiences.
>
>
> On Thu, Sep 17, 2015 at 8:17 AM, Oliver Keyes <[email protected]> wrote:
>>
>> Thanks!
>>
>> On 17 September 2015 at 11:15, Nuria Ruiz <[email protected]> wrote:
>> > Right! Thanks for pointing that out.
>> >
>> > I think I have updated all docs now:
>> > https://meta.wikimedia.org/wiki/Research:Page_view#Change_log
>> >
>> > https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>> >
>> > On Thu, Sep 17, 2015 at 7:36 AM, Oliver Keyes <[email protected]>
>> > wrote:
>> >>
>> >> Have those changes been noted on the main pageview definition page and
>> >> associated changelog?
>> >>
>> >> On 17 September 2015 at 09:58, Nuria Ruiz <[email protected]> wrote:
>> >> >> >> With more ways of viewing content, it is going to get harder and
>> >> >> >> harder to maintain a pattern based definition.
>> >> >> > Indeed, we want to move away from pattern-based definitions as much as
>> >> >> > possible.
>> >> >
>> >> >> > This is an FYI to everyone that with our latest changes (which we are in
>> >> >> > the process of deploying today), if a request comes "tagged" with "preview"
>> >> >> > in the X-Analytics header it will not be counted as a pageview. The Android
>> >> >> > app should make the corresponding change and add the tag "preview" to its
>> >> >> > preview requests.
>> >> >
>> >> > X-analytics header is documented here:
>> >> > https://wikitech.wikimedia.org/wiki/X-Analytics
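A minimal sketch of the tagging rule described above. It assumes the X-Analytics header carries semicolon-separated key=value pairs and that the tag shows up as a key named `preview`; the tag's value is not stated in this thread, so only its presence is checked. The function names are hypothetical and are not taken from the actual pageview code.

```python
def parse_x_analytics(header):
    """Split an X-Analytics header value into a dict of key/value pairs."""
    fields = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty segments such as a trailing ";"
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def is_preview(header):
    """Return True when the request is tagged as a preview."""
    # Assumption: the presence of a "preview" key marks the request.
    return "preview" in parse_x_analytics(header)


def counts_as_pageview(header, matches_definition):
    """Apply the preview exclusion on top of the existing definition."""
    return matches_definition and not is_preview(header)
```

Here `matches_definition` stands in for whatever the existing pattern-based definition decides; the tag only ever removes a request from the count.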
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Aug 19, 2015 at 7:19 AM, Andrew Otto <[email protected]>
>> >> > wrote:
>> >> >>
>> >> >> >> >  If we /do/ include RESTBase requests we will not only have to
>> >> >> >> > rewrite the pageview definition for the apps to recognise the new URL
>> >> >> >> > scheme
>> >> >>
>> >> >> >> I really think that apps and APIs should do something proactive to tag or
>> >> >> >> log a pageview.  With more ways of viewing content, it is going to get
>> >> >> >> harder and harder to maintain a pattern based definition.  A pageview
>> >> >> >> should be an event that is logged, not something that is pattern matched
>> >> >> >> out of a very noisy stream of data.
>> >> >>
>> >> >> >> Most MediaWiki requests do this now, via the page_id field in the
>> >> >> >> X-Analytics header, but we can’t use this for all pageviews because APIs
>> >> >> >> are more complicated (e.g. more than one page can be served in a single
>> >> >> >> request, etc.).  In the long term, there should be a pageview event stream
>> >> >> >> just like rcstream! :)
>> >> >>
>> >> >> -Ao
>> >> >>
>> >> >>
>> >> >>
>> >> >> > On Aug 18, 2015, at 19:58, Oliver Keyes <[email protected]>
>> >> >> > wrote:
>> >> >> >
>> >> >> > On 18 August 2015 at 19:11, Bernd Sitzmann <[email protected]>
>> >> >> > wrote:
>> >> >> >> >> This discussion is about needed updates of the definition and Analytics
>> >> >> >> >> implementation for mobile apps page view metrics. There is also an
>> >> >> >> >> associated Phab task[4]. Please add the proper Analytics project there.
>> >> >> >>
>> >> >> >> Background / Changes
>> >> >> >>
>> >> >> >> >> As you probably remember, the Android app splits a page view into two
>> >> >> >> >> requests: one for the lead section and metadata, plus another one for
>> >> >> >> >> the remainder.
>> >> >> >>
>> >> >> >> >> The mobile apps are going to change the way they load pages in two
>> >> >> >> >> different ways:
>> >> >> >> >>
>> >> >> >> >> 1. We'll add a link preview when someone clicks on a link from a page.
>> >> >> >> >> 2. We're planning on switching over to using RESTBase for loading pages
>> >> >> >> >>    and also the link preview (initially just the Android beta, later more).
>> >> >> >>
>> >> >> >
>> >> >> >> > Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful
>> >> >> >> > service API?
>> >> >> >> >
>> >> >> >> > Last time I checked that wasn't even consumed by HDFS. Is it now being
>> >> >> >> > consumed by HDFS?
>> >> >> >> >
>> >> >> >> > More importantly the actual URLs are going to look /totally/
>> >> >> >> > different. If we do not include RESTBase requests, we will miss the
>> >> >> >> > apps. If we /do/ include RESTBase requests we will not only have to
>> >> >> >> > rewrite the pageview definition for the apps to recognise the new URL
>> >> >> >> > scheme, we will also potentially have to rewrite every /other/ bit of
>> >> >> >> > the definition to /not/ incorporate those requests.
>> >> >> >> >
>> >> >> >> > (I use "we" in a collective sense. This isn't my baby any more,
>> >> >> >> > although if Joseph et al want help with the refactor here I'm happy to
>> >> >> >> > spend my volunteer time on it).
>> >> >> >> >
>> >> >> >> > But basically every other bit of your email is important but now
>> >> >> >> > secondary: this is a potentially massive change, all on its own, even
>> >> >> >> > without the link preview, even if the substance of the requests going
>> >> >> >> > to RESTBase were identical.
>> >> >> >
>> >> >> >> >> This will have implications for the pageviews definition and how we
>> >> >> >> >> count user engagement.
>> >> >> >> >>
>> >> >> >> >> The big question is
>> >> >> >> >>
>> >> >> >> >> Should we count link previews as a page view since it's an indication of
>> >> >> >> >> user engagement? Or should there be a separate metric for link previews?
>> >> >> >>
>> >> >> >> Counting page views
>> >> >> >>
>> >> >> >> >> IIRC we currently count action=mobileview&sections=0 query parameters of
>> >> >> >> >> api.php as a page view. When we publish link previews for all Android app
>> >> >> >> >> users then we would either want to count also the calls to
>> >> >> >> >> action=query&prop=extracts as a page view or add them to another metric.
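A minimal sketch of the api.php distinction described above, assuming the only signals are the `action`, `sections` and `prop` query parameters named in the message. The function name, the category labels and the `/w/api.php` example URLs are illustrative assumptions, not the production pageview definition.

```python
from urllib.parse import parse_qs, urlsplit


def classify_api_request(url):
    """Label an api.php request as 'pageview', 'link_preview' or 'other'."""
    parts = urlsplit(url)
    if not parts.path.endswith("/api.php"):
        return "other"
    params = parse_qs(parts.query)
    action = params.get("action", [""])[0]
    # Lead-section request: currently counted as a page view.
    if action == "mobileview" and params.get("sections", [""])[0] == "0":
        return "pageview"
    # Extracts request: the link preview, which could feed a separate metric.
    props = params.get("prop", [""])[0].split("|")
    if action == "query" and "extracts" in props:
        return "link_preview"
    return "other"
```

Returning a label rather than a boolean keeps both options open: previews can be folded into page views or reported as their own metric.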
>> >> >> >>
>> >> >> >> Once the apps use RESTBase the HTTPS requests will be very
>> >> >> >> different:
>> >> >> >>
>> >> >> >> >> Page view: Instead of action=mobileview&sections=0 on the PHP API
>> >> >> >> >> mentioned above, the app would call the RESTBase endpoint for the lead
>> >> >> >> >> request[1]. Then it would call [2].
>> >> >> >> >> Link preview: Instead of action=query&prop=extracts, it would call the
>> >> >> >> >> lead request[1], too, since there is a lot of overlap. At least that's
>> >> >> >> >> our current plan. The advantage is that the client doesn't need to
>> >> >> >> >> execute the lead request a second time if the user clicks on the link
>> >> >> >> >> preview (either through caching or app logic).
>> >> >> >>
>> >> >> >> >> So, in the RESTBase case we either want to count the
>> >> >> >> >> mobile-html-sections-lead requests or the mobile-html-sections-remaining
>> >> >> >> >> requests depending on what our definition for page views actually is. We
>> >> >> >> >> could also add a query parameter or extra HTTP header to one of the
>> >> >> >> >> mobile-html-sections-lead requests if we need to distinguish between
>> >> >> >> >> previews and page views.
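A minimal sketch of how the two RESTBase requests could be told apart, using the endpoint paths from [1] and [2]. The `is_preview` argument is a hypothetical stand-in for whatever query parameter or HTTP header ends up marking preview requests, and the function name and labels are likewise assumptions.

```python
from urllib.parse import unquote, urlsplit

REST_PREFIX = "/api/rest_v1/page/"


def classify_rest_request(url, is_preview=False):
    """Return (kind, title) for a mobile-html-sections request, else None."""
    path = urlsplit(url).path
    if not path.startswith(REST_PREFIX):
        return None
    endpoint, _, title = path[len(REST_PREFIX):].partition("/")
    if endpoint == "mobile-html-sections-lead":
        # The same endpoint serves page loads and link previews, so the
        # caller has to pass in the marker that distinguishes them.
        kind = "link_preview" if is_preview else "lead"
    elif endpoint == "mobile-html-sections-remaining":
        kind = "remaining"
    else:
        return None
    return kind, unquote(title)
```

Whether "lead" or "remaining" is the request that counts as a page view is left to the caller, since that is the open question in this thread.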
>> >> >> >>
>> >> >> >> >> Both the current PHP API and the RESTBase based metrics would need to be
>> >> >> >> >> compatible and be collected in parallel since we cannot control when
>> >> >> >> >> users update their apps.
>> >> >> >>
>> >> >> >> [1]
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
>> >> >> >> [2]
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
>> >> >> >> [3]
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
>> >> >> >>
>> >> >> >> [4] https://phabricator.wikimedia.org/T109383
>> >> >> >>
>> >> >> >>
>> >> >> >> Cheers,
>> >> >> >>
>> >> >> >> Bernd
>> >> >> >>
>> >> >> >>
>> >> >> >> _______________________________________________
>> >> >> >> Analytics mailing list
>> >> >> >> [email protected]
>> >> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Oliver Keyes
>> >> >> > Count Logula
>> >> >> > Wikimedia Foundation
>> >> >> >
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Oliver Keyes
>> >> Count Logula
>> >> Wikimedia Foundation
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation
>>
>
>
>
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation
>
>



-- 
Oliver Keyes
Count Logula
Wikimedia Foundation

