Oh that makes sense.  page_title always, and page_id if you have it.  I
wonder if there's a way to get the canonical post-redirect page_title in
all cases... hm...

On Wed, Aug 19, 2015 at 12:35 PM, Andrew Otto <[email protected]> wrote:

> I think if we do this right, we should prefer page_id, but use page_title
> if it is provided.
>
> However, at the moment we don’t have a good way of actually getting
> page_title in Hadoop from the MW DBs even if given a page_id.  We’d still
> have to infer the title from the URI.  I’d prefer if page_id was the
> canonical way of identifying a page view, but currently page_title is used
> in all pageview statistics.  Using the page_title supplied by the generator of
> the request seems like it might be even more correct than inferring it from
> the URI.
> Or, maybe it would be better (for the moment) to use the existence of
> page_id or page_title to indicate to the pageview definition logic that
> this request is definitely already a pageview, and then use the same page
> title from URI logic on all requests no matter what.
>
> page_id or page_title would just allow the pageview definition pattern
> matching logic to be skipped, as we would know right up front that a
> request is a pageview.
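A minimal Python sketch of the short-circuit described above, under stated assumptions: the X-Analytics header is treated as semicolon-separated key=value pairs, the tagging keys are taken to be any of pageview_id, pageview_title, page_id or page_title (the thread uses both spellings and has not settled on one), and `looks_like_pageview_by_uri` is a hypothetical placeholder, not the real pattern-based pageview definition.

```python
from urllib.parse import urlsplit

# Assumed keys that mark a request as an explicit pageview. The thread uses
# both pageview_id/pageview_title and page_id/page_title, so accept either.
EXPLICIT_PAGEVIEW_KEYS = ("pageview_id", "pageview_title", "page_id", "page_title")


def parse_x_analytics(header: str) -> dict:
    """Parse an X-Analytics value, assumed to be 'key1=value1;key2=value2'."""
    fields = {}
    for part in header.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def looks_like_pageview_by_uri(uri: str) -> bool:
    """Hypothetical placeholder for the existing pattern-based definition."""
    return urlsplit(uri).path.startswith("/wiki/")


def is_pageview(uri: str, x_analytics: str) -> bool:
    fields = parse_x_analytics(x_analytics)
    # A request the client explicitly tagged is a pageview: skip pattern matching.
    if any(fields.get(key) for key in EXPLICIT_PAGEVIEW_KEYS):
        return True
    # Untagged requests still go through the (noisier) URI pattern matching.
    return looks_like_pageview_by_uri(uri)
```

A tagged request, such as a RESTBase lead request carrying pageview_title=Dilbert, is counted without consulting the URI rules at all; the pattern-based fallback is still needed for any client that does not send the header.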
>
> Are you saying the apps have the option to skip providing one of
> page_title or page_id?
>
> So uhhh, yes!  I think, although I am not the authority on this.  I defer
> to other analytics engineers who will actually have to implement and
> maintain this change :)
>
>
>
>
> On Aug 19, 2015, at 12:29, Bernd Sitzmann <[email protected]> wrote:
>
> Andrew,
>
> Are you saying the apps have the option to skip providing one of
> page_title or page_id?
> I hope this is the case since I just came up with a scheme where we could
> avoid the second request when a page has only a single section, which we
> already get through the first (lead) request.
>
> Yes to what Oliver said: The apps don't always know the page_id ahead of
> time (only sometimes). The best example where we don't know the page_id
> ahead of time is when someone searches for a term on Google search on an
> Android device, and gets directed to our Android app. The app only gets the
> URL of the page, from which we then derive the wiki and page_title.
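A rough sketch of deriving the wiki and page_title from such a URL, assuming the common `https://<lang>.wikipedia.org/wiki/<Title>` shape (including the `.m.` mobile host). `wiki_and_title_from_url` is a hypothetical helper, not the apps' actual code, and it does not resolve redirects, so the title it returns may not be the canonical one.

```python
from urllib.parse import unquote, urlsplit

ARTICLE_PREFIX = "/wiki/"


def wiki_and_title_from_url(url: str) -> tuple:
    """Return (wiki_host, page_title) for an article URL.

    Hypothetical helper. Assumes the '/wiki/<Title>' path shape, folds the
    mobile '.m.' host onto the desktop host, and percent-decodes the title.
    Redirects are NOT resolved, so the title may not be canonical.
    """
    parts = urlsplit(url)
    if not parts.path.startswith(ARTICLE_PREFIX):
        raise ValueError(f"not a /wiki/ article URL: {url}")
    host = parts.hostname or ""
    wiki = host.replace(".m.", ".", 1)  # en.m.wikipedia.org -> en.wikipedia.org
    title = unquote(parts.path[len(ARTICLE_PREFIX):])
    return wiki, title
```

For example, `wiki_and_title_from_url("https://en.m.wikipedia.org/wiki/Dilbert")` returns `("en.wikipedia.org", "Dilbert")`.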
>
> Bernd
>
> On Wed, Aug 19, 2015 at 10:24 AM, Oliver Keyes <[email protected]>
> wrote:
>
>> It'll need to be; some requests don't know the page ID in advance, which I
>> think was the reason Apps initially didn't implement this.
>>
>> On 19 August 2015 at 12:19, Andrew Otto <[email protected]> wrote:
>> > If your app/site/etc. is creating a request that it wants to count as a
>> > pageview, add an X-Analytics header with pageview_id=<page_id> or
>> > pageview_title=<page_title>
>> >
>> >
>> > page_id is the current key, so let’s keep that.  page_title would be good to
>> > have too.  Let’s make it an and/or.
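To make the and/or concrete, a sketch of how a client might build the header value. The field names come from the proposal in this thread; the semicolon-separated layout and percent-encoding the title (so a `;` or `=` in a title cannot break the header) are assumptions, and `pageview_x_analytics` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import quote


def pageview_x_analytics(page_id: Optional[int] = None,
                         page_title: Optional[str] = None) -> str:
    """Build an X-Analytics value marking a request as a pageview.

    Hypothetical helper: sends pageview_id, pageview_title, or both, and
    percent-encodes the title so ';' or '=' cannot break the assumed
    'key=value;key=value' layout.
    """
    if page_id is None and page_title is None:
        raise ValueError("need page_id, page_title, or both")
    fields = []
    if page_id is not None:
        fields.append(f"pageview_id={page_id}")
    if page_title is not None:
        fields.append(f"pageview_title={quote(page_title, safe='')}")
    return ";".join(fields)
```

An app opened from a search result link would pass only `page_title`, while a client that already knows the page could pass both.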
>> >
>> >
>> > On Aug 19, 2015, at 12:17, Bernd Sitzmann <[email protected]> wrote:
>> >
>> >> If your app/site/etc. is creating a request that it wants to count as a
>> >> pageview, add an X-Analytics header with pageview_id=<page_id> or
>> >> pageview_title=<page_title>
>> >
>> >
>> > Ideally the page id would be the way to go. From a client's perspective I
>> > prefer the page title since clients don't always know the page id ahead of
>> > time. (We could put that header into the second request of loading the page
>> > but I cannot guarantee that we will always have a second request in the
>> > future.)
>> >
>> > --Cheers,
>> > Bernd
>> >
>> > On Wed, Aug 19, 2015 at 8:53 AM, Dan Andreescu <[email protected]>
>> > wrote:
>> >>
>> >> This (making pageviews proactive) is a great idea, and we should follow
>> >> through.  Here's a simple start:
>> >>
>> >> If your app/site/etc. is creating a request that it wants to count as a
>> >> pageview, add an X-Analytics header with pageview_id=<page_id> or
>> >> pageview_title=<page_title>
>> >>
>> >> If we can make this change uniformly, I think we'd be in a very good
>> >> place.
>> >>
>> >> On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <[email protected]>
>> >> wrote:
>> >>>
>> >>> On 19 August 2015 at 10:19, Andrew Otto <[email protected]> wrote:
>> >>> >>  If we /do/ include RESTBase requests we will not only have to
>> >>> >> rewrite the pageview definition for the apps to recognise the new URL
>> >>> >> scheme
>> >>> >
>> >>> > I really think that apps and APIs should do something proactive to tag
>> >>> > or log a pageview.  With more ways of viewing content, it is going to
>> >>> > get harder and harder to maintain a pattern-based definition.  A
>> >>> > pageview should be an event that is logged, not something that is
>> >>> > pattern matched out of a very noisy stream of data.
>> >>> >
>> >>> > Most MediaWiki requests do this now, via the page_id field in the
>> >>> > X-Analytics header, but we can’t use this for all pageviews because
>> >>> > APIs are more complicated (e.g. more than one page can be served in a
>> >>> > single request, etc.).  In the long term, there should be a pageview
>> >>> > event stream just like rcstream! :)
>> >>>
>> >>> This is an excellent point. IIRC we'd been asking Apps to do this for
>> >>> kind of a while, so...
>> >>>
>> >>> >
>> >>> > -Ao
>> >>> >
>> >>> >
>> >>> >
>> >>> >> On Aug 18, 2015, at 19:58, Oliver Keyes <[email protected]> wrote:
>> >>> >>
>> >>> >> On 18 August 2015 at 19:11, Bernd Sitzmann <[email protected]>
>> >>> >> wrote:
>> >>> >>> This discussion is about needed updates to the definition and
>> >>> >>> Analytics implementation of mobile apps page view metrics. There is
>> >>> >>> also an associated Phab task[4]. Please add the proper Analytics
>> >>> >>> project there.
>> >>> >>>
>> >>> >>> Background / Changes
>> >>> >>>
>> >>> >>> As you probably remember, the Android app splits a page view into
>> >>> >>> two requests: one for the lead section and metadata, plus another
>> >>> >>> one for the remainder.
>> >>> >>>
>> >>> >>> The mobile apps are going to change the way they load pages in two
>> >>> >>> different ways:
>> >>> >>>
>> >>> >>> 1. We'll add a link preview when someone clicks on a link from a
>> >>> >>>    page.
>> >>> >>> 2. We're planning on switching over to using RESTBase for loading
>> >>> >>>    pages and also the link preview (initially just the Android beta,
>> >>> >>>    later more).
>> >>> >>>
>> >>> >>
>> >>> >> Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful
>> >>> >> service API?
>> >>> >>
>> >>> >> Last time I checked that wasn't even consumed by HDFS. Is it now
>> >>> >> being consumed by HDFS?
>> >>> >>
>> >>> >> More importantly the actual URLs are going to look /totally/
>> >>> >> different. If we do not include RESTBase requests, we will miss the
>> >>> >> apps. If we /do/ include RESTBase requests we will not only have to
>> >>> >> rewrite the pageview definition for the apps to recognise the new URL
>> >>> >> scheme, we will also potentially have to rewrite every /other/ bit of
>> >>> >> the definition to /not/ incorporate those requests.
>> >>> >>
>> >>> >> (I use "we" in a collective sense. This isn't my baby any more,
>> >>> >> although if Joseph et al want help with the refactor here I'm happy to
>> >>> >> spend my volunteer time on it).
>> >>> >>
>> >>> >> But basically every other bit of your email is important but now
>> >>> >> secondary: this is a potentially massive change, all on its own, even
>> >>> >> without the link preview, even if the substance of the requests going
>> >>> >> to RESTBase were identical.
>> >>> >>
>> >>> >>> This will have implications for the pageviews definition and how we
>> >>> >>> count user engagement.
>> >>> >>>
>> >>> >>> The big question is:
>> >>> >>>
>> >>> >>> Should we count link previews as a page view since it's an
>> >>> >>> indication of user engagement? Or should there be a separate metric
>> >>> >>> for link previews?
>> >>> >>>
>> >>> >>> Counting page views
>> >>> >>>
>> >>> >>> IIRC we currently count api.php requests with the query parameters
>> >>> >>> action=mobileview&sections=0 as a page view. When we publish link
>> >>> >>> previews for all Android app users, we would either want to also
>> >>> >>> count the calls to action=query&prop=extracts as page views, or add
>> >>> >>> them to another metric.
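The api.php rules as Bernd recalls them ("IIRC") could be sketched like this. `classify_api_request` is hypothetical, and anything else the real definition may check is deliberately left out.

```python
from urllib.parse import parse_qs, urlsplit


def classify_api_request(url: str) -> str:
    """Return 'pageview', 'link_preview' or 'other' for an api.php request.

    Hypothetical helper: action=mobileview&sections=0 counts as a pageview,
    action=query with 'extracts' among the (pipe-separated) props counts
    as a link preview; nothing else is checked.
    """
    parts = urlsplit(url)
    if not parts.path.endswith("/api.php"):
        return "other"
    # parse_qs maps each key to a list of values; keep the first one.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    action = params.get("action")
    if action == "mobileview" and params.get("sections") == "0":
        return "pageview"
    if action == "query" and "extracts" in params.get("prop", "").split("|"):
        return "link_preview"
    return "other"
```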
>> >>> >>>
>> >>> >>> Once the apps use RESTBase, the HTTPS requests will be very different:
>> >>> >>>
>> >>> >>> Page view: Instead of calling action=mobileview&sections=0 on the PHP
>> >>> >>> API, the app would call the RESTBase endpoint for the lead request[1].
>> >>> >>> Then it would call [2].
>> >>> >>> Link preview: Instead of action=query&prop=extracts it would also call
>> >>> >>> the lead request[1], since there is a lot of overlap. At least that's
>> >>> >>> our current plan. The advantage is that the client doesn't need to
>> >>> >>> execute the lead request a second time if the user clicks on the link
>> >>> >>> preview (either through caching or app logic).
>> >>> >>>
>> >>> >>> So, in the RESTBase case we either want to count the
>> >>> >>> mobile-html-sections-lead requests or the
>> >>> >>> mobile-html-sections-remaining requests, depending on what our
>> >>> >>> definition of a page view actually is. We could also add a query
>> >>> >>> parameter or extra HTTP header to one of the mobile-html-sections-lead
>> >>> >>> requests if we need to distinguish between previews and page views.
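One possible RESTBase counterpart, assuming the option of counting mobile-html-sections-lead requests and a hypothetical `X-Link-Preview: 1` header to mark previews (neither the header name nor the choice of endpoint is decided in the thread). Counting the lead request rather than the remaining-sections request has one practical advantage here: the apps cannot guarantee that a second request will always be made, for example when a page has only a single section.

```python
from typing import Optional
from urllib.parse import urlsplit

REST_PAGE_PREFIX = "/api/rest_v1/page/"
LEAD_ENDPOINT = "mobile-html-sections-lead"


def classify_restbase_request(url: str, headers: dict) -> Optional[str]:
    """Return 'pageview', 'link_preview', or None if the request is not counted.

    Assumptions: only lead requests are counted, and a hypothetical
    'X-Link-Preview: 1' header marks leads fetched for a link preview.
    Header lookup here is an exact-case dict lookup, for simplicity.
    """
    path = urlsplit(url).path
    if not path.startswith(REST_PAGE_PREFIX):
        return None  # not a RESTBase page request
    endpoint, _, title = path[len(REST_PAGE_PREFIX):].partition("/")
    if endpoint != LEAD_ENDPOINT or not title:
        return None  # e.g. mobile-html-sections-remaining is not counted
    if headers.get("X-Link-Preview") == "1":
        return "link_preview"
    return "pageview"
```

Since users update their apps at their own pace, a rule like this would have to run alongside the api.php rules rather than replace them.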
>> >>> >>>
>> >>> >>> Both the current PHP API and the RESTBase based metrics would need to
>> >>> >>> be compatible and be collected in parallel, since we cannot control
>> >>> >>> when users update their apps.
>> >>> >>>
>> >>> >>> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
>> >>> >>> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
>> >>> >>> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
>> >>> >>> [4] https://phabricator.wikimedia.org/T109383
>> >>> >>>
>> >>> >>>
>> >>> >>> Cheers,
>> >>> >>>
>> >>> >>> Bernd
>> >>> >>>
>> >>> >>>
>> >>> >>> _______________________________________________
>> >>> >>> Analytics mailing list
>> >>> >>> [email protected]
>> >>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Oliver Keyes
>> >>> >> Count Logula
>> >>> >> Wikimedia Foundation