Oh that makes sense. page_title always, and page_id if you have it. I wonder if there's a way to get the canonical post-redirect page_title in all cases... hm...
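A rough sketch of that rule (page_title always, page_id only when the client has it), in Python purely for illustration. The key names page_title and page_id follow this thread; the ";"-separated key=value layout, the percent-encoding of the title, and all three function names are assumptions made for the example, not a description of what the apps or the pageview definition code actually do.

```python
from typing import Dict, Optional
from urllib.parse import quote, unquote


def build_x_analytics(page_title: str, page_id: Optional[int] = None) -> str:
    """Client side: always send page_title, add page_id only when it is known."""
    # Percent-encode the title so a ';' or '=' inside it cannot break the
    # (assumed) key=value;key=value layout of the header.
    fields = {"page_title": quote(page_title, safe="")}
    if page_id is not None:
        fields["page_id"] = str(page_id)
    return ";".join(f"{key}={value}" for key, value in fields.items())


def parse_x_analytics(header: str) -> Dict[str, str]:
    """Server side: split the header value back into a key/value mapping."""
    fields: Dict[str, str] = {}
    for pair in header.split(";"):
        key, sep, value = pair.partition("=")
        if sep:  # ignore fragments that have no '='
            fields[key.strip()] = unquote(value.strip())
    return fields


def is_tagged_pageview(header: str) -> bool:
    """True when the request was explicitly tagged as a pageview."""
    fields = parse_x_analytics(header)
    return "page_title" in fields or "page_id" in fields


if __name__ == "__main__":
    print(build_x_analytics("Dilbert"))
    print(build_x_analytics("Dilbert", page_id=12345))  # 12345 is a made-up id
```

With something like this, the pattern-matching logic could be skipped whenever is_tagged_pageview() is true, and the title-from-URI logic would still be free to run on every request.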
On Wed, Aug 19, 2015 at 12:35 PM, Andrew Otto <[email protected]> wrote:
> I think if we do this right, we should prefer page_id, but use page_title
> if it is provided.
>
> However, at the moment we don’t have a good way of actually getting
> page_title in Hadoop from the MW DBs even if given a page_id. We’d still
> have to infer the title from the URI. I’d prefer if page_id was the
> canonical way of identifying a page view, but currently page_title is used
> in all pageview statistics. Using the page_title as the generator of the
> request sees it might be even more correct than inferring it from the URI.
> Or, maybe it would be better (for the moment) to use the existence of
> page_id or page_title to indicate to the pageview definition logic that
> this request is definitely already a pageview, and then use the same page
> title from URI logic on all requests no matter what.
>
> page_id or page_title would just allow the pageview definition pattern
> matching logic to be skipped, as we would know right up front that a
> request is a pageview.
>
> Are you saying the apps have the option to skip providing one of
> page_title or page_id?
>
> So uhhh, yes! I think, although I am not the authority on this. I defer
> to other analytics engineers who will actually have to implement and
> maintain this change :)
>
> On Aug 19, 2015, at 12:29, Bernd Sitzmann <[email protected]> wrote:
>
> Andrew,
>
> Are you saying the apps have the option to skip providing one of
> page_title or page_id?
> I hope this is the case since I just came up with a scheme where we could
> avoid the second request when a page has only a single section, which we
> already get through the first (lead) request.
>
> Yes to what Oliver said: The apps don't always know the page_id ahead of
> time (only sometimes).
> The best example where we don't know the page_id ahead of time is when
> someone searches for a term on Google search on an Android device, and
> gets directed to our Android app. The app only gets the URL of the page,
> which we then take to derive the wiki and page_title from.
>
> Bernd
>
> On Wed, Aug 19, 2015 at 10:24 AM, Oliver Keyes <[email protected]> wrote:
>
>> It'll need to be, some requests don't know pageID in advance, which I
>> think was the reason Apps initially didn't implement this.
>>
>> On 19 August 2015 at 12:19, Andrew Otto <[email protected]> wrote:
>> > If your app/site/etc. is creating a request that it wants to count as a
>> > pageview, add an X-Analytics header with pageview_id=<page_id> or
>> > pageview_title=<page_title>
>> >
>> > page_id is the current key, so let’s keep that. page_title would be
>> > good to have too. Let’s make it an and/or.
>> >
>> > On Aug 19, 2015, at 12:17, Bernd Sitzmann <[email protected]> wrote:
>> >
>> >> If your app/site/etc. is creating a request that it wants to count as a
>> >> pageview, add an X-Analytics header with pageview_id=<page_id> or
>> >> pageview_title=<page_title>
>> >
>> > Ideally the page id would be the way to go. From a client's perspective
>> > I prefer the page title since clients don't always know the page id
>> > ahead of time. (We could put that header into the second request of
>> > loading the page but I cannot guarantee that we will always have a
>> > second request in the future.)
>> >
>> > --Cheers,
>> > Bernd
>> >
>> > On Wed, Aug 19, 2015 at 8:53 AM, Dan Andreescu <[email protected]> wrote:
>> >>
>> >> This (making pageviews proactive) is a great idea, and we should follow
>> >> through. Here's a simple start:
>> >>
>> >> If your app/site/etc.
>> >> is creating a request that it wants to count as a
>> >> pageview, add an X-Analytics header with pageview_id=<page_id> or
>> >> pageview_title=<page_title>
>> >>
>> >> If we can make this change uniformly, I think we'd be in a very good
>> >> place.
>> >>
>> >> On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <[email protected]> wrote:
>> >>>
>> >>> On 19 August 2015 at 10:19, Andrew Otto <[email protected]> wrote:
>> >>> >> If we /do/ include RESTBase requests we will not only have to
>> >>> >> rewrite the pageview definition for the apps to recognise the new
>> >>> >> URL scheme
>> >>> >
>> >>> > I really think that apps and APIs should do something proactive to
>> >>> > tag or log a pageview. With more ways of viewing content, it is
>> >>> > going to get harder and harder to maintain a pattern based
>> >>> > definition. A pageview should be an event that is logged, not
>> >>> > something that is pattern matched out of a very noisy stream of
>> >>> > data.
>> >>> >
>> >>> > Most mediawiki requests do this now, via the page_id field in the
>> >>> > X-Analytics header, but we can’t use this for all pageviews because
>> >>> > APIs are more complicated (e.g. more than one page can be served in
>> >>> > a single request, etc.). In the longterm, there should be a pageview
>> >>> > event stream just like rcstream! :)
>> >>>
>> >>> This is an excellent point. IIRC we'd been asking Apps to do this for
>> >>> kind of a while, so...
>> >>>
>> >>> >
>> >>> > -Ao
>> >>> >
>> >>> >> On Aug 18, 2015, at 19:58, Oliver Keyes <[email protected]> wrote:
>> >>> >>
>> >>> >> On 18 August 2015 at 19:11, Bernd Sitzmann <[email protected]>
>> >>> >> wrote:
>> >>> >>> This discussion is about needed updates of the definition and
>> >>> >>> Analytics implementation for mobile apps page view metrics. There
>> >>> >>> is also an associated Phab task[4]. Please add the proper
>> >>> >>> Analytics project there.
>> >>> >>>
>> >>> >>> Background / Changes
>> >>> >>>
>> >>> >>> As you probably remember, the Android app splits a page view into
>> >>> >>> two requests: one for the lead section and metadata, plus another
>> >>> >>> one for the remainder.
>> >>> >>>
>> >>> >>> The mobile apps are going to change the way they load pages in two
>> >>> >>> different ways:
>> >>> >>>
>> >>> >>> - We'll add a link preview when someone clicks on a link from a
>> >>> >>>   page.
>> >>> >>> - We're planning on switching over to using RESTBase for loading
>> >>> >>>   pages and also the link preview (initially just the Android
>> >>> >>>   beta, later more)
>> >>> >>>
>> >>> >>
>> >>> >> Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful
>> >>> >> service API?
>> >>> >>
>> >>> >> Last time I checked that wasn't even consumed by HDFS. Is it now
>> >>> >> being consumed by HDFS?
>> >>> >>
>> >>> >> More importantly the actual URLs are going to look /totally/
>> >>> >> different. If we do not include RESTBase requests, we will miss the
>> >>> >> apps. If we /do/ include RESTBase requests we will not only have to
>> >>> >> rewrite the pageview definition for the apps to recognise the new
>> >>> >> URL scheme, we will also potentially have to rewrite every /other/
>> >>> >> bit of the definition to /not/ incorporate those requests.
>> >>> >>
>> >>> >> (I use "we" in a collective sense. This isn't my baby any more,
>> >>> >> although if Joseph et al want help with the refactor here I'm
>> >>> >> happy to spend my volunteer time on it).
>> >>> >>
>> >>> >> But basically every other bit of your email is important but now
>> >>> >> secondary: this is a potentially massive change, all on its own,
>> >>> >> even without the link preview, even if the substance of the
>> >>> >> requests going to RESTBase were identical.
>> >>> >>
>> >>> >>> This will have implications for the pageviews definition and how
>> >>> >>> we count user engagement.
>> >>> >>>
>> >>> >>> The big question is
>> >>> >>>
>> >>> >>> Should we count link previews as a page view since it's an
>> >>> >>> indication of user engagement? Or should there be a separate
>> >>> >>> metric for link previews?
>> >>> >>>
>> >>> >>> Counting page views
>> >>> >>>
>> >>> >>> IIRC we currently count action=mobileview&sections=0 query
>> >>> >>> parameters of api.php as a page view. When we publish link
>> >>> >>> previews for all Android app users then we would either want to
>> >>> >>> count also the calls to action=query&prop=extracts as a page view
>> >>> >>> or add them to another metric.
>> >>> >>>
>> >>> >>> Once the apps use RESTBase the HTTPS requests will be very
>> >>> >>> different:
>> >>> >>>
>> >>> >>> Page view: Instead of action=mobileview&sections=0 the app would
>> >>> >>> call the RESTBase endpoint for lead request[1] instead of the PHP
>> >>> >>> API mentioned above. Then it would call [2].
>> >>> >>>
>> >>> >>> Link preview: Instead of action=query&prop=extracts it would call
>> >>> >>> the lead request[1], too, since there is a lot of overlap. At
>> >>> >>> least that's our current plan. The advantage of that is that the
>> >>> >>> client doesn't need to execute the lead request a second time if
>> >>> >>> the user clicks on the link preview (-- either through caching or
>> >>> >>> app logic.)
>> >>> >>>
>> >>> >>> So, in the RESTBase case we either want to count the
>> >>> >>> mobile-html-sections-lead requests or the
>> >>> >>> mobile-html-sections-remaining requests depending on what our
>> >>> >>> definition for page views actually is.
>> >>> >>> We could also add a query parameter or extra HTTP header to one of
>> >>> >>> the mobile-html-sections-lead requests if we need to distinguish
>> >>> >>> between previews and page views.
>> >>> >>>
>> >>> >>> Both the current PHP API and the RESTBase based metrics would need
>> >>> >>> to be compatible and be collected in parallel since we cannot
>> >>> >>> control when users update their apps.
>> >>> >>>
>> >>> >>> [1]
>> >>> >>> https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
>> >>> >>> [2]
>> >>> >>> https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
>> >>> >>> [3]
>> >>> >>> https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
>> >>> >>> [4] https://phabricator.wikimedia.org/T109383
>> >>> >>>
>> >>> >>> Cheers,
>> >>> >>>
>> >>> >>> Bernd
>> >>> >>>
>> >>> >>> _______________________________________________
>> >>> >>> Analytics mailing list
>> >>> >>> [email protected]
>> >>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>> >>
>> >>> >> --
>> >>> >> Oliver Keyes
>> >>> >> Count Logula
>> >>> >> Wikimedia Foundation
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
