Aren't we currently just storing pageID?

On 19 August 2015 at 14:11, Dan Andreescu <[email protected]> wrote:
> Oliver, the problem with "page_title OR page_id" instead of "always page_title and page_id if you have it" is what Andrew was addressing above. It means we have to query for page_title by id, and that means we need to keep an up-to-date copy of all mediawiki databases. And we have to be able to query that copy tens of thousands of times per second, which is basically not going to happen.
>
> We just chatted in scrum of scrums about this, it looks like Adam's going to set up a meeting so we can talk more there. I agree with Adam that we have to have a short term solution for counting the new kinds of requests. A medium term solution so that we don't all go insane, and something to shoot for in the long term.
>
> On Wed, Aug 19, 2015 at 1:48 PM, Oliver Keyes <[email protected]> wrote:
>>
>> In the absence of all clients doing it, "if it has this x_analytics entry, don't bother with the complex regular expressions, if it doesn't, do" still works.
>>
>> On 19 August 2015 at 13:34, Gabriel Wicke <[email protected]> wrote:
>> > Yeah, doing this on the client could work, but would require *all* clients to actually do it. We also have metrics per entry point in RESTBase, but those are behind Varnishes and will only count Varnish cache misses. Without Varnish caching, this would be a solved problem ;)
>> >
>> > On Wed, Aug 19, 2015 at 7:53 AM, Dan Andreescu <[email protected]> wrote:
>> >>
>> >> This (making pageviews proactive) is a great idea, and we should follow through. Here's a simple start:
>> >>
>> >> If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
>> >>
>> >> If we can make this change uniformly, I think we'd be in a very good place.
>> >>
>> >> On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <[email protected]> wrote:
>> >>>
>> >>> On 19 August 2015 at 10:19, Andrew Otto <[email protected]> wrote:
>> >>> >> If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme
>> >>> >
>> >>> > I really think that apps and APIs should do something proactive to tag or log a pageview. With more ways of viewing content, it is going to get harder and harder to maintain a pattern based definition. A pageview should be an event that is logged, not something that is pattern matched out of a very noisy stream of data.
>> >>> >
>> >>> > Most mediawiki requests do this now, via the page_id field in the X-Analytics header, but we can’t use this for all pageviews because APIs are more complicated (e.g. more than one page can be served in a single request, etc.). In the long term, there should be a pageview event stream just like rcstream! :)
>> >>>
>> >>> This is an excellent point. IIRC we'd been asking Apps to do this for kind of a while, so...
>> >>>
>> >>> > -Ao
>> >>> >
>> >>> >> On Aug 18, 2015, at 19:58, Oliver Keyes <[email protected]> wrote:
>> >>> >>
>> >>> >> On 18 August 2015 at 19:11, Bernd Sitzmann <[email protected]> wrote:
>> >>> >>> This discussion is about needed updates of the definition and Analytics implementation for mobile apps page view metrics. There is also an associated Phab task[4]. Please add the proper Analytics project there.
>> >>> >>>
>> >>> >>> Background / Changes
>> >>> >>>
>> >>> >>> As you probably remember, the Android app splits a page view into two requests: one for the lead section and metadata, plus another one for the remainder.
>> >>> >>>
>> >>> >>> The mobile apps are going to change the way they load pages in two different ways:
>> >>> >>>
>> >>> >>> We'll add a link preview when someone clicks on a link from a page.
>> >>> >>> We're planning on switching over to using RESTBase for loading pages and also the link preview (initially just the Android beta, later more)
>> >>> >>>
>> >>> >>
>> >>> >> Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful service API?
>> >>> >>
>> >>> >> Last time I checked that wasn't even consumed by HDFS. Is it now being consumed by HDFS?
>> >>> >>
>> >>> >> More importantly the actual URLs are going to look /totally/ different. If we do not include RESTBase requests, we will miss the apps. If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme, we will also potentially have to rewrite every /other/ bit of the definition to /not/ incorporate those requests.
>> >>> >>
>> >>> >> (I use "we" in a collective sense. This isn't my baby any more, although if Joseph et al want help with the refactor here I'm happy to spend my volunteer time on it).
>> >>> >>
>> >>> >> But basically every other bit of your email is important but now secondary: this is a potentially massive change, all on its own, even without the link preview, even if the substance of the requests going to RESTBase were identical.
>> >>> >>
>> >>> >>> This will have implications for the pageviews definition and how we count user engagement.
>> >>> >>>
>> >>> >>> The big question is
>> >>> >>>
>> >>> >>> Should we count link previews as a page view since it's an indication of user engagement? Or should there be a separate metric for link previews?
>> >>> >>>
>> >>> >>> Counting page views
>> >>> >>>
>> >>> >>> IIRC we currently count action=mobileview&sections=0 query parameters of api.php as a page view. When we publish link previews for all Android app users then we would either want to count also the calls to action=query&prop=extracts as a page view or add them to another metric.
>> >>> >>>
>> >>> >>> Once the apps use RESTBase the HTTPS requests will be very different:
>> >>> >>>
>> >>> >>> Page view: Instead of action=mobileview&sections=0 the app would call the RESTBase endpoint for the lead request[1]. Then it would call [2].
>> >>> >>> Link preview: Instead of action=query&prop=extracts it would call the lead request[1], too, since there is a lot of overlap. At least that's our current plan. The advantage of that is that the client doesn't need to execute the lead request a second time if the user clicks on the link preview (either through caching or app logic).
>> >>> >>>
>> >>> >>> So, in the RESTBase case we either want to count the mobile-html-sections-lead requests or the mobile-html-sections-remaining requests depending on what our definition for page views actually is.
>> >>> >>> We could also add a query parameter or extra HTTP header to one of the mobile-html-sections-lead requests if we need to distinguish between previews and page views.
>> >>> >>>
>> >>> >>> Both the current PHP API and the RESTBase based metrics would need to be compatible and be collected in parallel since we cannot control when users update their apps.
>> >>> >>>
>> >>> >>> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
>> >>> >>> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
>> >>> >>> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
>> >>> >>> [4] https://phabricator.wikimedia.org/T109383
>> >>> >>>
>> >>> >>> Cheers,
>> >>> >>>
>> >>> >>> Bernd
>> >>> >>>
>> >>> >>> _______________________________________________
>> >>> >>> Analytics mailing list
>> >>> >>> [email protected]
>> >>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>> >>
>> >>> >> --
>> >>> >> Oliver Keyes
>> >>> >> Count Logula
>> >>> >> Wikimedia Foundation
>> > --
>> > Gabriel Wicke
>> > Principal Engineer, Wikimedia Foundation
--
Oliver Keyes
Count Logula
Wikimedia Foundation
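
To make the fallback idea from the thread concrete (use the X-Analytics tag when a request carries one, and only fall back to pattern matching when it does not), here is a minimal Python sketch. It assumes X-Analytics is a semicolon-separated list of key=value pairs. The pageview_id and pageview_title keys are the ones proposed above; the function names and the two fallback patterns are hypothetical and far simpler than the real pageview definition.

```python
import re
from typing import Dict, Optional


def parse_x_analytics(header: str) -> Dict[str, str]:
    """Parse an X-Analytics header value into a dict.

    Assumption: the header is a semicolon-separated list of key=value pairs.
    """
    fields: Dict[str, str] = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


# Hypothetical fallback patterns, for illustration only.
FALLBACK_PATTERNS = [
    re.compile(r"^/wiki/(?P<title>[^?#]+)"),
    re.compile(r"^/api/rest_v1/page/mobile-html-sections-lead/(?P<title>[^?#/]+)"),
]


def classify_pageview(uri_path: str, x_analytics: str = "") -> Optional[Dict[str, str]]:
    """Return pageview details, or None if the request is not a pageview.

    Requests tagged by the client are trusted and skip pattern matching;
    untagged requests go through the (hypothetical) fallback patterns.
    """
    fields = parse_x_analytics(x_analytics)
    tagged = {k: fields[k] for k in ("pageview_id", "pageview_title") if k in fields}
    if tagged:
        tagged["source"] = "header"
        return tagged
    for pattern in FALLBACK_PATTERNS:
        match = pattern.match(uri_path)
        if match:
            return {"pageview_title": match.group("title"), "source": "pattern"}
    return None
```

Counting only the mobile-html-sections-lead request here is just one of the two options raised in the thread; swapping the second pattern would count the mobile-html-sections-remaining request instead.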
