In some cases we have page_id; in other cases (such as API requests) we have nothing.
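To make that concrete, here is a rough sketch of the "use the tag if present, otherwise fall back to the patterns" idea from the thread below. The pageview_id and pageview_title keys are the ones Dan proposed. Everything else is an assumption for illustration only: that X-Analytics is a semicolon-separated list of key=value pairs, the function names, and the one-line regex standing in for the real pattern-based definition.

```python
import re
from typing import Dict, Optional

# Hypothetical stand-in for the existing pattern-based pageview definition.
LEGACY_PAGEVIEW_RE = re.compile(r"^/wiki/[^?]+$")


def parse_x_analytics(header: str) -> Dict[str, str]:
    """Split an X-Analytics value into a dict (assumes 'k1=v1;k2=v2')."""
    fields: Dict[str, str] = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key:  # skip empty or malformed entries
            fields[key] = value
    return fields


def classify(uri_path: str, x_analytics: str = "") -> Optional[str]:
    """Return how a request counts as a pageview, or None if it does not."""
    fields = parse_x_analytics(x_analytics)
    # Explicit tags win, so no lookup of title by id is ever needed.
    if "pageview_id" in fields:
        return "id:" + fields["pageview_id"]
    if "pageview_title" in fields:
        return "title:" + fields["pageview_title"]
    # Untagged request: fall back to pattern matching on the URL.
    if LEGACY_PAGEVIEW_RE.match(uri_path):
        return "pattern"
    return None
```

With this shape, a RESTBase request tagged pageview_title=Dilbert is counted without touching the regexes, and an untagged /wiki/Dilbert request is still caught by the old patterns.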
On Wed, Aug 19, 2015 at 2:13 PM, Oliver Keyes <[email protected]> wrote:
> Aren't we currently just storing pageID?
>
> On 19 August 2015 at 14:11, Dan Andreescu <[email protected]> wrote:
> > Oliver, the problem with "page_title OR page_id" instead of "always
> > page_title and page_id if you have it" is what Andrew was addressing
> > above. It means we have to query for page_title by id, and that means
> > we need to keep an up-to-date copy of all mediawiki databases. And we
> > have to be able to query that copy tens of thousands of times per
> > second, which is basically not going to happen.
> >
> > We just chatted in scrum of scrums about this; it looks like Adam's
> > going to set up a meeting so we can talk more there. I agree with
> > Adam that we have to have a short-term solution for counting the new
> > kinds of requests, a medium-term solution so that we don't all go
> > insane, and something to shoot for in the long term.
> >
> > On Wed, Aug 19, 2015 at 1:48 PM, Oliver Keyes <[email protected]> wrote:
> >>
> >> In the absence of all clients doing it, "if it has this x_analytics
> >> entry, don't bother with the complex regular expressions, if it
> >> doesn't, do" still works.
> >>
> >> On 19 August 2015 at 13:34, Gabriel Wicke <[email protected]> wrote:
> >> > Yeah, doing this on the client could work, but would require *all*
> >> > clients to actually do it. We also have metrics per entry point in
> >> > RESTBase, but those are behind Varnishes and will only count
> >> > Varnish cache misses. Without Varnish caching, this would be a
> >> > solved problem ;)
> >> >
> >> > On Wed, Aug 19, 2015 at 7:53 AM, Dan Andreescu
> >> > <[email protected]> wrote:
> >> >>
> >> >> This (making pageviews proactive) is a great idea, and we should
> >> >> follow through. Here's a simple start:
> >> >>
> >> >> If your app/site/etc. is creating a request that it wants to
> >> >> count as a pageview, add an X-Analytics header with
> >> >> pageview_id=<page_id> or pageview_title=<page_title>
> >> >>
> >> >> If we can make this change uniformly, I think we'd be in a very
> >> >> good place.
> >> >>
> >> >> On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <[email protected]>
> >> >> wrote:
> >> >>>
> >> >>> On 19 August 2015 at 10:19, Andrew Otto <[email protected]> wrote:
> >> >>> >> If we /do/ include RESTBase requests we will not only have to
> >> >>> >> rewrite the pageview definition for the apps to recognise the
> >> >>> >> new URL scheme
> >> >>> >
> >> >>> > I really think that apps and APIs should do something proactive
> >> >>> > to tag or log a pageview. With more ways of viewing content, it
> >> >>> > is going to get harder and harder to maintain a pattern-based
> >> >>> > definition. A pageview should be an event that is logged, not
> >> >>> > something that is pattern matched out of a very noisy stream of
> >> >>> > data.
> >> >>> >
> >> >>> > Most mediawiki requests do this now, via the page_id field in
> >> >>> > the X-Analytics header, but we can’t use this for all pageviews
> >> >>> > because APIs are more complicated (e.g. more than one page can
> >> >>> > be served in a single request, etc.). In the long term, there
> >> >>> > should be a pageview event stream just like rcstream! :)
> >> >>>
> >> >>> This is an excellent point. IIRC we'd been asking Apps to do this
> >> >>> for kind of a while, so...
> >> >>>
> >> >>> >
> >> >>> > -Ao
> >> >>> >
> >> >>> >> On Aug 18, 2015, at 19:58, Oliver Keyes <[email protected]>
> >> >>> >> wrote:
> >> >>> >>
> >> >>> >> On 18 August 2015 at 19:11, Bernd Sitzmann <[email protected]>
> >> >>> >> wrote:
> >> >>> >>> This discussion is about needed updates of the definition and
> >> >>> >>> Analytics implementation for mobile apps page view metrics.
> >> >>> >>> There is also an associated Phab task[4]. Please add the
> >> >>> >>> proper Analytics project there.
> >> >>> >>>
> >> >>> >>> Background / Changes
> >> >>> >>>
> >> >>> >>> As you probably remember, the Android app splits a page view
> >> >>> >>> into two requests: one for the lead section and metadata,
> >> >>> >>> plus another one for the remainder.
> >> >>> >>>
> >> >>> >>> The mobile apps are going to change the way they load pages
> >> >>> >>> in two different ways:
> >> >>> >>>
> >> >>> >>> - We'll add a link preview when someone clicks on a link from
> >> >>> >>>   a page.
> >> >>> >>> - We're planning on switching over to using RESTBase for
> >> >>> >>>   loading pages and also the link preview (initially just the
> >> >>> >>>   Android beta, later more)
> >> >>> >>>
> >> >>> >>
> >> >>> >> Woah woah woah woah woah. By RESTBase do you mean Gabriel's
> >> >>> >> RESTful service API?
> >> >>> >>
> >> >>> >> Last time I checked that wasn't even consumed by HDFS. Is it
> >> >>> >> now being consumed by HDFS?
> >> >>> >>
> >> >>> >> More importantly the actual URLs are going to look /totally/
> >> >>> >> different. If we do not include RESTBase requests, we will
> >> >>> >> miss the apps. If we /do/ include RESTBase requests we will
> >> >>> >> not only have to rewrite the pageview definition for the apps
> >> >>> >> to recognise the new URL scheme, we will also potentially have
> >> >>> >> to rewrite every /other/ bit of the definition to /not/
> >> >>> >> incorporate those requests.
> >> >>> >>
> >> >>> >> (I use "we" in a collective sense. This isn't my baby any
> >> >>> >> more, although if Joseph et al want help with the refactor
> >> >>> >> here I'm happy to spend my volunteer time on it).
> >> >>> >>
> >> >>> >> But basically every other bit of your email is important but
> >> >>> >> now secondary: this is a potentially massive change, all on
> >> >>> >> its own, even without the link preview, even if the substance
> >> >>> >> of the requests going to RESTBase were identical.
> >> >>> >>
> >> >>> >>> This will have implications for the pageviews definition and
> >> >>> >>> how we count user engagement.
> >> >>> >>>
> >> >>> >>> The big question is
> >> >>> >>>
> >> >>> >>> Should we count link previews as a page view since it's an
> >> >>> >>> indication of user engagement? Or should there be a separate
> >> >>> >>> metric for link previews?
> >> >>> >>>
> >> >>> >>> Counting page views
> >> >>> >>>
> >> >>> >>> IIRC we currently count action=mobileview&sections=0 query
> >> >>> >>> parameters of api.php as a page view. When we publish link
> >> >>> >>> previews for all Android app users then we would either want
> >> >>> >>> to count also the calls to action=query&prop=extracts as a
> >> >>> >>> page view or add them to another metric.
> >> >>> >>>
> >> >>> >>> Once the apps use RESTBase the HTTPS requests will be very
> >> >>> >>> different:
> >> >>> >>>
> >> >>> >>> - Page view: Instead of action=mobileview&sections=0 the app
> >> >>> >>>   would call the RESTBase endpoint for lead request[1]
> >> >>> >>>   instead of the PHP API mentioned above. Then it would call
> >> >>> >>>   [2].
> >> >>> >>> - Link preview: Instead of action=query&prop=extracts it
> >> >>> >>>   would call the lead request[1], too, since there is a lot
> >> >>> >>>   of overlap. At least that's our current plan. The advantage
> >> >>> >>>   of that is that the client doesn't need to execute the lead
> >> >>> >>>   request a second time if the user clicks on the link
> >> >>> >>>   preview (either through caching or app logic).
> >> >>> >>>
> >> >>> >>> So, in the RESTBase case we either want to count the
> >> >>> >>> mobile-html-sections-lead requests or the
> >> >>> >>> mobile-html-sections-remaining requests depending on what our
> >> >>> >>> definition for page views actually is. We could also add a
> >> >>> >>> query parameter or extra HTTP header to one of the
> >> >>> >>> mobile-html-sections-lead requests if we need to distinguish
> >> >>> >>> between previews and page views.
> >> >>> >>>
> >> >>> >>> Both the current PHP API and the RESTBase based metrics would
> >> >>> >>> need to be compatible and be collected in parallel since we
> >> >>> >>> cannot control when users update their apps.
> >> >>> >>>
> >> >>> >>> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
> >> >>> >>> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
> >> >>> >>> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
> >> >>> >>> [4] https://phabricator.wikimedia.org/T109383
> >> >>> >>>
> >> >>> >>> Cheers,
> >> >>> >>>
> >> >>> >>> Bernd
> >> >>> >>>
> >> >>> >>> _______________________________________________
> >> >>> >>> Analytics mailing list
> >> >>> >>> [email protected]
> >> >>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >> >>> >>
> >> >>> >> --
> >> >>> >> Oliver Keyes
> >> >>> >> Count Logula
> >> >>> >> Wikimedia Foundation
> >> >
> >> > --
> >> > Gabriel Wicke
> >> > Principal Engineer, Wikimedia Foundation
