We'll see. At the moment we don't actually know what we're /looking/ for ;p. I've started a separate thread covering this issue that people can hopefully take a look at.
On 9 December 2014 at 21:46, Andrew Otto <[email protected]> wrote:
> Awesome! Maybe better to parse pybal than puppet?
>
> On Dec 9, 2014, at 20:15, Aaron Halfaker <[email protected]> wrote:
>
> Hey folks,
>
> As discussions on the new page view definition have been calming down,
> we're preparing to deliver a draft version to the Devs. I want to make
> sure that we all know the status and that any substantial concerns are
> raised before we hand things off on *Friday, Dec 12th.*
>
> For this phase, we are delivering the general filter[1]. This is the
> highest level filter, and exists primarily to distinguish requests
> worthy of further evaluation. Our plan is to take the definition as it
> exists on the 12th, and begin generating high-level aggregate numbers
> based on it. In future iterations, we will be digging into different
> breakdowns of this metric, and iterating on it to handle any
> inconsistencies or unexpected results. There's a few differences from
> Web Stat Collector's (WSC) version of the general filter that we want
> to call to your attention.
>
> - We include searches -- WSC explicitly excludes them.
> - We include Apps traffic -- WSC does not detect Apps traffic.
> - We include variants of /wiki/ (e.g. /zh-tw/, /zh-cn/, /sr-ec/) --
>   WSC hardcodes "/wiki/".
> - We don't include Banner impressions -- WSC includes them.
>
> There are also some known issues with the new definition that are worth
> your notice:
>
> 1. *Internal traffic is counted*
>
>    - Note that WSC filters some internal traffic by hardcoding a set of
>      IPs in the definition. We are working on parsing puppet templates
>      in order to automatically detect which IPs represent internal
>      traffic. This will be a /better/ solution, but it's not quite
>      ready yet because parsing puppet is hard.
>
> 2. *Spider traffic is counted*
>
>    - We will be using the User-agent field to detect and flag
>      spider-based traffic. This "tag definition" will be delivered in a
>      subsequent definition. This actually matches WSC, which does not
>      filter spider for the high-level metrics.
>
> These are problems we're aware of, and will be factoring in as we go
> forward with our next task: refining the definition using real,
> hourly-level traffic data. Thanks to everyone who has given feedback and
> participated in the process thus far, particularly Nemo, Erik, and
> Christian.
>
> [1] https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>
> -Aaron & Oliver
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
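
For anyone who finds it easier to reason about the quoted points in code, here is a rough Python sketch of two of them: accepting path variants of /wiki/ rather than hardcoding it, and checking a client IP against a list of internal ranges. This is not the actual definition (that lives on the Generalised_filters page). The function names, the `is_banner` flag, the three-variant path list and the 10.0.0.0/8 range are all placeholders made up for illustration.

```python
import ipaddress
import re

# Placeholder list: only the variants named in the email, not the full set.
PAGE_PATH = re.compile(r"^/(wiki|zh-tw|zh-cn|sr-ec)/")

# Placeholder range: the real internal IPs would come from configuration
# (e.g. parsed out of puppet or pybal), which this sketch does not attempt.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_internal(ip: str) -> bool:
    """Return True if the client IP falls inside any listed internal range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETWORKS)


def passes_general_filter(path: str, is_banner: bool = False) -> bool:
    """Rough stand-in for the high-level filter on a single request.

    `is_banner` is assumed to be decided by the caller; how banner
    impressions are recognised is not covered here.
    """
    if is_banner:
        return False
    return PAGE_PATH.match(path) is not None
```

Something like `passes_general_filter("/zh-tw/Foo")` is accepted here, whereas a filter that hardcodes "/wiki/" would drop it. Internal traffic is kept as a separate check because, per the known issues above, it is currently still counted.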
