Awesome! Maybe better to parse pybal than puppet?
> On Dec 9, 2014, at 20:15, Aaron Halfaker <[email protected]> wrote:
>
> Hey folks,
>
> As discussions on the new page view definition have been calming down, we're
> preparing to deliver a draft version to the Devs. I want to make sure that
> we all know the status and that any substantial concerns are raised before we
> hand things off on Friday, Dec 12th.
>
> For this phase, we are delivering the general filter[1]. This is the highest
> level filter, and exists primarily to distinguish requests worthy of further
> evaluation. Our plan is to take the definition as it exists on the 12th, and
> begin generating high-level aggregate numbers based on it. In future
> iterations, we will be digging into different breakdowns of this metric, and
> iterating on it to handle any inconsistencies or unexpected results.
>
> There's a few differences from Web Stat Collector's (WSC) version of the
> general filter that we want to call to your attention:
>
> * We include searches -- WSC explicitly excludes them.
> * We include Apps traffic -- WSC does not detect Apps traffic.
> * We include variants of /wiki/ (e.g. /zh-tw/, /zh-cn/, /sr-ec/) -- WSC
>   hardcodes "/wiki/".
> * We don't include Banner impressions -- WSC includes them.
>
> There are also some known issues with the new definition that are worth your
> notice:
>
> * Internal traffic is counted.
>   Note that WSC filters some internal traffic by hardcoding a set of IPs in
>   the definition. We are working on parsing puppet templates in order to
>   automatically detect which IPs represent internal traffic. This will be a
>   /better/ solution, but it's not quite ready yet because parsing puppet is
>   hard.
> * Spider traffic is counted.
>   We will be using the User-agent field to detect and flag spider-based
>   traffic. This "tag definition" will be delivered in a subsequent definition.
>   This actually matches WSC, which does not filter spider traffic for the
>   high-level metrics.
>
> These are problems we're aware of, and will be factoring in as we go forward
> with our next task: refining the definition using real, hourly-level traffic
> data. Thanks to everyone who has given feedback and participated in the
> process thus far, particularly Nemo, Erik, and Christian.
>
> 1. https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>
> -Aaron & Oliver
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
