Awesome!  Maybe better to parse pybal than puppet?

> On Dec 9, 2014, at 20:15, Aaron Halfaker <[email protected]> wrote:
> 
> Hey folks,
> 
> As discussions on the new page view definition have been calming down, we're 
> preparing to deliver a draft version to the Devs.  I want to make sure that 
> we all know the status and that any substantial concerns are raised before we 
> hand things off on Friday, Dec 12th.
> 
> For this phase, we are delivering the general filter[1].  This is the highest 
> level filter, and exists primarily to distinguish requests worthy of further 
> evaluation. Our plan is to take the definition as it exists on the 12th, and 
> begin generating high-level aggregate numbers based on it. In future 
> iterations, we will be digging into different breakdowns of this metric, and 
> iterating on it to handle any inconsistencies or unexpected results.  There 
> are a few differences from Web Stat Collector's (WSC) version of the general 
> filter that we want to call to your attention:
> 
> * We include searches -- WSC explicitly excludes them.
> * We include Apps traffic -- WSC does not detect Apps traffic.
> * We include variants of /wiki/ (e.g. /zh-tw/, /zh-cn/, /sr-ec/) -- WSC 
>   hardcodes "/wiki/".
> * We don't include Banner impressions -- WSC includes them.
> 
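
For the /wiki/ variants point, here is a rough sketch of what accepting several path prefixes instead of a hardcoded "/wiki/" could look like. The prefix set below only contains the examples from this mail, and the function name is made up for illustration; it is not taken from the actual definition.

```python
# Hypothetical prefix set: "/wiki/" plus only the variants named in this thread.
# The real definition may cover more variants; this list is an assumption.
ARTICLE_PREFIXES = {"wiki", "zh-tw", "zh-cn", "sr-ec"}


def is_article_path(path: str) -> bool:
    """Return True for paths shaped like /<known prefix>/<non-empty title>."""
    # "/wiki/Foo".split("/", 2) gives ["", "wiki", "Foo"].
    parts = path.split("/", 2)
    return (
        len(parts) == 3
        and parts[0] == ""
        and parts[1] in ARTICLE_PREFIXES
        and parts[2] != ""
    )


def is_article_path_hardcoded(path: str) -> bool:
    """For contrast: the hardcoded "/wiki/" check described for WSC."""
    return path.startswith("/wiki/")
```

With this, "/zh-tw/Foo" passes the first check but not the hardcoded one, which is the difference called out above.
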
> There are also some known issues with the new definition that are worth your 
> notice:
> 
> * Internal traffic is counted.
>   Note that WSC filters some internal traffic by hardcoding a set of IPs in 
>   the definition.  We are working on parsing puppet templates in order to 
>   automatically detect which IPs represent internal traffic.  This will be a 
>   /better/ solution, but it's not quite ready yet because parsing puppet is 
>   hard.
> * Spider traffic is counted.
>   We will be using the User-agent field to detect and flag spider-based 
>   traffic.  This "tag definition" will be delivered in a subsequent 
>   iteration.  This actually matches WSC, which does not filter spiders for 
>   the high-level metrics.
> 
> These are problems we're aware of, and will be factoring in as we go forward 
> with our next task: refining the definition using real, hourly-level traffic 
> data. Thanks to everyone who has given feedback and participated in the 
> process thus far, particularly Nemo, Erik, and Christian.
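
On the two known issues (internal traffic and spiders), a minimal sketch of how the tagging might work once the list of internal IPs is available from somewhere (puppet, pybal, or anything else). Everything here is an assumption for illustration: the networks are placeholder documentation ranges (RFC 5737), not real internal addresses, and the user-agent pattern is a toy, not the planned "tag definition".

```python
import ipaddress
import re

# Placeholder networks (RFC 5737 documentation ranges), NOT real internal IPs.
# In practice this list would be generated rather than hardcoded.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Toy pattern for illustration only; real spider detection needs a fuller list.
SPIDER_PATTERN = re.compile(r"bot|spider|crawler", re.IGNORECASE)


def is_internal(ip: str) -> bool:
    """Return True if the address falls inside any configured network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETWORKS)


def tag_request(ip: str, user_agent: str) -> dict:
    """Tag a request instead of dropping it, so filtering can happen later."""
    return {
        "internal": is_internal(ip),
        "spider": bool(SPIDER_PATTERN.search(user_agent or "")),
    }
```

Tagging rather than filtering keeps the high-level counts comparable with WSC while still letting later breakdowns exclude either class.
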
> 
> 1. https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
> 
> -Aaron & Oliver
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics