I think the Hive code is "representative" in that it's an implementation. It's 
certainly not the only permitted one. 
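
To make that concrete, here is a rough Python sketch of the general-filter 
rules listed in Aaron's mail quoted below (searches in, Apps traffic in, 
/wiki/ variants in, banner impressions out). It is only an illustration that 
the same rules can be written outside Hive. The path prefixes are just the 
examples given in the thread, and the banner and app markers 
(`Special:BannerRandom`, `WikipediaApp`, `api.php`) are assumptions made for 
the sketch, not the definition on Meta.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical page-path prefixes: "/wiki/" plus the variant examples
# mentioned in the thread. The real definition may list more.
PAGE_PATH = re.compile(r"^/(wiki|zh-tw|zh-cn|sr-ec)/.+")

# Assumed markers, for illustration only.
BANNER_MARKER = "Special:BannerRandom"
APP_UA_MARKER = "WikipediaApp"


def is_pageview(url: str, user_agent: str = "") -> bool:
    """Return True if a request passes this sketch of the general filter."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)

    # Banner impressions are not included.
    if BANNER_MARKER in parts.path or BANNER_MARKER in query.get("title", []):
        return False

    # Searches are included (assumed to carry a "search" query parameter).
    if "search" in query:
        return True

    # Apps traffic is included (assumed to hit api.php with an app user agent).
    if APP_UA_MARKER in user_agent and parts.path.endswith("/api.php"):
        return True

    # Ordinary page requests, including the /wiki/ variants.
    return bool(PAGE_PATH.match(parts.path))
```

A Hive query, a Python function, or anything else that applies the same rules 
to the same request fields would be an equally valid implementation.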

> On Dec 15, 2014, at 10:34 AM, Andrew Otto <[email protected]> wrote:
> 
>>  We're moving forward to generate Hive queries that will represent the 
>> formal specification.
> Should a specific implementation (e.g. Hive) represent the formal 
> specification?  I tend to think it should be tech-agnostic, no?
> 
> 
> 
>> On Dec 15, 2014, at 12:15, Aaron Halfaker <[email protected]> wrote:
>> 
>> Toby, that's right.  We're moving forward to generate Hive queries that will 
>> represent the formal specification.  
>> 
>> -Aaron
>> 
>>> On Mon, Dec 15, 2014 at 9:12 AM, Oliver Keyes <[email protected]> wrote:
>>> We've written the draft Hive queries and I'm reviewing them with Otto now. 
>>> Currently blocked on Hadoop heapsize issues, but I'm sure we'll work it 
>>> through :).
>>> 
>>>> On 15 December 2014 at 12:10, Toby Negrin <[email protected]> wrote:
>>>> Hi Aaron, all --
>>>> 
>>>> I haven't seen any discussion on this, which is a sign that we can move 
>>>> forward with turning over the draft. Thoughts?
>>>> 
>>>> thanks,
>>>> 
>>>> -Toby
>>>> 
>>>>> On Tue, Dec 9, 2014 at 5:15 PM, Aaron Halfaker <[email protected]> 
>>>>> wrote:
>>>>> Hey folks,
>>>>> 
>>>>> As discussions on the new page view definition have been calming down, 
>>>>> we're preparing to deliver a draft version to the Devs.  I want to make 
>>>>> sure that we all know the status and that any substantial concerns are 
>>>>> raised before we hand things off on Friday, Dec 12th.
>>>>> 
>>>>> For this phase, we are delivering the general filter[1].  This is the 
>>>>> highest level filter, and exists primarily to distinguish requests worthy 
>>>>> of further evaluation. Our plan is to take the definition as it exists on 
>>>>> the 12th, and begin generating high-level aggregate numbers based on it. 
>>>>> In future iterations, we will be digging into different breakdowns of 
>>>>> this metric, and iterating on it to handle any inconsistencies or 
>>>>> unexpected results.  There are a few differences from Web Stat Collector's 
>>>>> (WSC) version of the general filter that we want to call to your 
>>>>> attention:
>>>>> * We include searches -- WSC explicitly excludes them.
>>>>> * We include Apps traffic -- WSC does not detect Apps traffic.
>>>>> * We include variants of /wiki/ (e.g. /zh-tw/, /zh-cn/, /sr-ec/) -- WSC 
>>>>>   hardcodes "/wiki/".
>>>>> * We don't include Banner impressions -- WSC includes them.
>>>>> There are also some known issues with the new definition that are worth 
>>>>> your notice:
>>>>> 
>>>>> * Internal traffic is counted.  WSC filters some internal traffic by 
>>>>>   hardcoding a set of IPs in the definition.  We are working on parsing 
>>>>>   Puppet templates in order to automatically detect which IPs represent 
>>>>>   internal traffic.  This will be a /better/ solution, but it's not quite 
>>>>>   ready yet because parsing Puppet is hard.
>>>>> * Spider traffic is counted.  We will be using the User-agent field to 
>>>>>   detect and flag spider-based traffic.  This "tag definition" will be 
>>>>>   delivered in a subsequent iteration.  This actually matches WSC, which 
>>>>>   does not filter spiders for the high-level metrics.
>>>>> 
>>>>> These are problems we're aware of, and will be factoring in as we go 
>>>>> forward with our next task: refining the definition using real, 
>>>>> hourly-level traffic data. Thanks to everyone who has given feedback and 
>>>>> participated in the process thus far, particularly Nemo, Erik, and 
>>>>> Christian.
>>>>> 
>>>>> 1. https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>>>>> 
>>>>> -Aaron & Oliver
>>>>> 
>>>>> _______________________________________________
>>>>> Analytics mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> 
>>> 
>>> -- 
>>> Oliver Keyes
>>> Research Analyst
>>> Wikimedia Foundation
