It's totally tech-agnostic; the neutral definition is on meta. The Hive
query exists only because we suspect that's how we'll be generating the
data, so it makes sense to turn the draft definition into HQL for
exploratory queries and testing.

...of course, now that I've said that, cosmic irony demands we end up
implementing in C, or something.
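
(Illustration only: a rough Python sketch of the kind of check the general
filter describes, based on the differences from WSC listed in the quoted
message below. The path prefixes are the ones named in the thread; the banner
marker, the app user-agent substring, and the search and API paths are
assumptions made for the example, not the definition on meta.)

```python
from urllib.parse import urlsplit, parse_qs

# "/wiki/" plus the language-variant prefixes named in the thread.
ARTICLE_PREFIXES = ("/wiki/", "/zh-tw/", "/zh-cn/", "/sr-ec/")

# ASSUMPTION: hypothetical marker for banner impressions.
BANNER_MARKER = "Special:BannerRandom"

# ASSUMPTION: hypothetical user-agent substring identifying Apps traffic.
APP_UA_MARKER = "WikipediaApp"


def is_pageview(url: str, user_agent: str = "") -> bool:
    """Return True if the request passes this toy version of the general filter."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    # Banner impressions are excluded (WSC includes them).
    title = query.get("title", [""])[0]
    if BANNER_MARKER in parts.path or BANNER_MARKER in title:
        return False

    # Article paths, including variants of /wiki/ (WSC hardcodes "/wiki/").
    if parts.path.startswith(ARTICLE_PREFIXES):
        return True

    # Searches are included (WSC excludes them). ASSUMPTION: search URL shape.
    if parts.path == "/w/index.php" and "search" in query:
        return True

    # Apps traffic is included (WSC does not detect it). ASSUMPTION: API path.
    if APP_UA_MARKER in user_agent and parts.path == "/w/api.php":
        return True

    return False
```

Under these assumed rules, `is_pageview("https://zh.wikipedia.org/zh-tw/Foo")`
is accepted, while a request whose path contains the assumed banner marker is
rejected. Internal and spider traffic are not filtered here, matching the
known issues listed in the quoted message.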

On 15 December 2014 at 13:46, Toby Negrin <[email protected]> wrote:
>
> I think the Hive code is "representative" in that it's an implementation.
> It's certainly not the only permitted one.
>
> On Dec 15, 2014, at 10:34 AM, Andrew Otto <[email protected]> wrote:
>
>  We're moving forward to generate Hive queries that will represent the
> formal specification.
>
> Should a specific implementation (e.g. Hive) represent the formal
> specification?  I tend to think it should be tech-agnostic, no?
>
>
>
> On Dec 15, 2014, at 12:15, Aaron Halfaker <[email protected]> wrote:
>
> Toby, that's right.  We're moving forward to generate Hive queries that
> will represent the formal specification.
>
> -Aaron
>
> On Mon, Dec 15, 2014 at 9:12 AM, Oliver Keyes <[email protected]>
> wrote:
>
>> We've written the draft Hive queries and I'm reviewing them with Otto
>> now. Currently blocked on Hadoop heapsize issues, but I'm sure we'll work
>> it through :).
>>
>> On 15 December 2014 at 12:10, Toby Negrin <[email protected]> wrote:
>>>
>>> Hi Aaron, all --
>>>
>>> I haven't seen any discussion on this, which is a sign that we can
>>> move forward with turning over the draft. Thoughts?
>>>
>>> thanks,
>>>
>>> -Toby
>>>
>>> On Tue, Dec 9, 2014 at 5:15 PM, Aaron Halfaker <[email protected]>
>>> wrote:
>>>
>>>> Hey folks,
>>>>
>>>> As discussions on the new page view definition have been calming down,
>>>> we're preparing to deliver a draft version to the Devs.  I want to make
>>>> sure that we all know the status and that any substantial concerns are
>>>> raised before we hand things off on *Friday, Dec 12th.*
>>>>
>>>> For this phase, we are delivering the general filter[1].  This is the
>>>> highest level filter, and exists primarily to distinguish requests worthy
>>>> of further evaluation. Our plan is to take the definition as it exists on
>>>> the 12th, and begin generating high-level aggregate numbers based on it. In
>>>> future iterations, we will be digging into different breakdowns of this
>>>> metric, and iterating on it to handle any inconsistencies or unexpected
>>>> results.  There are a few differences from Web Stat Collector's (WSC) version
>>>> of the general filter that we want to call to your attention.
>>>>
>>>>    - We include searches -- WSC explicitly excludes them.
>>>>    - We include Apps traffic -- WSC does not detect Apps traffic
>>>>    - We include variants of /wiki/ (e.g. /zh-tw/, /zh-cn/, /sr-ec/) --
>>>>    WSC hardcodes "/wiki/"
>>>>    - We don't include Banner impressions -- WSC includes them.
>>>>
>>>> There are also some known issues with the new definition that are worth
>>>> your notice:
>>>>
>>>>
>>>>    1. *Internal traffic is counted*
>>>>
>>>>    - Note that WSC filters some internal traffic by hardcoding a set
>>>>    of IPs in the definition.  We are working on parsing puppet templates in
>>>>    order to automatically detect which IPs represent internal traffic.  This
>>>>    will be a /better/ solution, but it's not quite ready yet because parsing
>>>>    puppet is hard.
>>>>
>>>>    2. *Spider traffic is counted*
>>>>
>>>>    - We will be using the User-agent field to detect and flag
>>>>    spider-based traffic.  This "tag definition" will be delivered in a
>>>>    subsequent definition.  This actually matches WSC, which does not filter
>>>>    spider for the high-level metrics.
>>>>
>>>> These are problems we're aware of, and will be factoring in as we go
>>>> forward with our next task: refining the definition using real,
>>>> hourly-level traffic data. Thanks to everyone who has given feedback and
>>>> participated in the process thus far, particularly Nemo, Erik, and
>>>> Christian.
>>>>
>>>> [1] https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>>>>
>>>> -Aaron & Oliver
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>

-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation
