I am not sure if this is quite what you are asking but just in case:

For streaming, it is probably easier for you to use the newly created
webrequest tables:

https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Webrequest_Table.28s.29

Those include an isPageview field so requests are pre-classified. You will
need to wait a bit as data for those tables is being populated starting
today.
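
In case it helps on the Python side, here is a minimal sketch of filtering
on that pre-classified flag. It assumes you have exported rows from a
webrequest table as tab-separated text and that the isPageview value arrives
as the text "true" or "false". The function name and the column position are
made up for illustration; check them against the actual table schema.

```python
import csv
from typing import Iterable, Iterator, List


def filter_pageviews(lines: Iterable[str], flag_column: int) -> Iterator[List[str]]:
    """Yield only the rows whose pre-classified flag column reads "true".

    `flag_column` is the zero-based position of the isPageview value in
    each tab-separated row (an assumption about your export, not the schema).
    """
    # QUOTE_NONE: treat quote characters in URLs or user agents as plain text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip rows too short to contain the flag instead of raising.
        if len(row) <= flag_column:
            continue
        if row[flag_column].strip().lower() == "true":
            yield row
```

To stream, pass it any iterable of lines, for example sys.stdin, and print
each yielded row joined with tabs.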



On Wed, Jan 7, 2015 at 3:35 PM, Aaron Halfaker <[email protected]>
wrote:

> Cool!  Let's say I want to review the filters and apply them in a python
> script.  What should I reference?
>
> On Wed, Jan 7, 2015 at 5:13 PM, Oliver Keyes <[email protected]> wrote:
>
>> I'm pleased to say we now have the prototype pageviews definition as a
>> UDF!
>>
>> For those with cluster access:
>>
>> CREATE TEMPORARY FUNCTION pageview as
>> 'org.wikimedia.analytics.refinery.hive.isPageviewUDF';
>>
>> ...and then just apply it. It outputs a boolean, so you can easily go
>> WHERE pageview(fields) and treat it as a conditional. Great
>> success!
>>
>> What this means for the definition is twofold; it means it's a lot
>> easier to test its accuracy, and it means that it's a lot easier to
>> make sure we're all using the same definition going forward. Once we
>> have the legacy definition as a UDF, refining and testing will proceed
>> at great speed, although I encourage anyone with time on their hands
>> who wants to help out to do some testing of their own :)
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>
>
>