Just realized that hourly counts won't need it -- because they'll be
generated from page views anyway!
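
To make that concrete, here is a rough sketch of how hourly counts could
fall out of pre-classified requests. It is only an illustration: the
record layout and the field names `dt` and `is_pageview` are my own
assumptions, not the actual webrequest schema, and the sample rows are
made up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, made-up records. The field names `dt` and `is_pageview`
# are assumptions for illustration, not the real webrequest schema.
requests = [
    {"dt": "2015-01-07T17:05:12", "is_pageview": True},
    {"dt": "2015-01-07T17:41:03", "is_pageview": False},
    {"dt": "2015-01-07T17:59:59", "is_pageview": True},
    {"dt": "2015-01-07T18:00:01", "is_pageview": True},
]


def hourly_pageview_counts(records):
    """Count requests already flagged as page views, bucketed by hour."""
    counts = Counter()
    for record in records:
        if not record["is_pageview"]:
            continue  # skip requests not classified as page views
        timestamp = datetime.strptime(record["dt"], "%Y-%m-%dT%H:%M:%S")
        # Truncate to the hour by formatting with a fixed ":00" suffix.
        counts[timestamp.strftime("%Y-%m-%d %H:00")] += 1
    return dict(counts)


hourly = hourly_pageview_counts(requests)
```

Since the classification happens upstream, the aggregation step never
needs to re-apply the pageview filters itself.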

On Wed, Jan 7, 2015 at 5:41 PM, Aaron Halfaker <[email protected]>
wrote:

> That's great and it will serve most of my use cases.  Any chance we can
> get that field added to the sampled logs & hourly counts?
>
> On Wed, Jan 7, 2015 at 5:40 PM, Nuria Ruiz <[email protected]> wrote:
>
>> I am not sure if this is quite what you are asking but just in case:
>>
>> For streaming, it is probably easier for you to use the newly created
>> webrequest tables:
>>
>>
>> https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Webrequest_Table.28s.29
>>
>> Those include an isPageview field so requests are pre-classified. You
>> will need to wait a bit as data for those tables is being populated
>> starting today.
>>
>>
>>
>> On Wed, Jan 7, 2015 at 3:35 PM, Aaron Halfaker <[email protected]>
>> wrote:
>>
>>> Cool!  Let's say I want to review the filters and apply them in a python
>>> script.  What should I reference?
>>>
>>> On Wed, Jan 7, 2015 at 5:13 PM, Oliver Keyes <[email protected]>
>>> wrote:
>>>
>>>> I'm pleased to say we now have the prototype pageviews definition as a
>>>> UDF!
>>>>
>>>> For those with cluster access:
>>>>
>>>> CREATE TEMPORARY FUNCTION pageview as
>>>> 'org.wikimedia.analytics.refinery.hive.isPageviewUDF';
>>>>
>>>> ...and then just apply it. It outputs a boolean, so you can easily go
>>>> WHERE pageview(fields) and treat it as a conditional. Great
>>>> success!
>>>>
>>>> What this means for the definition is twofold; it means it's a lot
>>>> easier to test its accuracy, and it means that it's a lot easier to
>>>> make sure we're all using the same definition going forward. Once we
>>>> have the legacy definition as a UDF, refining and testing will proceed
>>>> at great speed, although I encourage anyone with time on their hands
>>>> who wants to help out to do some testing of their own :)
>>>>
>>>> --
>>>> Oliver Keyes
>>>> Research Analyst
>>>> Wikimedia Foundation
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
