Just realized that the hourly counts won't need that field, because they'll be generated from page views anyway!
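To make that concrete, here is a rough Python sketch of hourly counts derived from requests that are already classified as pageviews. The record layout (`dt` as an ISO-style timestamp string, `is_pageview` as a boolean) and the function name are assumptions for illustration only, not the actual webrequest schema or the pipeline's code.

```python
from collections import Counter
from datetime import datetime


def hourly_pageview_counts(requests):
    """Count pre-classified pageviews per hour.

    `requests` is an iterable of dicts with a hypothetical layout:
    {"dt": "YYYY-MM-DDTHH:MM:SS", "is_pageview": bool}.
    """
    counts = Counter()
    for req in requests:
        # Skip anything the upstream classification did not mark as a pageview.
        if not req["is_pageview"]:
            continue
        ts = datetime.strptime(req["dt"], "%Y-%m-%dT%H:%M:%S")
        # Truncate the timestamp to the hour to form the bucket key.
        counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Made-up sample records, only to show the shape of the input.
sample = [
    {"dt": "2015-01-07T17:05:12", "is_pageview": True},
    {"dt": "2015-01-07T17:40:00", "is_pageview": False},
    {"dt": "2015-01-07T17:59:59", "is_pageview": True},
    {"dt": "2015-01-07T18:00:00", "is_pageview": True},
]

counts = hourly_pageview_counts(sample)
```

Since only rows already flagged as pageviews are counted, the aggregated output has no use for a per-row flag of its own.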
On Wed, Jan 7, 2015 at 5:41 PM, Aaron Halfaker <[email protected]> wrote:

> That's great and it will serve most of my use cases. Any chance we can
> get that field added to the sampled logs & hourly counts?
>
> On Wed, Jan 7, 2015 at 5:40 PM, Nuria Ruiz <[email protected]> wrote:
>
>> I am not sure if this is quite what you are asking but just in case:
>>
>> For streaming it is probably easier for you to use the newly created
>> webrequest tables:
>>
>> https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Webrequest_Table.28s.29
>>
>> Those include an isPageview field so requests are pre-classified. You
>> will need to wait a bit as data for those tables is being populated
>> starting today.
>>
>> On Wed, Jan 7, 2015 at 3:35 PM, Aaron Halfaker <[email protected]>
>> wrote:
>>
>>> Cool! Let's say I want to review the filters and apply them in a Python
>>> script. What should I reference?
>>>
>>> On Wed, Jan 7, 2015 at 5:13 PM, Oliver Keyes <[email protected]>
>>> wrote:
>>>
>>>> I'm pleased to say we now have the prototype pageviews definition as a
>>>> UDF!
>>>>
>>>> For those with cluster access:
>>>>
>>>> CREATE TEMPORARY FUNCTION pageview as
>>>> 'org.wikimedia.analytics.refinery.hive.isPageviewUDF';
>>>>
>>>> ...and then just apply it. It outputs a boolean, so you can easily go
>>>> WHERE pageview(fields) and treat it as a conditional. Great
>>>> success!
>>>>
>>>> What this means for the definition is twofold; it means it's a lot
>>>> easier to test its accuracy, and it means that it's a lot easier to
>>>> make sure we're all using the same definition going forward. Once we
>>>> have the legacy definition as a UDF, refining and testing will proceed
>>>> at great speed, although I encourage anyone with time on their hands
>>>> who wants to help out to do some testing of their own :)
>>>>
>>>> --
>>>> Oliver Keyes
>>>> Research Analyst
>>>> Wikimedia Foundation
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
