I am not sure if this is quite what you are asking, but just in case: for streaming, it is probably easier for you to use the newly created webrequest tables:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Webrequest_Table.28s.29

Those include an isPageview field, so requests are pre-classified. You will need to wait a bit, as data for those tables is being populated starting today.

On Wed, Jan 7, 2015 at 3:35 PM, Aaron Halfaker <[email protected]> wrote:

> Cool! Let's say I want to review the filters and apply them in a python
> script. What should I reference?
>
> On Wed, Jan 7, 2015 at 5:13 PM, Oliver Keyes <[email protected]> wrote:
>
>> I'm pleased to say we now have the prototype pageviews definition as a
>> UDF!
>>
>> For those with cluster access:
>>
>> CREATE TEMPORARY FUNCTION pageview as
>> 'org.wikimedia.analytics.refinery.hive.isPageviewUDF';
>>
>> ...and then just apply it. It outputs a boolean, so you can easily go
>> WHERE pageview(fields) and treat it as a conditional. Great
>> success!
>>
>> What this means for the definition is twofold: it means it's a lot
>> easier to test its accuracy, and it means that it's a lot easier to
>> make sure we're all using the same definition going forward. Once we
>> have the legacy definition as a UDF, refining and testing will proceed
>> at great speed, although I encourage anyone with time on their hands
>> who wants to help out to do some testing of their own :)
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
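
On the Python side of the question above, a minimal sketch of filtering on the pre-classified field in a streaming script might look like the following. It assumes rows arrive as tab-separated text with the isPageview value in a known column. The column index, the "true"/"false" encoding, and the function name filter_pageviews are all assumptions for illustration, not the actual table layout or any existing refinery code.

```python
# Hypothetical: position of the isPageview value in each tab-separated row.
# The real column order of the webrequest tables is not given in this thread.
IS_PAGEVIEW_COL = 2


def filter_pageviews(lines, col=IS_PAGEVIEW_COL):
    """Yield rows (as lists of fields) whose isPageview column reads 'true'."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short or malformed rows rather than failing mid-stream.
        if len(fields) <= col:
            continue
        # Assumed encoding: the boolean is serialized as the text 'true'.
        if fields[col].strip().lower() == "true":
            yield fields
```

In a streaming job this could be driven with something like filter_pageviews(sys.stdin), writing each surviving row back out tab-joined. Because the classification is already done upstream, the script only reads the flag and does not need to reimplement the pageview definition.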
