I'm pleased to say we now have the prototype pageviews definition as a UDF!
For those with cluster access:

    CREATE TEMPORARY FUNCTION pageview AS 'org.wikimedia.analytics.refinery.hive.isPageviewUDF';

...and then just apply it. It outputs a boolean, so you can easily go WHERE pageview(fields) and treat it as a conditional. Great success!

What this means for the definition is twofold: it's a lot easier to test its accuracy, and it's a lot easier to make sure we're all using the same definition going forward. Once we have the legacy definition as a UDF, refining and testing will proceed at great speed, although I encourage anyone with time on their hands who wants to help out to do some testing of their own :)

--
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
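For anyone who wants to try the UDF, a session might look roughly like the sketch below. Only the class name and the function name come from the message above. The jar path, the table name (webrequest), and the column names passed as arguments are assumptions for illustration, so check the UDF source for the fields it actually expects.

```sql
-- Hypothetical jar path: point this at wherever the refinery-hive jar
-- lives on your cluster.
ADD JAR /path/to/refinery-hive.jar;

-- Register the UDF under the name used in this message.
CREATE TEMPORARY FUNCTION pageview AS 'org.wikimedia.analytics.refinery.hive.isPageviewUDF';

-- The UDF returns a boolean, so it can be used directly as a filter.
-- Table and column names here are assumed, not confirmed.
SELECT COUNT(*) AS pageviews
FROM webrequest
WHERE pageview(uri_host, uri_path, uri_query, http_status, content_type, user_agent);
```

Because the function is registered with CREATE TEMPORARY FUNCTION, it only exists for the current Hive session and has to be registered again in each new one.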
