You'll probably have to do it conditionally by date, since the new fields only carry values from the point at which each patch was merged.
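In case it helps, here is a rough sketch of what "conditionally by date" could look like. It is not tested against the real cluster: SQLite stands in for Hive, the cutoff date is made up (the actual merge date is not stated in this thread), and the names CUTOFF, pageview_flags and demo are only for illustration. It also assumes dt is stored as an ISO date string so that plain string comparison orders correctly.

```python
import sqlite3

# Hypothetical cutoff: first day on which is_pageview carries real values.
# The actual merge date is NOT given in this thread; replace as needed.
CUTOFF = "2015-03-01"

# CASE WHEN is plain SQL; the same expression should also be accepted by Hive,
# but that is an assumption here, since only SQLite is exercised below.
QUERY = """
SELECT dt,
       CASE WHEN dt >= :cutoff THEN is_pageview ELSE NULL END AS is_pageview_safe
FROM webrequest
ORDER BY dt
"""


def pageview_flags(conn, cutoff=CUTOFF):
    """Return (dt, flag) rows, with flag forced to None before the cutoff."""
    return conn.execute(QUERY, {"cutoff": cutoff}).fetchall()


def demo():
    """Build a tiny in-memory stand-in for the webrequest table and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE webrequest (dt TEXT, is_pageview INTEGER)")
    conn.executemany(
        "INSERT INTO webrequest VALUES (?, ?)",
        [
            ("2015-02-27", 0),  # before the cutoff: stored value is not meaningful
            ("2015-03-01", 1),  # on the cutoff day: value is kept
            ("2015-03-02", 0),  # after the cutoff: value is kept
        ],
    )
    rows = pageview_flags(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The point is only that rows older than the cutoff come back as NULL instead of a value that was never computed. Whether Hive still fails while scanning the older partitions is something this sketch does not settle; restricting the partitions in the WHERE clause may be needed as well.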
> On Apr 12, 2015, at 12:38, Yuri Astrakhan <[email protected]> wrote:
>
> Thanks Oliver! Is there a way to handle it in hql? E.g
> if(exists(is_pageview),is_pageview,null)? Finding out if field exists by
> observing query crash seems wrong ))
>
>> On Apr 12, 2015 06:53, "Oliver Keyes" <[email protected]> wrote:
>> (Duplicated from bug):
>>
>> That's not a bug. The complexity of regenerating ~60 days of data,
>> where a day is 24*60*125000 rows, is extreme, and adding new fields
>> means doing just that - regenerating the entire thing. As such, the
>> decision was made to add to the field definition and only add actual
>> values going forward from the point at which the patch was merged.
>> This was true of the is_pageview calculation, the user agent data and
>> the geolocation elements previously added, and is still true now.
>>
>> On 11 April 2015 at 03:33, Yuri Astrakhan <[email protected]> wrote:
>> > I tried to move Zero analytics to the new table, and decided to test the
>> > new wonderful fields like agent_type ... and it only works on the most
>> > recent hours of data ((
>> >
>> > https://phabricator.wikimedia.org/T95806
>> >
>> > On Fri, Apr 10, 2015 at 8:51 PM, Yuri Astrakhan <[email protected]> wrote:
>> >>
>> >> Please clarify why the field "is_zero" is needed, as it is nothing more
>> >> than a test for ("zero=" in x_analytics). Does having this field
>> >> significantly improve performance for zero queries, e.g. "select count(*)
>> >> from requests where iszero = true" ? Because otherwise it simply identifies
>> >> "zero partner" traffic, not "was that request actually zero rated or not".
>> >>
>> >> Thanks!
>> >>
>> >> On Fri, Apr 10, 2015 at 5:16 PM, Oliver Keyes <[email protected]> wrote:
>> >>>
>> >>> Cool!
>> >>>
>> >>> On 10 April 2015 at 17:12, Joseph Allemandou <[email protected]> wrote:
>> >>> > Yes Oliver, the agent_type = spider includes IsCrawler UDF.
>> >>> >
>> >>> > On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes <[email protected]> wrote:
>> >>> >>
>> >>> >> What does agent-type add? In the sense that if we're pre-parsing the
>> >>> >> user agent, surely the difference is between "WHERE agent_type !=
>> >>> >> 'spider'" and "WHERE user_agent_map['device_family'] != 'Spider'"?
>> >>> >> Does agent_type include the isCrawler UDF results?
>> >>> >>
>> >>> >> On 10 April 2015 at 16:47, Joseph Allemandou <[email protected]> wrote:
>> >>> >> > And I forgot one field :
>> >>> >> >
>> >>> >> > is_zero - True if a request is made on a zero provider.
>> >>> >> >
>> >>> >> > On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia <[email protected]> wrote:
>> >>> >> >>
>> >>> >> >> Hi Joseph,
>> >>> >> >>
>> >>> >> >> Thanks for the update, and for doing this. These three items make the
>> >>> >> >> analysis of the data much easier on our end. We've had many requests in
>> >>> >> >> the past that required agent_type and access_method information and
>> >>> >> >> having them readily available is awesome! :-)
>> >>> >> >>
>> >>> >> >> Have a great weekend!
>> >>> >> >>
>> >>> >> >> Leila
>> >>> >> >>
>> >>> >> >> On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou <[email protected]> wrote:
>> >>> >> >>>
>> >>> >> >>> Hi Analytics people,
>> >>> >> >>>
>> >>> >> >>> Today happens another bunch of addition to the refined webrequest
>> >>> >> >>> table in hive.
>> >>> >> >>> Now the table contains:
>> >>> >> >>>
>> >>> >> >>> ts - The unix timestamp (milliseconds) version of the dt date
>> >>> >> >>> access_method - The method used to access the site, being one of the
>> >>> >> >>> three [mobile app | mobile web | desktop]
>> >>> >> >>> agent_type - To differentiate easily between spiders and users (more
>> >>> >> >>> values may be added later).
>> >>> >> >>>
>> >>> >> >>> These additions are based on the "tags", as defined here:
>> >>> >> >>> https://meta.wikimedia.org/wiki/Research:Page_view
>> >>> >> >>>
>> >>> >> >>> Have a good weekend !
>> >>> >> >>>
>> >>> >> >>> --
>> >>> >> >>> Joseph Allemandou
>> >>> >> >>> Data Engineer @ Wikimedia Foundation
>> >>> >> >>> IRC: joal
>> >>> >> >>>
>> >>> >> >>> _______________________________________________
>> >>> >> >>> Analytics mailing list
>> >>> >> >>> [email protected]
>> >>> >> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>> >>
>> >>> >> --
>> >>> >> Oliver Keyes
>> >>> >> Research Analyst
>> >>> >> Wikimedia Foundation
