Thanks Oliver! Is there a way to handle it in HQL? E.g. if(exists(is_pageview), is_pageview, null)? Finding out whether a field exists by watching the query crash seems wrong ))

On Apr 12, 2015 06:53, "Oliver Keyes" <[email protected]> wrote:
> (Duplicated from bug):
>
> That's not a bug. The complexity of regenerating ~60 days of data,
> where a day is 24*60*125000 rows, is extreme, and adding new fields
> means doing just that - regenerating the entire thing. As such, the
> decision was made to add to the field definition and only add actual
> values going forward from the point at which the patch was merged.
> This was true of the is_pageview calculation, the user agent data and
> the geolocation elements previously added, and is still true now.
>
> On 11 April 2015 at 03:33, Yuri Astrakhan <[email protected]> wrote:
> > I tried to move Zero analytics to the new table, and decided to test the new
> > wonderful fields like agent_type ... and it only works on the most recent
> > hours of data ((
> >
> > https://phabricator.wikimedia.org/T95806
> >
> > On Fri, Apr 10, 2015 at 8:51 PM, Yuri Astrakhan <[email protected]> wrote:
> >>
> >> Please clarify why the field "is_zero" is needed, as it is nothing more
> >> than a test for ("zero=" in x_analytics). Does having this field
> >> significantly improve performance for zero queries, e.g. "select count(*)
> >> from requests where iszero = true" ? Because otherwise it simply identifies
> >> "zero partner" traffic, not "was that request actually zero rated or not".
> >>
> >> Thanks!
> >>
> >> On Fri, Apr 10, 2015 at 5:16 PM, Oliver Keyes <[email protected]> wrote:
> >>>
> >>> Cool!
> >>>
> >>> On 10 April 2015 at 17:12, Joseph Allemandou <[email protected]> wrote:
> >>> > Yes Oliver, the agent_type = spider includes IsCrawler UDF.
> >>> >
> >>> > On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes <[email protected]> wrote:
> >>> >>
> >>> >> What does agent-type add? In the sense that if we're pre-parsing the
> >>> >> user agent, surely the difference is between "WHERE agent_type !=
> >>> >> 'spider'" and "WHERE user_agent_map['device_family'] != 'Spider'"?
> >>> >> Does agent_type include the isCrawler UDF results?
> >>> >>
> >>> >> On 10 April 2015 at 16:47, Joseph Allemandou <[email protected]> wrote:
> >>> >> > And I forgot one field :
> >>> >> >
> >>> >> > is_zero - True if a request is made on a zero provider.
> >>> >> >
> >>> >> > On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia <[email protected]> wrote:
> >>> >> >>
> >>> >> >> Hi Joseph,
> >>> >> >>
> >>> >> >> Thanks for the update, and for doing this. These three items make the
> >>> >> >> analysis of the data much easier on our end. We've had many requests in the
> >>> >> >> past that required agent_type and access_method information and having them
> >>> >> >> readily available is awesome! :-)
> >>> >> >>
> >>> >> >> Have a great weekend!
> >>> >> >>
> >>> >> >> Leila
> >>> >> >>
> >>> >> >> On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou <[email protected]> wrote:
> >>> >> >>>
> >>> >> >>> Hi Analytics people,
> >>> >> >>>
> >>> >> >>> Today happens another bunch of addition to the refined webrequest table
> >>> >> >>> in hive.
> >>> >> >>> Now the table contains:
> >>> >> >>>
> >>> >> >>> ts - The unix timestamp (milliseconds) version of the dt date
> >>> >> >>> access_method - The method used to access the site, being one of the
> >>> >> >>> three [mobile app | mobile web | desktop]
> >>> >> >>> agent_type - To differentiate easily between spiders and users (more
> >>> >> >>> values may be added later).
> >>> >> >>>
> >>> >> >>> These additions are based on the "tags", as defined here:
> >>> >> >>> https://meta.wikimedia.org/wiki/Research:Page_view
> >>> >> >>>
> >>> >> >>> Have a good weekend !
> >>> >> >>>
> >>> >> >>> --
> >>> >> >>> Joseph Allemandou
> >>> >> >>> Data Engineer @ Wikimedia Foundation
> >>> >> >>> IRC: joal
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
