Please clarify why the field "is_zero" is needed, as it is nothing more
than a test for ("zero=" in x_analytics). Does having this field
significantly improve performance for zero queries, e.g. "select count(*)
from requests where is_zero = true"? Otherwise, it simply identifies
"zero partner" traffic, not "was that request actually zero rated or not".

Thanks!

On Fri, Apr 10, 2015 at 5:16 PM, Oliver Keyes <[email protected]> wrote:

> Cool!
>
> On 10 April 2015 at 17:12, Joseph Allemandou <[email protected]> wrote:
> > Yes Oliver, the agent_type = spider includes IsCrawler UDF.
> >
> > On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes <[email protected]> wrote:
> >>
> >> What does agent-type add? In the sense that if we're pre-parsing the
> >> user agent, surely the difference is between "WHERE agent_type !=
> >> 'spider'" and "WHERE user_agent_map['device_family'] != 'Spider'"?
> >> Does agent_type include the isCrawler UDF results?
> >>
> >> On 10 April 2015 at 16:47, Joseph Allemandou <[email protected]> wrote:
> >> > And I forgot one field :
> >> >
> >> > is_zero - True if a request is made on a zero provider.
> >> >
> >> > On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia <[email protected]> wrote:
> >> >>
> >> >> Hi Joseph,
> >> >>
> >> >> Thanks for the update, and for doing this. These three items make the
> >> >> analysis of the data much easier on our end. We've had many requests in
> >> >> the past that required agent_type and access_method information and
> >> >> having them readily available is awesome! :-)
> >> >>
> >> >> Have a great weekend!
> >> >>
> >> >> Leila
> >> >>
> >> >> On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou
> >> >> <[email protected]> wrote:
> >> >>>
> >> >>> Hi Analytics people,
> >> >>>
> >> >>> Today happens another bunch of addition to the refined webrequest
> >> >>> table in hive.
> >> >>> Now the table contains:
> >> >>>
> >> >>> ts - The unix timestamp (milliseconds) version of the dt date
> >> >>> access_method - The method used to access the site, being one of the
> >> >>> three [mobile app | mobile web | desktop]
> >> >>> agent_type - To differentiate easily between spiders and users (more
> >> >>> values may be added later).
> >> >>>
> >> >>> These additions are based on the "tags", as defined here:
> >> >>> https://meta.wikimedia.org/wiki/Research:Page_view
> >> >>>
> >> >>> Have a good weekend !
> >> >>>
> >> >>> --
> >> >>> Joseph Allemandou
> >> >>> Data Engineer @ Wikimedia Foundation
> >> >>> IRC: joal
> >> >>>
> >> >>> _______________________________________________
> >> >>> Analytics mailing list
> >> >>> [email protected]
> >> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >> >>>
> >> >
> >> > --
> >> > Joseph Allemandou
> >> > Data Engineer @ Wikimedia Foundation
> >> > IRC: joal
> >>
> >> --
> >> Oliver Keyes
> >> Research Analyst
> >> Wikimedia Foundation
> >
> > --
> > Joseph Allemandou
> > Data Engineer @ Wikimedia Foundation
> > IRC: joal
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
