Please clarify why the "is_zero" field is needed, since it is nothing more
than a test for ("zero=" in x_analytics). Does having this field
significantly improve performance for zero queries, e.g. "select count(*)
from requests where is_zero = true"? If not, it simply identifies
"zero partner" traffic, not whether the request was actually zero-rated.
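
To make the question concrete, here is a rough sketch of the check I have in
mind. It is only an illustration: the "key=value;key=value" layout of
x_analytics, the sample values, and the function names are my assumptions,
not the actual refinery code.

```python
def parse_x_analytics(x_analytics):
    """Split an x_analytics string into a dict.

    Assumes a 'key=value;key=value' layout (an assumption for this sketch).
    """
    pairs = {}
    for item in x_analytics.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep and key:  # skip empty or malformed items
            pairs[key] = value
    return pairs


def is_zero(x_analytics):
    """True when the string carries a 'zero' key (hypothetical helper)."""
    return "zero" in parse_x_analytics(x_analytics)


def is_zero_substring(x_analytics):
    """The plain substring test mentioned above."""
    return "zero=" in x_analytics


# Made-up sample values, for illustration only.
samples = ["zero=250-99;https=1", "https=1", ""]
results = [(s, is_zero(s), is_zero_substring(s)) for s in samples]
```

Either variant only tells us that the request carried a zero tag, which is
the "zero partner" case, not whether the request was actually zero-rated.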

Thanks!

On Fri, Apr 10, 2015 at 5:16 PM, Oliver Keyes <[email protected]> wrote:

> Cool!
>
> On 10 April 2015 at 17:12, Joseph Allemandou <[email protected]>
> wrote:
> > Yes Oliver, the agent_type = spider includes IsCrawler UDF.
> >
> > On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes <[email protected]> wrote:
> >>
> >> What does agent_type add? In the sense that if we're pre-parsing the
> >> user agent, surely the difference is between "WHERE agent_type !=
> >> 'spider'" and "WHERE user_agent_map['device_family'] != 'Spider'"?
> >> Does agent_type include the isCrawler UDF results?
> >>
> >> On 10 April 2015 at 16:47, Joseph Allemandou <[email protected]> wrote:
> >> > And I forgot one field:
> >> >
> >> > is_zero - True if a request is made on a zero provider.
> >> >
> >> >
> >> > On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia <[email protected]> wrote:
> >> >>
> >> >> Hi Joseph,
> >> >>
> >> >>    Thanks for the update, and for doing this. These three items make the
> >> >> analysis of the data much easier on our end. We've had many requests in the
> >> >> past that required agent_type and access_method information and having them
> >> >> readily available is awesome! :-)
> >> >>
> >> >> Have a great weekend!
> >> >>
> >> >> Leila
> >> >>
> >> >> On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou
> >> >> <[email protected]> wrote:
> >> >>>
> >> >>> Hi Analytics people,
> >> >>>
> >> >>> Another batch of additions to the refined webrequest table in Hive
> >> >>> went out today.
> >> >>> The table now also contains:
> >> >>>
> >> >>> ts - The unix timestamp (milliseconds) version of the dt date
> >> >>> access_method - The method used to access the site, one of three
> >> >>> values: [mobile app | mobile web | desktop]
> >> >>> agent_type - To differentiate easily between spiders and users (more
> >> >>> values may be added later).
> >> >>>
> >> >>> These additions are based on the "tags", as defined here:
> >> >>> https://meta.wikimedia.org/wiki/Research:Page_view
> >> >>>
> >> >>> Have a good weekend!
> >> >>>
> >> >>> --
> >> >>> Joseph Allemandou
> >> >>> Data Engineer @ Wikimedia Foundation
> >> >>> IRC: joal
> >> >>>
> >> >>> _______________________________________________
> >> >>> Analytics mailing list
> >> >>> [email protected]
> >> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Joseph Allemandou
> >> > Data Engineer @ Wikimedia Foundation
> >> > IRC: joal
> >> >
> >>
> >>
> >>
> >> --
> >> Oliver Keyes
> >> Research Analyst
> >> Wikimedia Foundation
> >>
> >
> >
> >
> >
> > --
> > Joseph Allemandou
> > Data Engineer @ Wikimedia Foundation
> > IRC: joal
> >
>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>