Look at the record_version field to know if the new column is populated.
https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest#Changes_and_known_problems_since_2015-03-04
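
For anyone post-processing rows outside Hive, here is a minimal sketch of that check. It assumes record_version is a dotted numeric string (e.g. "0.0.3"). The function names and the threshold argument are hypothetical; the version at which a given column became populated has to be looked up on the wiki page above.

```python
def parse_version(version):
    """Turn a dotted version string such as '0.0.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def column_is_populated(record_version, first_populated_version):
    """True if a row with this record_version should carry the column.

    first_populated_version is an assumption supplied by the caller:
    the first record_version at which the column was filled in.
    """
    return parse_version(record_version) >= parse_version(first_populated_version)


def value_or_none(row, column, first_populated_version):
    """Return row[column] only when the row is new enough, else None."""
    if column_is_populated(row["record_version"], first_populated_version):
        return row.get(column)
    return None
```

Comparing parsed tuples rather than raw strings avoids ordering "0.0.10" before "0.0.9".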

On Sun, Apr 12, 2015 at 10:43 AM, Toby Negrin <[email protected]> wrote:

> Hi Yuri --
>
> In general, I do not think this table will change a lot moving forward.
> We're migrating to a more complete definition right now so some changes are
> to be expected but things should settle down.
>
> Thanks for the new fields!
>
> -Toby
>
> On Sun, Apr 12, 2015 at 9:55 AM, Andrew Otto <[email protected]> wrote:
>
>> You probably have to do it conditionally by date
>>
>>
>> On Apr 12, 2015, at 12:38, Yuri Astrakhan <[email protected]>
>> wrote:
>>
>> Thanks Oliver! Is there a way to handle it in HQL? E.g.
>> if(exists(is_pageview), is_pageview, null)? Finding out whether a field
>> exists by watching the query crash seems wrong ))
>> On Apr 12, 2015 06:53, "Oliver Keyes" <[email protected]> wrote:
>>
>>> (Duplicated from bug):
>>>
>>> That's not a bug. The complexity of regenerating ~60 days of data,
>>> where a day is 24*60*125000 rows, is extreme, and adding new fields
>>> means doing just that - regenerating the entire thing. As such, the
>>> decision was made to add to the field definition and only add actual
>>> values going forward from the point at which the patch was merged.
>>> This was true of the is_pageview calculation, the user agent data and
>>> the geolocation elements previously added, and is still true now.
>>>
>>> On 11 April 2015 at 03:33, Yuri Astrakhan <[email protected]>
>>> wrote:
>>> > I tried to move Zero analytics to the new table, and decided to test
>>> > the new wonderful fields like agent_type ... and it only works on the
>>> > most recent hours of data ((
>>> >
>>> > https://phabricator.wikimedia.org/T95806
>>> >
>>> >
>>> > On Fri, Apr 10, 2015 at 8:51 PM, Yuri Astrakhan <[email protected]>
>>> > wrote:
>>> >>
>>> >> Please clarify why the field "is_zero" is needed, as it is nothing
>>> >> more than a test for ("zero=" in x_analytics). Does having this field
>>> >> significantly improve performance for zero queries, e.g. "select
>>> >> count(*) from requests where is_zero = true"? Because otherwise it
>>> >> simply identifies "zero partner" traffic, not "was that request
>>> >> actually zero rated or not".
>>> >>
>>> >> Thanks!
>>> >>
>>> >> On Fri, Apr 10, 2015 at 5:16 PM, Oliver Keyes <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> Cool!
>>> >>>
>>> >>> On 10 April 2015 at 17:12, Joseph Allemandou <[email protected]>
>>> >>> wrote:
>>> >>> > Yes Oliver, agent_type = spider includes the IsCrawler UDF results.
>>> >>> >
>>> >>> > On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes <[email protected]>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> What does agent_type add? In the sense that if we're pre-parsing the
>>> >>> >> user agent, surely the difference is between "WHERE agent_type !=
>>> >>> >> 'spider'" and "WHERE user_agent_map['device_family'] != 'Spider'"?
>>> >>> >> Does agent_type include the isCrawler UDF results?
>>> >>> >>
>>> >>> >> On 10 April 2015 at 16:47, Joseph Allemandou
>>> >>> >> <[email protected]>
>>> >>> >> wrote:
>>> >>> >> > And I forgot one field:
>>> >>> >> >
>>> >>> >> > is_zero - True if a request is made on a zero provider.
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia <[email protected]>
>>> >>> >> > wrote:
>>> >>> >> >>
>>> >>> >> >> Hi Joseph,
>>> >>> >> >>
>>> >>> >> >>    Thanks for the update, and for doing this. These three items
>>> >>> >> >> make the analysis of the data much easier on our end. We've had
>>> >>> >> >> many requests in the past that required agent_type and
>>> >>> >> >> access_method information and having them readily available is
>>> >>> >> >> awesome! :-)
>>> >>> >> >>
>>> >>> >> >> Have a great weekend!
>>> >>> >> >>
>>> >>> >> >> Leila
>>> >>> >> >>
>>> >>> >> >> On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou
>>> >>> >> >> <[email protected]> wrote:
>>> >>> >> >>>
>>> >>> >> >>> Hi Analytics people,
>>> >>> >> >>>
>>> >>> >> >>> Today another batch of additions was made to the refined
>>> >>> >> >>> webrequest table in Hive.
>>> >>> >> >>> Now the table contains:
>>> >>> >> >>>
>>> >>> >> >>> ts - The unix timestamp (milliseconds) version of the dt date
>>> >>> >> >>> access_method - The method used to access the site, being one of
>>> >>> >> >>> the three [mobile app | mobile web | desktop]
>>> >>> >> >>> agent_type - To differentiate easily between spiders and users
>>> >>> >> >>> (more values may be added later).
>>> >>> >> >>>
>>> >>> >> >>> These additions are based on the "tags", as defined here:
>>> >>> >> >>> https://meta.wikimedia.org/wiki/Research:Page_view
>>> >>> >> >>>
>>> >>> >> >>> Have a good weekend !
>>> >>> >> >>>
>>> >>> >> >>> --
>>> >>> >> >>> Joseph Allemandou
>>> >>> >> >>> Data Engineer @ Wikimedia Foundation
>>> >>> >> >>> IRC: joal
>>> >>> >> >>>
>>> >>> >> >>> _______________________________________________
>>> >>> >> >>> Analytics mailing list
>>> >>> >> >>> [email protected]
>>> >>> >> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>> >> >>>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> --
>>> >>> >> Oliver Keyes
>>> >>> >> Research Analyst
>>> >>> >> Wikimedia Foundation
