Look at the record_version field of each row to tell whether the new column is populated. The changes are listed here:
https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest#Changes_and_known_problems_since_2015-03-04
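A minimal Python sketch of that per-row check, which is roughly what the if(exists(is_pageview),is_pageview,null) idea in the thread below asks for. It assumes rows arrive as dicts and that record_version is a dotted numeric string such as "0.0.3". The FIELD_MIN_VERSION thresholds are hypothetical placeholders, so take the real ones from the Wikitech page above.

```python
from typing import Any, Mapping, Optional, Tuple

# HYPOTHETICAL thresholds: the first record_version in which each field
# carries real values. Placeholders only; the real history is on Wikitech.
FIELD_MIN_VERSION = {
    "is_pageview": "0.0.2",
    "agent_type": "0.0.3",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '0.0.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def field_if_populated(row: Mapping[str, Any], field: str) -> Optional[Any]:
    """Return row[field] when the row's record_version says it is populated, else None."""
    min_version = FIELD_MIN_VERSION.get(field)
    version = row.get("record_version")
    if min_version is None or version is None:
        return None  # unknown field or unversioned row: treat as not populated
    if parse_version(version) < parse_version(min_version):
        return None  # row predates the field, so its value is not meaningful
    return row.get(field)


old_row = {"record_version": "0.0.1", "is_pageview": False}
new_row = {"record_version": "0.0.3", "is_pageview": True, "agent_type": "user"}
print(field_if_populated(old_row, "is_pageview"))  # None
print(field_if_populated(new_row, "agent_type"))   # user
```

Comparing parsed tuples rather than raw strings keeps "0.0.10" ordered after "0.0.9".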
On Sun, Apr 12, 2015 at 10:43 AM, Toby Negrin <[email protected]> wrote:
> Hi Yuri --
>
> In general, I do not think this table will change a lot moving forward. We're migrating to a more complete definition right now so some changes are to be expected but things should settle down.
>
> Thanks for the new fields!
>
> -Toby
>
> On Sun, Apr 12, 2015 at 9:55 AM, Andrew Otto <[email protected]> wrote:
>
>> You probably have to do it conditionally by date
>>
>>
>> On Apr 12, 2015, at 12:38, Yuri Astrakhan <[email protected]> wrote:
>>
>> Thanks Oliver! Is there a way to handle it in hql? E.g if(exists(is_pageview),is_pageview,null)? Finding out if field exists by observing query crash seems wrong ))
>> On Apr 12, 2015 06:53, "Oliver Keyes" <[email protected]> wrote:
>>
>>> (Duplicated from bug):
>>>
>>> That's not a bug. The complexity of regenerating ~60 days of data, where a day is 24*60*125000 rows, is extreme, and adding new fields means doing just that - regenerating the entire thing. As such, the decision was made to add to the field definition and only add actual values going forward from the point at which the patch was merged. This was true of the is_pageview calculation, the user agent data and the geolocation elements previously added, and is still true now.
>>>
>>> On 11 April 2015 at 03:33, Yuri Astrakhan <[email protected]> wrote:
>>> > I tried to move Zero analytics to the new table, and decided to test the new wonderful fields like agent_type ... and it only works on the most recent hours of data ((
>>> >
>>> > https://phabricator.wikimedia.org/T95806
>>> >
>>> >
>>> > On Fri, Apr 10, 2015 at 8:51 PM, Yuri Astrakhan <[email protected]> wrote:
>>> >>
>>> >> Please clarify why the field "is_zero" is needed, as it is nothing more than a test for ("zero=" in x_analytics). Does having this field significantly improve performance for zero queries, e.g. "select count(*) from requests where iszero = true" ? Because otherwise it simply identifies "zero partner" traffic, not "was that request actually zero rated or not".
>>> >>
>>> >> Thanks!
>>> >>
>>> >> On Fri, Apr 10, 2015 at 5:16 PM, Oliver Keyes <[email protected]> wrote:
>>> >>>
>>> >>> Cool!
>>> >>>
>>> >>> On 10 April 2015 at 17:12, Joseph Allemandou <[email protected]> wrote:
>>> >>> > Yes Oliver, the agent_type = spider includes IsCrawler UDF.
>>> >>> >
>>> >>> > On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes <[email protected]> wrote:
>>> >>> >>
>>> >>> >> What does agent-type add? In the sense that if we're pre-parsing the user agent, surely the difference is between "WHERE agent_type != 'spider'" and "WHERE user_agent_map['device_family'] != 'Spider'"? Does agent_type include the isCrawler UDF results?
>>> >>> >>
>>> >>> >> On 10 April 2015 at 16:47, Joseph Allemandou <[email protected]> wrote:
>>> >>> >> > And I forgot one field :
>>> >>> >> >
>>> >>> >> > is_zero - True if a request is made on a zero provider.
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia <[email protected]> wrote:
>>> >>> >> >>
>>> >>> >> >> Hi Joseph,
>>> >>> >> >>
>>> >>> >> >> Thanks for the update, and for doing this. These three items make the analysis of the data much easier on our end. We've had many requests in the past that required agent_type and access_method information and having them readily available is awesome! :-)
>>> >>> >> >>
>>> >>> >> >> Have a great weekend!
>>> >>> >> >>
>>> >>> >> >> Leila
>>> >>> >> >>
>>> >>> >> >> On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou <[email protected]> wrote:
>>> >>> >> >>>
>>> >>> >> >>> Hi Analytics people,
>>> >>> >> >>>
>>> >>> >> >>> Today happens another bunch of addition to the refined webrequest table in hive.
>>> >>> >> >>> Now the table contains:
>>> >>> >> >>>
>>> >>> >> >>> ts - The unix timestamp (milliseconds) version of the dt date
>>> >>> >> >>> access_method - The method used to access the site, being one of the three [mobile app | mobile web | desktop]
>>> >>> >> >>> agent_type - To differentiate easily between spiders and users (more values may be added later).
>>> >>> >> >>>
>>> >>> >> >>> These additions are based on the "tags", as defined here:
>>> >>> >> >>> https://meta.wikimedia.org/wiki/Research:Page_view
>>> >>> >> >>>
>>> >>> >> >>> Have a good weekend !
>>> >>> >> >>>
>>> >>> >> >>> --
>>> >>> >> >>> Joseph Allemandou
>>> >>> >> >>> Data Engineer @ Wikimedia Foundation
>>> >>> >> >>> IRC: joal
>>> >>> >> >>>
>>> >>> >> >>> _______________________________________________
>>> >>> >> >>> Analytics mailing list
>>> >>> >> >>> [email protected]
>>> >>> >> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>> >>
>>> >>> >> --
>>> >>> >> Oliver Keyes
>>> >>> >> Research Analyst
>>> >>> >> Wikimedia Foundation
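Andrew's suggestion in the thread, to do it conditionally by date, can be sketched in Python as a check on the partition being read. This assumes hourly partitions identified by year, month, day and hour in UTC. The cutoff in FIELD_POPULATED_FROM is a hypothetical placeholder, not the actual time the patch was merged.

```python
from datetime import datetime
from typing import Any, Optional

# HYPOTHETICAL cutoff (UTC): first partition hour in which the field holds
# real values. Placeholder only; add one entry per field from the real dates.
FIELD_POPULATED_FROM = {
    "agent_type": datetime(2015, 4, 10, 0),
}


def populated_in_partition(field: str, year: int, month: int, day: int, hour: int) -> bool:
    """True if `field` is expected to carry real values in this hourly partition."""
    cutoff = FIELD_POPULATED_FROM.get(field)
    if cutoff is None:
        return False  # unknown field: be conservative and treat it as unpopulated
    return datetime(year, month, day, hour) >= cutoff


def value_or_none(field: str, value: Any, year: int, month: int, day: int, hour: int) -> Optional[Any]:
    """Mimic IF(partition is recent enough, value, NULL) for a single row."""
    return value if populated_in_partition(field, year, month, day, hour) else None
```

Gating on the partition rather than on the row lets a query skip old partitions entirely instead of reading them and discarding empty values.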
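Yuri's point that is_zero is nothing more than a test for "zero=" in x_analytics can be illustrated with a small Python sketch. It assumes x_analytics holds semicolon-separated key=value pairs; the header values in the tests are made up. As Yuri notes, a check like this only identifies zero-partner traffic. It says nothing about whether a request was actually zero-rated.

```python
from typing import Dict, Optional


def parse_x_analytics(header: Optional[str]) -> Dict[str, str]:
    """Split an x_analytics value like 'zero=123-45;https=1' into a dict.

    Assumes semicolon-separated key=value pairs; items without '=' are skipped.
    """
    pairs: Dict[str, str] = {}
    if not header:
        return pairs
    for item in header.split(";"):
        key, sep, value = item.partition("=")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


def is_zero(header: Optional[str]) -> bool:
    """True if the request carries a 'zero' key, i.e. came via a zero partner."""
    return "zero" in parse_x_analytics(header)


def is_zero_substring(header: Optional[str]) -> bool:
    """The literal test from the thread: does the raw string contain 'zero='?"""
    return header is not None and "zero=" in header
```

Matching the parsed key rather than the raw substring avoids a false positive on a key (hypothetical here) that merely ends in "zero", such as "nonzero=1".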
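Joseph describes ts as the Unix timestamp (milliseconds) version of dt. Assume dt is an ISO-8601 string in UTC with second precision and no zone suffix (for example "2015-04-10T00:00:00"). Under that assumption, the conversion can be sketched in Python as follows. This is an illustration, not the refinement job's actual code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def dt_to_ts_millis(dt: str) -> int:
    """Convert a 'YYYY-MM-DDTHH:MM:SS' string, read as UTC, to Unix epoch milliseconds."""
    parsed = datetime.strptime(dt, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas keeps the result exact (no float rounding).
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(dt_to_ts_millis("2015-04-10T00:00:00"))  # 1428624000000
```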
