Thanks Joseph! Is it reasonable to assume that the aggregate data in projectview_hourly <https://wikitech.wikimedia.org/wiki/Analytics/Data/Projectview_hourly> has not been affected?
On Tue, Mar 1, 2016 at 7:24 AM, Joseph Allemandou <[email protected]> wrote:

> Hey Oliver,
> It depends on what data you've used: if page_title or other 'encoding
> sensitive' data (I can't think of any other, but ...) is part of it, then
> yes, you should !
>
> On Tue, Mar 1, 2016 at 3:27 PM, Oliver Keyes <[email protected]> wrote:
>
>> Hey Joseph,
>>
>> Thanks for letting us know. So we should delete and backfill last
>> week's data, for our regularly scheduled scripts?
>>
>> On 1 March 2016 at 08:26, Joseph Allemandou <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > TL,DR: Please don't use hive / spark / hadoop before next week.
>> >
>> > Last week the Analytics Team performed an upgrade to the Hadoop Cluster.
>> > It went reasonably well except for many of the hadoop processes were
>> > launched with a special option to NOT use utf-8 as default encoding.
>> > This issue caused trouble particularly in page title extraction and was
>> > detected last sunday (many kudos to the people having filled bugs on
>> > Analytics API about encoding :)
>> > We found the bug and fixed it yesterday, and backfill starts today, with the
>> > cluster recomputing every dataset starting 2016-02-23 onward.
>> > This means you shouldn't query last week data during this week, first
>> > because it is incorrect, and second because you'll curse the cluster for
>> > being too slow :)
>> >
>> > We are sorry for the inconvenience.
>> > Don't hesitate to contact us if you have any question
>> >
>> > --
>> > Joseph Allemandou
>> > Data Engineer @ Wikimedia Foundation
>> > IRC: joal
>> >
>> > _______________________________________________
>> > Engineering mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/engineering
>> >
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation
>>
>
> --
> *Joseph Allemandou*
> Data Engineer @ Wikimedia Foundation
> IRC: joal
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>

--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
