Hey Oliver,
It depends on which data you've used: if page_title or other 'encoding-sensitive' data (I can't think of any other, but ...) is part of it, then yes, you should!
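To illustrate why page_title counts as 'encoding sensitive', here is a minimal Java sketch. It assumes that titles are written as UTF-8 bytes and then read back with some other charset. ISO-8859-1 stands in for that other charset, but this is only an assumption, because the thread doesn't say which charset the misconfigured processes actually used. The `roundTrip` helper and the example titles are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: encode a title as UTF-8 (how it is written),
    // then decode those bytes with readCharset (how a process whose
    // default encoding is not UTF-8 might read them back).
    static String roundTrip(String title, Charset readCharset) {
        byte[] utf8Bytes = title.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readCharset);
    }

    public static void main(String[] args) {
        // Hypothetical titles. The Unicode escape keeps this file ASCII-only,
        // so javac's own default encoding cannot interfere with the demo.
        String accented = "Z\u00FCrich"; // "Zurich" with u-umlaut, 6 chars
        String ascii = "Main_Page";

        // Reading back with UTF-8 round-trips correctly: prints true
        System.out.println(roundTrip(accented, StandardCharsets.UTF_8).equals(accented));
        // ISO-8859-1 (assumed stand-in) splits the 2-byte UTF-8 "u-umlaut"
        // (0xC3 0xBC) into two separate characters: prints false, then 7
        System.out.println(roundTrip(accented, StandardCharsets.ISO_8859_1).equals(accented));
        System.out.println(roundTrip(accented, StandardCharsets.ISO_8859_1).length());
        // ASCII-only titles survive either way: prints true
        System.out.println(roundTrip(ascii, StandardCharsets.ISO_8859_1).equals(ascii));
    }
}
```

Pure-ASCII values come out identical under either charset. Only fields that can hold non-ASCII text, such as page titles, get garbled. That matches the "it depends on what data you've used" answer above.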
On Tue, Mar 1, 2016 at 3:27 PM, Oliver Keyes <[email protected]> wrote:
> Hey Joseph,
>
> Thanks for letting us know. So we should delete and backfill last
> week's data, for our regularly scheduled scripts?
>
> On 1 March 2016 at 08:26, Joseph Allemandou <[email protected]>
> wrote:
> > Hi,
> >
> > TL;DR: Please don't use Hive / Spark / Hadoop before next week.
> >
> > Last week the Analytics Team performed an upgrade to the Hadoop cluster.
> > It went reasonably well, except that many of the Hadoop processes were
> > launched with a special option to NOT use UTF-8 as the default encoding.
> > This issue caused trouble particularly in page title extraction and was
> > detected last Sunday (many kudos to the people who filed bugs on the
> > Analytics API about encoding :)
> > We found the bug and fixed it yesterday, and the backfill starts today,
> > with the cluster recomputing every dataset from 2016-02-23 onward.
> > This means you shouldn't query last week's data during this week, first
> > because it is incorrect, and second because you'll curse the cluster for
> > being too slow :)
> >
> > We are sorry for the inconvenience.
> > Don't hesitate to contact us if you have any questions.
> >
> > --
> > Joseph Allemandou
> > Data Engineer @ Wikimedia Foundation
> > IRC: joal
> >
> > _______________________________________________
> > Engineering mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/engineering
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation

--
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
