https://phabricator.wikimedia.org/T128295
On Tue, Mar 1, 2016 at 2:15 PM, Bo Han <[email protected]> wrote:
> Hi,
>
> Would you mind linking the bug fix here? I couldn't find it on Phabricator.
>
> Thanks,
> Bo
>
> On Tue, Mar 1, 2016 at 7:24 AM, Joseph Allemandou <[email protected]> wrote:
> > Hey Oliver,
> > It depends on what data you've used: if page_title or other 'encoding
> > sensitive' data (I can't think of any other, but ...) is part of it, then
> > yes, you should!
> >
> > On Tue, Mar 1, 2016 at 3:27 PM, Oliver Keyes <[email protected]> wrote:
> >>
> >> Hey Joseph,
> >>
> >> Thanks for letting us know. So we should delete and backfill last
> >> week's data, for our regularly scheduled scripts?
> >>
> >> On 1 March 2016 at 08:26, Joseph Allemandou <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > TL;DR: Please don't use Hive / Spark / Hadoop before next week.
> >> >
> >> > Last week the Analytics Team performed an upgrade to the Hadoop cluster.
> >> > It went reasonably well, except that many of the Hadoop processes were
> >> > launched with a special option to NOT use UTF-8 as the default encoding.
> >> > This issue caused trouble particularly in page title extraction and was
> >> > detected last Sunday (many kudos to the people who filed bugs against
> >> > the Analytics API about encoding :)
> >> > We found the bug and fixed it yesterday, and the backfill starts today,
> >> > with the cluster recomputing every dataset from 2016-02-23 onward.
> >> > This means you shouldn't query last week's data during this week, first
> >> > because it is incorrect, and second because you'll curse the cluster for
> >> > being too slow :)
> >> >
> >> > We are sorry for the inconvenience.
> >> > Don't hesitate to contact us if you have any questions.
> >> >
> >> > --
> >> > Joseph Allemandou
> >> > Data Engineer @ Wikimedia Foundation
> >> > IRC: joal
> >> >
> >> > _______________________________________________
> >> > Engineering mailing list
> >> > [email protected]
> >> > https://lists.wikimedia.org/mailman/listinfo/engineering
> >>
> >> --
> >> Oliver Keyes
> >> Count Logula
> >> Wikimedia Foundation
> >
> > --
> > Joseph Allemandou
> > Data Engineer @ Wikimedia Foundation
> > IRC: joal
> >
> > _______________________________________________
> > Analytics mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/analytics
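For readers wondering what "not using UTF-8 as the default encoding" does to page titles, here is a minimal Python sketch. It assumes ISO-8859-1 (Latin-1) as the wrong charset; the thread does not say which non-UTF-8 encoding the Hadoop processes actually used. `looks_mis_decoded` is a hypothetical heuristic helper for illustration, not part of any Wikimedia tool.

```python
# Sketch of the page-title encoding problem described in the thread.
# ASSUMPTION: the wrong default charset is ISO-8859-1 (Latin-1); the thread
# does not name the actual charset the Hadoop processes used.

title = "Zürich"
raw_bytes = title.encode("utf-8")  # the title as UTF-8 bytes

correct = raw_bytes.decode("utf-8")    # decoded with UTF-8 as the default
garbled = raw_bytes.decode("latin-1")  # decoded with a Latin-1 default

print(correct)  # Zürich
print(garbled)  # ZÃ¼rich


def looks_mis_decoded(text: str) -> bool:
    """Hypothetical heuristic: True if `text` looks like UTF-8 bytes that
    were wrongly decoded as Latin-1, i.e. re-encoding it as Latin-1 and
    decoding as UTF-8 yields a different, valid string."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside Latin-1, or bytes that are not valid UTF-8:
        # not this particular kind of mojibake.
        return False
    return repaired != text


for t in ["ZÃ¼rich", "Zürich", "Paris", "東京"]:
    print(t, looks_mis_decoded(t))
```

A heuristic like this only catches one specific corruption pattern and will miss text damaged in other ways (for example, characters replaced by `?`), so rerunning scheduled jobs on the backfilled data is the more reliable option.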
