Hi Tilman,

Your assumption is correct, you can trust projectview_hourly :)

On Wed, Mar 2, 2016 at 4:22 AM, Tilman Bayer <[email protected]> wrote:

> Thanks Joseph! Is it reasonable to assume that the aggregate data in
> projectview_hourly
> <https://wikitech.wikimedia.org/wiki/Analytics/Data/Projectview_hourly> has
> not been affected?
>
> On Tue, Mar 1, 2016 at 7:24 AM, Joseph Allemandou <
> [email protected]> wrote:
>
>> Hey Oliver,
>> It depends on what data you've used: if page_title or other 'encoding
>> sensitive' data (I can't think of any other, but ...) is part of it, then
>> yes, you should!
>>
>> On Tue, Mar 1, 2016 at 3:27 PM, Oliver Keyes <[email protected]>
>> wrote:
>>
>>> Hey Joseph,
>>>
>>> Thanks for letting us know. So we should delete and backfill last
>>> week's data, for our regularly scheduled scripts?
>>>
>>> On 1 March 2016 at 08:26, Joseph Allemandou <[email protected]>
>>> wrote:
>>> > Hi,
>>> >
>>> > TL;DR: Please don't use hive / spark / hadoop before next week.
>>> >
>>> > Last week the Analytics Team performed an upgrade to the Hadoop Cluster.
>>> > It went reasonably well, except that many of the Hadoop processes were
>>> > launched with a special option to NOT use UTF-8 as the default encoding.
>>> > This issue caused trouble particularly in page title extraction and was
>>> > detected last Sunday (many kudos to the people who filed bugs on the
>>> > Analytics API about encoding :)
>>> > We found the bug and fixed it yesterday, and backfill starts today, with
>>> > the cluster recomputing every dataset from 2016-02-23 onward.
>>> > This means you shouldn't query last week's data during this week, first
>>> > because it is incorrect, and second because you'll curse the cluster for
>>> > being too slow :)
>>> >
>>> > We are sorry for the inconvenience.
>>> > Don't hesitate to contact us if you have any questions.
>>> >
>>> >
>>> > --
>>> > Joseph Allemandou
>>> > Data Engineer @ Wikimedia Foundation
>>> > IRC: joal
>>> >
>>> > _______________________________________________
>>> > Engineering mailing list
>>> > [email protected]
>>> > https://lists.wikimedia.org/mailman/listinfo/engineering
>>> >
>>>
>>>
>>>
>>> --
>>> Oliver Keyes
>>> Count Logula
>>> Wikimedia Foundation
>>>
>>
>>
>>
>> --
>> *Joseph Allemandou*
>> Data Engineer @ Wikimedia Foundation
>> IRC: joal
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
>
> --
> Tilman Bayer
> Senior Analyst
> Wikimedia Foundation
> IRC (Freenode): HaeB
>
> _______________________________________________
> Engineering mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/engineering
>
>

--
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
