I've been trying to fix this data all week! Thought I had, but I hadn't checked in the aggregator. Also, I never got the emails about pagecounts-all-sites, but I have been checking things in HDFS. Will look into this more on Monday. Thanks, Christian!
> On Apr 17, 2015, at 18:47, Christian Aistleitner <[email protected]> wrote:
>
> Hi Analytics dev team,
>
> just a heads up that it's a week that the pagecounts-all-sites (and
> pagecounts-raw) did not have the 20150409-160000 file generated [1].
>
> To ease data quality assurances and avoid faulty aggregates, the
> pageview aggregator scripts that do the aggregation for dashiki's
> “Reader / Daily Pageviews” block for a week on missing data (unless
> they are being told that for a given day, missing data is ok).
>
> For the above hourly pagecounts-all-sites file, this week of blocking
> has now passed without action.
>
> Hence, the aggregator scripts will start aggregating again (to some
> degree), but the undeclared hole for the 2015-04-09 in the data will
> naturally start to bubble up.
>
> If that hour's file cannot get generated, adding this date to the
> BAD_DATES.csv of the aggregator data repository, will unblock the
> aggregator cron job and make weekly, monthly, aggregates consider
> 2015-04-09 as day without data.
>
> If that hour's file gets generated, be aware that aggregator per
> default only automatically backfills for a week. So from today on, you
> need to explicitly run the script to backfill for 2015-04-09.
>
> Have fun,
> Christian
>
>
> P.S.: Since I guess the question of monitoring will arise ... the
> missing pagecounts file has alerted people at least twice by email.
> The subsequent aggregator blocking has been logged.
> But you can add yourself in the MAILTO of the aggregator cron at
> modules/statistics/manifests/aggregator.pp
> in puppet, if you want an additional notification for that.
>
> [1]
> http://dumps.wikimedia.org/other/pagecounts-all-sites/2015/2015-04/
> http://dumps.wikimedia.org/other/pagecounts-raw/2015/2015-04/
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
> Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3 Email: [email protected]
> 4293 Gutau, Austria Phone: +43 7946 / 20 5 81
> Fax: +43 7946 / 20 5 81
> Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
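
For anyone following the thread, here is a rough Python sketch of the blocking rule as the quoted mail describes it: block while a missing day is less than about a week old, aggregate again afterwards, and never block on a day that has been declared bad. The function name, the return strings, and the exact boundary of the "week" are assumptions made for illustration. This is not the aggregator's actual code, and it does not read BAD_DATES.csv, whose format is not given in the mail.

```python
from datetime import date, timedelta

# Assumption: the blocking window is the "week" mentioned in the mail.
# Whether day 7 itself still blocks is a guess.
BLOCK_WINDOW = timedelta(days=7)


def aggregator_action(missing_day, today, bad_dates):
    """Say what a run on `today` would do about `missing_day`.

    Hypothetical helper. `bad_dates` is a set of dates that have been
    declared as days without data.
    """
    if missing_day in bad_dates:
        # Declared hole: aggregates treat the day as having no data.
        return "aggregate, day declared as without data"
    if today - missing_day <= BLOCK_WINDOW:
        # Undeclared hole that is still recent: hold back aggregation.
        return "block"
    # Undeclared hole older than the window: aggregation resumes and
    # the hole shows up in the aggregates.
    return "aggregate with undeclared hole"


if __name__ == "__main__":
    hole = date(2015, 4, 9)
    print(aggregator_action(hole, date(2015, 4, 12), set()))
    print(aggregator_action(hole, date(2015, 4, 17), set()))
    print(aggregator_action(hole, date(2015, 4, 17), {hole}))
```

The three calls correspond to the situations in the mail: during the week of blocking, on the day the mail was sent, and after 2015-04-09 has been declared a bad date.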
