I've been trying to fix this data all week! I thought I had, but I hadn't
checked in the aggregator. Also, I never got the emails about
pagecounts-all-sites, but I have been checking things in HDFS. Will look
into this more on Monday.
Thanks, Christian!


> On Apr 17, 2015, at 18:47, Christian Aistleitner <[email protected]> 
> wrote:
> 
> Hi Analytics dev team,
> 
> just a heads up that for a week now, the 20150409-160000 file of
> pagecounts-all-sites (and pagecounts-raw) has not been generated [1].
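A gap like this can be spotted by listing one day's files and comparing them against the 24 expected hourly stamps. The minimal sketch below does that. The hour-stamp layout ("20150409-160000") comes from the mail; the "pagecounts-<stamp>.gz" file name pattern is an assumption, so adjust PREFIX and SUFFIX to whatever the dump directory actually uses.

```python
from datetime import date, datetime, timedelta

# Hour stamp layout taken from the mail ("20150409-160000").
# ASSUMPTION: dump files are named "pagecounts-<stamp>.gz"; adjust if not.
STAMP_FORMAT = "%Y%m%d-%H%M%S"
PREFIX, SUFFIX = "pagecounts-", ".gz"


def missing_hours(filenames, day):
    """Return the hour stamps of `day` for which no pagecounts file exists."""
    present = set()
    for name in filenames:
        if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
            continue  # skip unrelated files, e.g. projectcounts
        stamp = name[len(PREFIX):-len(SUFFIX)]
        try:
            present.add(datetime.strptime(stamp, STAMP_FORMAT))
        except ValueError:
            continue  # name does not follow the assumed scheme
    start = datetime(day.year, day.month, day.day)
    expected = (start + timedelta(hours=h) for h in range(24))
    return [t.strftime(STAMP_FORMAT) for t in expected if t not in present]


# Example: every hour of 2015-04-09 except 16:00 is present.
files = [f"pagecounts-20150409-{h:02d}0000.gz" for h in range(24) if h != 16]
print(missing_hours(files, date(2015, 4, 9)))  # ['20150409-160000']
```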
> 
> To ease data quality assurance and avoid faulty aggregates, the
> pageview aggregator scripts that compute the aggregates for dashiki's
> “Reader / Daily Pageviews” block for a week when data is missing
> (unless they have been told that missing data is OK for a given day).
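The blocking rule described above can be modelled as a small decision function. This is a hypothetical sketch, not the aggregator's actual code: it assumes the scripts wait while the missing day is less than seven days old, and stop waiting immediately once the day has been declared as having no data.

```python
from datetime import date, timedelta

# HYPOTHETICAL sketch of the blocking rule described above; the real
# aggregator scripts may implement it differently.
BLOCKING_PERIOD = timedelta(days=7)  # "block for a week on missing data"


def should_block(missing_day, today, bad_dates):
    """Return True while aggregation should still wait for `missing_day`."""
    if missing_day in bad_dates:
        return False  # explicitly declared: missing data is OK for that day
    return today - missing_day < BLOCKING_PERIOD


hole = date(2015, 4, 9)
print(should_block(hole, date(2015, 4, 12), set()))   # True: still waiting
print(should_block(hole, date(2015, 4, 17), set()))   # False: week has passed
print(should_block(hole, date(2015, 4, 12), {hole}))  # False: declared bad
```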
> 
> For the above hourly pagecounts-all-sites file, this week of blocking
> has now passed without action.
> 
> Hence, the aggregator scripts will start aggregating again (to some
> degree), but the undeclared hole in the data for 2015-04-09 will
> naturally start to bubble up.
> 
> If that hour's file cannot be generated, adding this date to the
> BAD_DATES.csv of the aggregator data repository will unblock the
> aggregator cron job and make the weekly and monthly aggregates treat
> 2015-04-09 as a day without data.
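To illustrate what "treat 2015-04-09 as a day without data" could mean for a weekly aggregate, here is a minimal sketch. It assumes BAD_DATES.csv holds one YYYY-MM-DD date in its first column (the real file layout may differ), and it uses a plain sum as a stand-in for whatever the aggregator actually computes. Returning the number of usable days alongside the total keeps the gap visible instead of silently reporting a lower number.

```python
import csv
from datetime import date, timedelta


# ASSUMPTION: BAD_DATES.csv holds one YYYY-MM-DD date in its first column;
# rows that do not parse (e.g. a header) are skipped.
def load_bad_dates(lines):
    bad = set()
    for row in csv.reader(lines):
        if not row:
            continue
        try:
            bad.add(date.fromisoformat(row[0].strip()))
        except ValueError:
            continue
    return bad


def weekly_total(daily_counts, bad_dates):
    """Sum one week of daily counts, treating bad dates as days without data.

    Returns (total, days_with_data) so the gap stays visible to callers.
    """
    usable = [n for d, n in daily_counts.items() if d not in bad_dates]
    return sum(usable), len(usable)


bad = load_bad_dates(["date\n", "2015-04-09\n"])
week = {date(2015, 4, 6) + timedelta(days=i): 100 for i in range(7)}
print(weekly_total(week, bad))  # (600, 6)
```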
> 
> If that hour's file does get generated, be aware that the aggregator
> by default only backfills automatically for a week. So from today on,
> you need to run the script explicitly to backfill 2015-04-09.
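The one-week backfill window explains why 2015-04-09 now needs a manual run. The hypothetical sketch below models "automatically backfills for a week" as the seven days before the run date; the script's exact window boundaries are an assumption.

```python
from datetime import date, timedelta

# HYPOTHETICAL: models "automatically backfills for a week" as the seven
# days before the run date; the script's exact window may differ.
AUTO_BACKFILL_DAYS = 7


def auto_backfill_dates(run_day, window=AUTO_BACKFILL_DAYS):
    """Days an automatic run would (re)compute: the `window` days before `run_day`."""
    return [run_day - timedelta(days=n) for n in range(1, window + 1)]


def needs_manual_backfill(day, run_day, window=AUTO_BACKFILL_DAYS):
    """True if `day` has dropped out of the automatic backfill window."""
    return day not in auto_backfill_dates(run_day, window)


print(needs_manual_backfill(date(2015, 4, 9), date(2015, 4, 17)))   # True
print(needs_manual_backfill(date(2015, 4, 10), date(2015, 4, 17)))  # False
```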
> 
> Have fun,
> Christian
> 
> 
> P.S.: Since I guess the question of monitoring will come up: the
> missing pagecounts file has triggered at least two email alerts.
> The subsequent aggregator blocking has been logged.
> If you want an additional notification for that, you can add yourself
> to the MAILTO of the aggregator cron in
>  modules/statistics/manifests/aggregator.pp
> in puppet.
> 
> [1]
>  http://dumps.wikimedia.org/other/pagecounts-all-sites/2015/2015-04/
>  http://dumps.wikimedia.org/other/pagecounts-raw/2015/2015-04/
> 
> -- 
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>                           Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3     Email:  [email protected]
> 4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
>                             Fax:            +43 7946 / 20 5 81
>                             Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
