I reran the offending jobs today. This data should now be available.
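
To double-check that no partitions are left over, the status check from the quoted message below can be filtered in a single pass instead of running the dump twice. This is only a minimal sketch: `status_failures` is a hypothetical helper name, and it assumes (as the quoted command implies) that the status script prints four header lines and marks partitions that still need a re-run with an `X`.

```shell
# Hypothetical helper, not part of refinery: reads status output on stdin,
# keeps the first N header lines (default 4) plus every line containing "X".
# Both the header count and the "X" marker are assumptions taken from the
# quoted `dump | head -n 4 ; dump | grep X` command.
status_failures() {
    awk -v header="${1:-4}" 'NR <= header || /X/'
}

# Intended usage, with the script path and arguments exactly as quoted below:
# /srv/deployment/analytics/refinery/bin/refinery-dump-status-webrequest-partitions \
#     --datasets legacy_tsvs,mediacounts,pagecounts_all_sites,pagecounts_raw,webrequest \
#     $((15*24)) | status_failures
```

Like the original command, this only surfaces jobs that failed or hung; it would not expose jobs that ran too early.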

> On Apr 18, 2015, at 19:07, Christian Aistleitner <[email protected]> wrote:
>
> Hi Andrew,
>
> On Fri, Apr 17, 2015 at 07:06:58PM -0400, Andrew Otto wrote:
>> I've been trying to fix this data all week!
>
> I am with you.
> Having had to do it a few times in the past, I definitely know the
> pain you're going through :-/
>
>> Also, I never got emails about page counts all sites, [...]
>
> Since the issue was earlier in the pipeline, the expected emails would
> not be about pagecounts-all-sites, but about a failed refining step
> (which blocks all downstream consumers of that partition [1]). The
> corresponding Oozie ID for the failed refining job is:
>
>   0058532-150220163729023-oozie-oozi-C@238
>
> If you want specific alerts about pagecounts-all-sites,
>
>   https://gerrit.wikimedia.org/r/#/c/205067/
>
> would be a simple way to achieve that.
>
>> [...] but have been
>> checking things in HDFS.
>
> If jobs really failed or hung (as seems to have been the case here), I
> typically just abused the status script and grepped for a status X
> ... like
>
>   dump() {
>     /srv/deployment/analytics/refinery/bin/refinery-dump-status-webrequest-partitions \
>       --datasets legacy_tsvs,mediacounts,pagecounts_all_sites,pagecounts_raw,webrequest \
>       $((15*24)) ; } ; dump | head -n 4 ; dump | grep X
>
> That always gave me a nice list of where re-runs are still necessary.
>
> (Of course, if jobs did not fail/hang but ran too early due to an
> overloaded cluster, the above command would not expose races like the
> one for 2015-04-15T15 on text)
>
>> Will look into this more on Monday.
>
> You rock!
>
> Have fun,
> Christian
>
> [1] https://commons.wikimedia.org/wiki/File:Refinery-oozie-overview.png
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
> Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3     Email:    [email protected]
> 4293 Gutau, Austria          Phone:    +43 7946 / 20 5 81
>                              Fax:      +43 7946 / 20 5 81
>                              Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
