Hi Andrew,

On Fri, Apr 17, 2015 at 07:06:58PM -0400, Andrew Otto wrote:
> I've been trying to fix this data all week!

I am with you.
Having had to do it a few times in the past, I definitely know the
pain you're going through :-/

> Also, I never got emails about page counts all sites, [...]

Since the issue was earlier in the pipeline, the expected emails would
not be about pagecounts-all-sites, but about a failed refining step
(which blocks all downstream consumers of that partition [1]). The
corresponding Oozie ID for the failed refining job is:

  0058532-150220163729023-oozie-oozi-C@238

If you want specific alerts about pagecounts-all-sites,

  https://gerrit.wikimedia.org/r/#/c/205067/

would be a simple way to achieve that.

> [...] but have been
> checking things in HDFS.

If jobs really failed or hung (as seems to have been the case here), I
typically just abused the status script and grepped for status X,
like so:

  dump() {
    /srv/deployment/analytics/refinery/bin/refinery-dump-status-webrequest-partitions \
      --datasets legacy_tsvs,mediacounts,pagecounts_all_sites,pagecounts_raw,webrequest \
      $((15*24))
  }
  dump | head -n 4 ; dump | grep X

That always gave me a nice list of where re-runs are still necessary.
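
As a rough sketch of the same idea, the header lines and the X rows can
be picked out in a single pass, so the status script only has to run
once. The helper name filter_failed is made up for illustration, and the
sketch assumes (as the "head -n 4" above suggests) that the first four
lines of the output are the header and that a failed partition shows up
as a literal X somewhere on its row:

```shell
# Hypothetical helper, not part of refinery: reads status output on stdin.
# Assumption: the first 4 lines are the header, any later line with an X
# marks a partition that still needs a re-run.
filter_failed() {
  awk 'NR <= 4 { print; next }   # always keep the assumed header lines
       /X/     { print }         # keep rows that contain a status X'
}

# Intended use (dump being the shell function defined in this mail):
#   dump | filter_failed
```

Unlike the head/grep pair, this prints a header line only once even if
it happens to contain an X.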

(Of course, if jobs did not fail/hang but ran too early due to an
overloaded cluster, the above command would not expose races like the
one for 2015-04-15T15 on text.)

> Will look into this more in Monday.

You rock!

Have fun,
Christian

[1] https://commons.wikimedia.org/wiki/File:Refinery-oozie-overview.png



-- 
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  [email protected]
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
