Hi Andrew,

On Mon, Feb 23, 2015 at 09:35:48AM -0500, Andrew Otto wrote:
> https://gerrit.wikimedia.org/r/#/c/177522/
>
> Seeing as this was merged on Jan 26, it is possible that it was not
> deployed when on Jan 27 when Oliver is noticing duplicates.
That should not be the case. Back when you decided that deduplication
should happen during refining from wmf_raw.webrequest to wmf.webrequest,
and the above change got implemented, all of 2015 was deduplicated and
backfilled in wmf.webrequest. So all of 2015 in wmf.webrequest is
deduplicated (subject to the known limitations).

Have fun,
Christian

P.S.: All the wmf.webrequest-based jobs from
https://commons.wikimedia.org/w/index.php?title=File:Refinery-oozie-overview.png&oldid=149982730
that exist for 2015 were also re-run on this deduplicated data. So the
corresponding legacy TSVs, pagecounts-all-sites, ... contain no
duplicates either.

-- 
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  [email protected]
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
