Hi Andrew,

On Mon, Feb 23, 2015 at 09:35:48AM -0500, Andrew Otto wrote:
> https://gerrit.wikimedia.org/r/#/c/177522/ 
> 
> Seeing as this was merged on Jan 26, it is possible that it was not
> deployed on Jan 27, when Oliver noticed duplicates.

That should not be the case.

Back when you decided that deduplication should happen during refining
from wmf_raw.webrequest to wmf.webrequest, and the above change got
implemented, all of 2015 got deduped and backfilled on wmf.webrequest.

So all of 2015 in wmf.webrequest is deduped (with the known
limitations).

Have fun,
Christian



P.S.: And all the wmf.webrequest based jobs from

  
https://commons.wikimedia.org/w/index.php?title=File:Refinery-oozie-overview.png&oldid=149982730

that exist for 2015 got re-run on this deduped data too.

So no dupes for the corresponding legacy tsvs, pagecounts-all-sites, ...



-- 
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  [email protected]
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------


_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
