...and no dupes in the refined pagecounts_all_sites table, which would explain the delta. Aha :D. Let's see what we see!
On 23 February 2015 at 10:31, Christian Aistleitner <[email protected]> wrote:
> Hi Andrew,
>
> On Mon, Feb 23, 2015 at 09:35:48AM -0500, Andrew Otto wrote:
>> https://gerrit.wikimedia.org/r/#/c/177522/
>>
>> Seeing as this was merged on Jan 26, it is possible that it was not
>> deployed when on Jan 27 when Oliver is noticing duplicates.
>
> That should not be the case.
>
> Back when you decided that deduplication should happen during refining
> from wmf_raw.webrequest to wmf.webrequest, and the above change got
> implemented, all of 2015 got deduped and backfilled on wmf.webrequest.
>
> So all of 2015 in wmf.webrequest is deduped (with the known
> limitations).
>
> Have fun,
> Christian
>
> P.S.: And all the wmf.webrequest based jobs from
>
> https://commons.wikimedia.org/w/index.php?title=File:Refinery-oozie-overview.png&oldid=149982730
>
> that exist for 2015 got re-run on this deduped data too.
>
> So no dupes for the corresponding legacy tsvs, pagecounts-all-sites, ...
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
> Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3           Email: [email protected]
> 4293 Gutau, Austria                Phone: +43 7946 / 20 5 81
>                                    Fax: +43 7946 / 20 5 81
>                                    Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
