...and no dupes in the refined pagecounts_all_sites table, which would
explain the delta. Aha :D. Let's see what we see!

On 23 February 2015 at 10:31, Christian Aistleitner
<[email protected]> wrote:
> Hi Andrew,
>
> On Mon, Feb 23, 2015 at 09:35:48AM -0500, Andrew Otto wrote:
>> https://gerrit.wikimedia.org/r/#/c/177522/
>>
>> Seeing as this was merged on Jan 26, it is possible that it was not
>> deployed on Jan 27, when Oliver is noticing duplicates.
>
> That should not be the case.
>
> Back when you decided that deduplication should happen during refining
> from wmf_raw.webrequest to wmf.webrequest, and the above change got
> implemented, all of 2015 got deduped and backfilled on wmf.webrequest.
>
> So all of 2015 in wmf.webrequest is deduped (with the known
> limitations).
>
> Have fun,
> Christian
>
>
>
> P.S.: And all the wmf.webrequest based jobs from
>
>   https://commons.wikimedia.org/w/index.php?title=File:Refinery-oozie-overview.png&oldid=149982730
>
> that exist for 2015 got re-run on this deduped data too.
>
> So no dupes for the corresponding legacy tsvs, pagecounts-all-sites, ...
>
>
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>                            Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3     Email:  [email protected]
> 4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
>                              Fax:            +43 7946 / 20 5 81
>                              Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation

