Hi Ahmed and Neil,
Super interesting project you have, Ahmed :)
Thanks Neil for the very precise answer you gave to Ahmed's question!

Some comments about number disparity below:

>
>> https://quarry.wmflabs.org/query/25783
>
>
>>
>> and I see that Quarry reports 168668 while the REST API reports 169754
>> edits for the same period (less than 1% error).
>
>
Those two metrics (Quarry and API) refer to the exact same dataset:
revisions from any user type on any page type for the day 2018-02-28, on enwiki.


> The first thing to consider is that when a Wikipedia page is deleted, all
> the corresponding rows from the revision table are moved to a separate archive
> table <https://www.mediawiki.org/wiki/Manual:Archive_table> (probably for
> reasons that made much more sense years ago). However, in the Data Lake and
> therefore the REST API, there's no such separation.
>
> This query is one way to get a combined count:
> https://quarry.wmflabs.org/query/25794
>

> However, combining the two tables yields 171 346 edits, which makes the
> Data Lake count about 1% *lower* than the application database count.
>

When we count revisions including deleted ones on the Data Lake, we end up
with exactly the number found by the Quarry query: 171346.

As for the difference between Quarry and the API on revisions without
deletes, it is mostly due to recently deleted data. There is still a
difference of 126 revisions that I don't understand:
https://quarry.wmflabs.org/query/25796
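
For reference, here is a small Python sketch of the arithmetic behind the percentages quoted in this thread. It only uses the three counts mentioned above; the variable and function names are mine and purely illustrative, and the labels in the comments reflect how the counts are described in this thread.

```python
# Counts quoted in this thread for enwiki on 2018-02-28.
# Names are illustrative; labels follow the descriptions given in the thread.
quarry_without_deleted = 168_668  # first Quarry query (revisions without deletes)
api_edits = 169_754               # REST API (backed by the Data Lake)
combined_with_deleted = 171_346   # revision + archive tables combined


def relative_gap(a: int, b: int) -> float:
    """Return the difference a - b as a percentage of b."""
    return (a - b) / b * 100


gap_quarry_vs_api = relative_gap(quarry_without_deleted, api_edits)
gap_api_vs_combined = relative_gap(api_edits, combined_with_deleted)

print(f"Quarry vs API:   {api_edits - quarry_without_deleted} edits "
      f"({gap_quarry_vs_api:.2f}%)")
print(f"API vs combined: {combined_with_deleted - api_edits} edits "
      f"({gap_api_vs_combined:.2f}%)")
```

Both gaps come out below 1% in absolute value, consistent with the "less than 1%" and "about 1% lower" figures quoted above.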
Cheers !
Joseph
