Hi Ahmed and Neil,

Super interesting project you have, Ahmed :)
Thanks, Neil, for the very precise answer you gave to Ahmed's question!

Some comments about the number disparity below:

>> https://quarry.wmflabs.org/query/25783
>>
>> and I see that Quarry reports 168668 while the REST API reports 169754
>> edits for the same period (less than 1% error).
>

Those two metrics (Quarry and API) refer to exactly the same dataset: revisions by any user type on any page type on enwiki, for the day of 2018-02-28.

> The first thing to consider is that when a Wikipedia page is deleted, all
> the corresponding rows from the revision table are moved to a separate archive
> table <https://www.mediawiki.org/wiki/Manual:Archive_table> (probably for
> reasons that made much more sense years ago). However, in the Data Lake and
> therefore the REST API, there's no such separation.
>
> This query is one way to get a combined count:
> https://quarry.wmflabs.org/query/25794
>
> However, combining the two tables yields 171 346 edits, which makes the
> Data Lake count about 1% *lower* than the application database count.
>

When counting revisions including deleted ones on the Data Lake, we end up with exactly the same number as the Quarry query: 171346.

Now, about the difference between Quarry and the API for revisions excluding deleted ones: it is mostly due to recently deleted data (there is still a difference of 126 revisions that I don't understand: https://quarry.wmflabs.org/query/25796).

Cheers!
Joseph
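For anyone who wants to reproduce the "combined count" idea locally, here is a minimal Python sketch. It assumes only that live revisions sit in `revision` (filtered on `rev_timestamp`) and revisions of deleted pages sit in `archive` (filtered on `ar_timestamp`), with MediaWiki's 14-character timestamps. The `make_toy_wiki()` helper and its rows are hypothetical toy data, not enwiki, and the SQL is not necessarily what Quarry query 25794 does. The sketch also computes the relative gaps between the counts quoted above.

```python
import sqlite3

# MediaWiki stores timestamps as 14-character strings (YYYYMMDDHHMMSS),
# so a whole day can be selected with a simple string range.
DAY_START = "20180228000000"
DAY_END = "20180228235959"


def make_toy_wiki():
    """Hypothetical in-memory stand-in for the revision and archive tables.

    Toy schema and rows for illustration only; the real MediaWiki tables
    have many more columns.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revision (rev_id INTEGER, rev_timestamp TEXT)")
    conn.execute("CREATE TABLE archive (ar_id INTEGER, ar_timestamp TEXT)")
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?)",
        [(1, "20180228010000"), (2, "20180228120000"),
         (3, "20180228235900"), (4, "20180301000100")],  # last row: next day
    )
    conn.executemany(
        "INSERT INTO archive VALUES (?, ?)",
        [(10, "20180228080000"), (11, "20180228150000"),
         (12, "20180227230000")],  # last row: previous day
    )
    return conn


def live_revision_count(conn, start=DAY_START, end=DAY_END):
    """Revisions still in the revision table (pages not deleted)."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM revision WHERE rev_timestamp BETWEEN ? AND ?",
        (start, end),
    ).fetchone()
    return n


def combined_revision_count(conn, start=DAY_START, end=DAY_END):
    """Revisions from both tables, i.e. including those of deleted pages."""
    (n,) = conn.execute(
        "SELECT (SELECT COUNT(*) FROM revision WHERE rev_timestamp BETWEEN ? AND ?)"
        " + (SELECT COUNT(*) FROM archive WHERE ar_timestamp BETWEEN ? AND ?)",
        (start, end, start, end),
    ).fetchone()
    return n


def percent_lower(smaller, larger):
    """How much lower `smaller` is than `larger`, as a percentage of `larger`."""
    return (larger - smaller) / larger * 100


if __name__ == "__main__":
    conn = make_toy_wiki()
    print(live_revision_count(conn))      # 3
    print(combined_revision_count(conn))  # 5
    # Figures from this thread:
    print(round(percent_lower(168668, 169754), 2))  # Quarry vs REST API: 0.64
    print(round(percent_lower(169754, 171346), 2))  # REST API vs combined: 0.93
```

Summing two scalar subqueries keeps the live and archived counts visibly separate, which makes it easy to check each side against the corresponding Data Lake figure.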
_______________________________________________
Analytics mailing list
https://lists.wikimedia.org/mailman/listinfo/analytics
