Hi Thorsten!

Did you just filter out the editors marked as bots via a userGroup?

We also filter out some editors by username, because some bots are not
marked as such via a userGroup. The regular expression we use is this one
(IIRC):
https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/user/UserEventBuilder.scala#L24
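
In case it helps, here is a rough Python sketch of that two-step filter. The
pattern below is a simplified stand-in that I made up for illustration, not
the actual regular expression from the linked Scala file, and the field names
("user_groups", "username") are hypothetical, so please check them against
your own data:

```python
import re

# Hypothetical, simplified stand-in for the username pattern; the real one
# lives in the UserEventBuilder.scala file linked above and may differ.
BOT_NAME_PATTERN = re.compile(r"bot$", re.IGNORECASE)


def is_bot(user_groups, username):
    """Return True if the editor looks like a bot by group or by name."""
    # Step 1: editors explicitly flagged through the "bot" user group.
    if "bot" in user_groups:
        return True
    # Step 2: editors whose username matches the bot-name pattern,
    # even though they carry no bot group.
    return bool(BOT_NAME_PATTERN.search(username))


def count_human_edits(edits):
    """Count edits whose author is not identified as a bot.

    `edits` is an iterable of dicts with the hypothetical keys
    "user_groups" (list of str) and "username" (str).
    """
    return sum(
        1 for e in edits if not is_bot(e["user_groups"], e["username"])
    )
```

If you only apply step 1, editors caught by step 2 stay in your data, which
would push your totals above ours. Note that a name-based pattern this crude
can also catch human editors whose names happen to end in "bot".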

I'm not sure that's the only source of the discrepancy, but it could be!
Please let us know.

thanks!


On Fri, Sep 11, 2020 at 4:22 PM Thorsten Ruprechter <[email protected]>
wrote:

> Hello,
>
> I have a question about the "User edits" metric presented on Wikistats,
> and would be very grateful for advice regarding an issue we encountered.
>
> We are currently computing some edit metrics for multiple Wikipedia
> language versions. However, we realized there is some discrepancy between
> our edit count results and the ones reported on Wikistats. It seems that
> total edit counts are higher for our data, while trends for daily edits are
> also different. As an example, the French Wikipedia:
>
> Wikistats:
>
> https://stats.wikimedia.org/#/fr.wikipedia.org/contributing/user-edits/normal|line|2020-01-01~2020-05-16|page_type~content|daily
>
> Our results (see attachment):
>
>
>
> We removed all users marked as bots in the database and excluded edits to
> talk pages, as is done for the Wikistats edit count metric. I just now
> found this note [1]: "The original Wikistats did not count edits if the
> page they were made on was deleted. We are doing the same thing in
> Wikistats 2 for now, which means you may see metric totals shifting over
> time (as pages are deleted)."
>
> Could this be what is causing this rift, or are there other processing
> details which we have to consider to reproduce the Wikistats numbers as
> closely as possible? On a separate note - are the daily edit counts for all
> pages (including deleted articles) accessible somewhere?
>
> thanks, thorsten
>
> [1] https://meta.wikimedia.org/wiki/Research:Wikistats_metrics/Edits
>
> --
> Thorsten Ruprechter
>
> Institute of Interactive Systems and Data Science (ISDS)
> Graz University of Technology, Austria
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>


-- 
Marcel Ruiz Forns (he/him)
Senior Software Engineer