Ah, nice!

I noticed that en:Main_Page traffic dropped by 40% as early as April 30, 5
days before Nuria's message.
https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&range=latest-90&pages=Main_Page

Just double-checking whether the drop is caused by the change in logging.
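
One way to check this might be to pull the same date range for Main_Page
once with agent "user" and once with agent "automated" from the pageview
API and see whether the missing views reappear under "automated". Below is
a minimal sketch, assuming the public per-article endpoint; the helper
names per_article_url and relative_drop are my own, and fetching and
parsing the responses is left out.

```python
from urllib.parse import quote

# Base path of the Wikimedia per-article pageviews endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, agent, start, end,
                    access="all-access", granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper).

    agent is one of "all-agents", "user", "spider", "automated";
    start and end are YYYYMMDD strings.
    """
    parts = [BASE, project, access, agent,
             quote(article, safe=""), granularity, start, end]
    return "/".join(parts)


def relative_drop(before, after):
    """Fractional drop in mean daily views between two periods."""
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return (mean_before - mean_after) / mean_before


if __name__ == "__main__":
    # Same article and date range, two agent classes to compare.
    for agent in ("user", "automated"):
        print(per_article_url("en.wikipedia.org", "Main_Page", agent,
                              "20200401", "20200510"))
```

If the daily "automated" counts after April 30 roughly match the size of
the drop in "user" counts, the new classification would be the likely cause.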

Thanks!
Bob

On Mon, May 4, 2020 at 11:10 PM Nuria Ruiz <[email protected]> wrote:

> Hello:
>
> We have added the 'automated' marker to Wikimedia's pageview data. Up to
> now pageview agents were classified as 'spider' (self reported bots like
> 'google bot' or 'bing bot') and 'user'.
>
> We have known for a while that some requests classified as 'user' were, in
> fact, coming from automated agents not disclosed as such. This was a well
> known fact for our community as for a couple years now they have been
> applying filtering rules for any "Top X" list compiled [1]. We have
> incorporated some of these filters (and others) to our automated traffic
> detection and, as of this week, traffic that meets the filtering
> criteria is now automatically excluded from being counted towards "top"
> lists reported by the pageview API.
>
> The effect of removing pageviews marked as 'automated' from the overall
> user traffic is about a 5.6% reduction of pageviews labeled as "user" [2]
> in the course of a month. Not all projects are affected equally when it
> comes to reduction of "user pageviews". The biggest effect is on English
> Wikipedia (8-10%). However, projects like the Japanese Wikipedia are mildly
> affected (< 1%).
>
> If you are curious as to what problems this type of traffic causes in the
> data, this ticket for Hungarian Wikipedia is a good example of issues
> inflicted by what we call "bot vandalism/bot spam":
> https://phabricator.wikimedia.org/T237282
>
> Given the delicate nature of this data we have worked for many months now
> on vetting the algorithms we are using. We will appreciate reports via phab
> ticket for any issues you might find.
>
> Thanks,
>
> Nuria
>
> [1] https://en.wikipedia.org/wiki/Wikipedia:2018_Top_50_Report#Exclusions
> [2]
> https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection#Global_Impact_-_All_wikimedia_projects
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
