Ah, cool. Thanks a lot for pointing this out, Francisco!
It's great that the automated views are separated out now.

Thanks!
Bob
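
For anyone who wants to repeat the comparison, here is a minimal sketch that rebuilds the two Pageviews tool links quoted below, varying only the `agent` query parameter. The helper name `pageviews_url` and the set of allowed agents are illustrative assumptions; the query parameters are copied from the links in this thread and have not been checked against the tool's documentation.

```python
from urllib.parse import urlencode

# Base address of the Pageviews tool, as it appears in the links in this thread.
BASE_URL = "https://tools.wmflabs.org/pageviews/"

# Agent labels named in the thread; the tool may accept others (assumption).
KNOWN_AGENTS = {"user", "spider", "automated"}


def pageviews_url(agent, page="Main_Page", project="en.wikipedia.org"):
    """Hypothetical helper: build a Pageviews tool link for one agent type."""
    if agent not in KNOWN_AGENTS:
        raise ValueError(f"unknown agent type: {agent!r}")
    params = {
        "project": project,
        "platform": "all-access",
        "agent": agent,
        "redirects": 0,
        "range": "latest-90",
        "pages": page,
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Print the 'user' and 'automated' links for the same page and range.
    for agent in ("user", "automated"):
        print(pageviews_url(agent))
```

Opening the two printed links side by side makes it easier to see whether a drop in the 'user' series lines up with a rise in the 'automated' series.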

On Thu, May 14, 2020 at 7:19 AM Francisco Dans <[email protected]> wrote:

> Robert: the pageview tool now also shows automated views, so you can check
> that the drop is indeed traffic now detected as unreported bots:
>
>
> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=automated&redirects=0&range=latest-90&pages=Main_Page
>
> On Thu, May 14, 2020 at 7:14 AM Robert West <[email protected]> wrote:
>
>> Ah, nice!
>>
>> I noticed that en:Main_Page traffic dropped by 40% as early as April 30,
>> 5 days before Nuria's message.
>>
>> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&range=latest-90&pages=Main_Page
>>
>> Just double-checking whether the drop is caused by the change in logging.
>>
>> Thanks!
>> Bob
>>
>> On Mon, May 4, 2020 at 11:10 PM Nuria Ruiz <[email protected]> wrote:
>>
>>> Hello:
>>>
>>> We have added the 'automated' marker to Wikimedia's pageview data. Up to
>>> now, pageview agents were classified as 'spider' (self-reported bots like
>>> 'google bot' or 'bing bot') or 'user'.
>>>
>>> We have known for a while that some requests classified as 'user' were,
>>> in fact, coming from automated agents not disclosed as such. This was
>>> well known to our community, which for a couple of years now has been
>>> applying filtering rules to any "Top X" list it compiles [1]. We have
>>> incorporated some of these filters (and others) into our automated traffic
>>> detection and, as of this week, traffic that meets the filtering
>>> criteria is automatically excluded from the "top" lists reported by the
>>> pageview API.
>>>
>>> Removing pageviews marked as 'automated' from the overall user traffic
>>> reduces pageviews labeled as "user" by about 5.6% over the course of a
>>> month [2]. Not all projects are affected equally when it comes to the
>>> reduction of "user" pageviews. The biggest effect is on English
>>> Wikipedia (8-10%). Projects like the Japanese Wikipedia, however, are only
>>> mildly affected (< 1%).
>>>
>>> If you are curious as to what problems this type of traffic causes in the
>>> data, this ticket for Hungarian Wikipedia is a good example of the issues
>>> caused by what we call "bot vandalism/bot spam":
>>> https://phabricator.wikimedia.org/T237282
>>>
>>> Given the delicate nature of this data, we have worked for many months
>>> now on vetting the algorithms we are using. We would appreciate reports
>>> via Phabricator ticket for any issues you might find.
>>>
>>> Thanks,
>>>
>>> Nuria
>>>
>>> [1]
>>> https://en.wikipedia.org/wiki/Wikipedia:2018_Top_50_Report#Exclusions
>>> [2]
>>> https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection#Global_Impact_-_All_wikimedia_projects
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>
>
>
> --
> *Francisco Dans (él, he, 彼)*
> Software Engineer, Analytics Team
> stats.wikimedia.org
> Wikimedia Foundation
>