Ah, cool. Thanks a lot for pointing this out, Francisco! It's great that the automated views are separated out now.
Thanks!
Bob

On Thu, May 14, 2020 at 7:19 AM Francisco Dans <[email protected]> wrote:

> Robert: the pageview tool now also shows automated views, so you can check
> that it is indeed traffic detected as unreported bots:
>
> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=automated&redirects=0&range=latest-90&pages=Main_Page
>
> On Thu, May 14, 2020 at 7:14 AM Robert West <[email protected]> wrote:
>
>> Ah, nice!
>>
>> I noticed that en:Main_Page traffic dropped by 40% as early as April 30,
>> 5 days before Nuria's message.
>>
>> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&range=latest-90&pages=Main_Page
>>
>> Just double-checking whether the drop is caused by the change in logging.
>>
>> Thanks!
>> Bob
>>
>> On Mon, May 4, 2020 at 11:10 PM Nuria Ruiz <[email protected]> wrote:
>>
>>> Hello:
>>>
>>> We have added the 'automated' marker to Wikimedia's pageview data. Up to
>>> now pageview agents were classified as 'spider' (self-reported bots like
>>> 'google bot' or 'bing bot') and 'user'.
>>>
>>> We have known for a while that some requests classified as 'user' were,
>>> in fact, coming from automated agents not disclosed as such. This was a
>>> well-known fact for our community, as for a couple of years now they have
>>> been applying filtering rules for any "Top X" list compiled [1]. We have
>>> incorporated some of these filters (and others) into our automated traffic
>>> detection and, as of this week, traffic that meets the filtering
>>> criteria is now automatically excluded from being counted towards "top"
>>> lists reported by the pageview API.
>>>
>>> The effect of removing pageviews marked as 'automated' from the overall
>>> user traffic is about a 5.6% reduction of pageviews labeled as "user" [2]
>>> in the course of a month. Not all projects are affected equally when it
>>> comes to reduction of "user pageviews". The biggest effect is on English
>>> Wikipedia (8-10%). However, projects like the Japanese Wikipedia are mildly
>>> affected (< 1%).
>>>
>>> If you are curious as to what problems this type of traffic causes in the
>>> data, this ticket for Hungarian Wikipedia is a good example of issues
>>> inflicted by what we call "bot vandalism/bot spam":
>>> https://phabricator.wikimedia.org/T237282
>>>
>>> Given the delicate nature of this data we have worked for many months
>>> now on vetting the algorithms we are using. We will appreciate reports via
>>> phab ticket for any issues you might find.
>>>
>>> Thanks,
>>>
>>> Nuria
>>>
>>> [1]
>>> https://en.wikipedia.org/wiki/Wikipedia:2018_Top_50_Report#Exclusions
>>> [2]
>>> https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection#Global_Impact_-_All_wikimedia_projects
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>
> --
> *Francisco Dans (él, he, 彼)*
> Software Engineer, Analytics Team
> stats.wikimedia.org
> Wikimedia Foundation
