On 21 December 2015 at 21:00, John Mark Vandenberg <[email protected]> wrote:
> On Tue, Dec 22, 2015 at 12:23 PM, Madhumitha Viswanathan
> <[email protected]> wrote:
>>
>> On Mon, Dec 21, 2015 at 5:15 PM, John Mark Vandenberg <[email protected]>
>> wrote:
>>>
>>> On Tue, Dec 15, 2015 at 10:51 AM, Madhumitha Viswanathan
>>> <[email protected]> wrote:
>>> > +1 Oliver - User agents tagged with WikimediaBot are tagged as bot -
>>> > I do agree that our documentation on this can be improved; I'll
>>> > update the Webrequest and Pageview tables docs to reflect this.
>>>
>>> Where was this announced?
>>> I don't believe pywikibot does this, or was notified that it should do
>>> this...?
>>>
>> Apologies, it wasn't. Here is a task for it -
>> https://phabricator.wikimedia.org/T108599, and it's in our pipeline to
>> get done.
>>
>>> Are accounts with the bot flag also tagged as bot?
>>>
>> I believe bot flags associated with accounts are not part of the
>> webrequest data, so we don't look at them.
>
> There is a bot request parameter associated with many write actions, and
> there is assert=bot available for all API requests since 1.23 (and
> earlier with Extension:AssertEdit).
> See https://www.mediawiki.org/wiki/API:Assert .
>
> Why can't those be used? They are validated data.
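For reference, the assert mechanism being proposed works like this: any
API request can carry assert=bot, and the server fails the whole request
with the error code assertbotfailed unless the logged-in account actually
holds the bot flag. A minimal sketch, assuming a Python client with the
requests library; the endpoint choice is illustrative:

    import requests

    API = "https://en.wikipedia.org/w/api.php"  # illustrative endpoint

    # assert=bot can ride along on any API request (MediaWiki >= 1.23).
    # The server fails the whole request with code "assertbotfailed"
    # unless the session's logged-in account actually has the bot flag.
    resp = requests.get(API, params={
        "action": "query",
        "meta": "userinfo",
        "assert": "bot",
        "format": "json",
    })

    print(resp.json())
    # A session without the bot flag (including an anonymous one, as here)
    # gets back: {"error": {"code": "assertbotfailed", ...}}

Note that this check only ever runs on requests that reach api.php at
all, which is the crux of the reply that follows.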
Because many "bot" requests go nowhere near the API, because almost no
"pageviews" go near the API, because Assert is designed exclusively for
logged-in API requests (which most API requests are not), and because
Assert is designed primarily for edits, which no pageviews are.

> user-agent with 'WikimediaBot' is not validated data; anyone can
> change the user-agent and it magically becomes a bot? That sounds
> like a way to ensure this data is not reliable and a waste of effort
> to build.

Anyone can change their user agent and magically be considered automated
software, yes. This is absolutely no different from the status quo, where
anyone can change their user agent to, say, the GoogleBot user agent and
likewise be counted as automated software. The vast, vast majority of
actual human users never do this, and those who do tend to be interested
not in distorting our automata statistics but in avoiding a consistent
user agent for privacy reasons, in which case they use browser extensions
to rotate through an array of genuine human UAs. There's no real
incentive to rotate through automata UAs, because some sites restrict the
features you can access (for example, by not serving JavaScript) if they
think you're a crawler.

As of yesterday, I have been handling the raw webrequest logs for two
solid years, and in that time the number of obviously-human "automata"
I've seen has been minuscule.
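To make the convention concrete: all the 'WikimediaBot' tagging asks of a
client is that the string appear somewhere in its user agent. A minimal
sketch, again assuming Python and the requests library; the tool name,
URL, and contact address are made up for illustration:

    import requests

    # Per the convention discussed above, including the string
    # "WikimediaBot" in the user agent is what gets this traffic counted
    # as automated in the webrequest/pageview data. The tool name, URL,
    # and email below are invented placeholders.
    HEADERS = {
        "User-Agent": (
            "ExampleTool/0.1 (https://example.org/tool; "
            "[email protected]) WikimediaBot"
        )
    }

    resp = requests.get(
        "https://en.wikipedia.org/wiki/Special:Random",
        headers=HEADERS,
    )
    print(resp.status_code, resp.url)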
--
Oliver Keyes
Count Logula
Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics