On 21 December 2015 at 21:00, John Mark Vandenberg <[email protected]> wrote:
> On Tue, Dec 22, 2015 at 12:23 PM, Madhumitha Viswanathan
> <[email protected]> wrote:
>>
>>
>> On Mon, Dec 21, 2015 at 5:15 PM, John Mark Vandenberg <[email protected]>
>> wrote:
>>>
>>> On Tue, Dec 15, 2015 at 10:51 AM, Madhumitha Viswanathan
>>> <[email protected]> wrote:
>>> > +1 Oliver - User agents tagged with WikimediaBot are tagged as bot - I
>>> > do agree that our documentation on this can be improved; I'll update
>>> > the Webrequest and Pageview tables docs to reflect this.
>>>
>>> Where was this announced?
>>> I don't believe pywikibot does this, or was notified that it should do
>>> this...?
>>>
>> Apologies, it wasn't. Here is a task for it -
>> https://phabricator.wikimedia.org/T108599, and it's in our pipeline to get
>> done.
>>
>>>
>>> Are accounts with the bot flag also tagged as bot?
>>>
>>
>> I believe bot flags associated with accounts are not part of the webrequest
>> data, so we don't look at it.
>
> There is a bot request parameter associated with many write actions,
> and there is assert=bot available for all API requests since 1.23 (and
> earlier with Extension:AssertEdit)
> See https://www.mediawiki.org/wiki/API:Assert .
>
> Why can't those be used?  They are validated data.
>

Because many "bot" requests go nowhere near the API; because almost no
"pageviews" go near the API; because Assert is designed exclusively
for logged-in API requests, which most API requests are not; and
because Assert is designed primarily for edits, which no pageviews are.
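To make the limitation concrete, here is a minimal sketch of how a client would use the documented assert parameter (https://www.mediawiki.org/wiki/API:Assert). The parameter names (`action`, `assert`, `format`) are the real API ones; `build_api_params` is a hypothetical helper introduced for illustration.

```python
def build_api_params(action, assert_as=None, **extra):
    """Build a MediaWiki API parameter dict.

    Passing assert_as="bot" makes the server reject the request with an
    'assertbotfailed' error unless the logged-in account has the bot
    right. The server validates the claim, but only for logged-in API
    traffic -- it cannot tag anonymous pageviews, which never hit the API.
    """
    params = {"action": action, "format": "json"}
    params.update(extra)
    if assert_as is not None:
        params["assert"] = assert_as  # server-side validation of the bot right
    return params

# A bot client would POST these to https://…/w/api.php with its session cookie.
params = build_api_params("edit", assert_as="bot", title="Sandbox")
```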

> user-agent with 'WikimediaBot' is not validated data; anyone can
> change the user-agent and it magically becomes a bot?  That sounds
> like a way to ensure this data is not reliable and a waste of effort
> to build.
>

Anyone can change their user agent and it magically becomes considered
automated software, yes. This is no different from the current
situation, where anyone can change their user agent to, say, the
Googlebot user agent and likewise be counted as automated software.
The vast majority of actual human users never do this, and those that
do tend not to be interested in distorting our automata statistics but
rather in not providing a consistent user agent for privacy purposes,
in which case they use browser extensions to rotate through an array
of actual human UAs. There's no real incentive to rotate through
automata UAs, because some sites restrict the features you can access
(for example, by not providing JavaScript) if they think you're a
crawler. As of yesterday I have been handling the raw webrequest logs
for two solid years, and in that time the number of obviously-human
"automata" I've seen has been minuscule.

> --
> John Vandenberg
>



-- 
Oliver Keyes
Count Logula
Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
