On Thu, Jan 28, 2016 at 11:15 AM, Marcel Ruiz Forns <[email protected]> wrote:
> Hi analytics list,
>
> In the past months the WikimediaBot convention has been mentioned in a
> couple threads, but we (Analytics team) never finished establishing and
> advertising it. In this email we explain what the convention is today and
> what purpose it serves. And also ask for feedback to be sure we can continue
> with the next steps.
>
> What is the WikimediaBot convention?
> It is a way of better identifying Wikimedia traffic originated by bots.
> Today we know that a significant share of Wikimedia traffic comes from bots.
> We can recognize a part of that traffic with regular expressions[1], but we
> can not recognize all of it, because some bots do not identify themselves as
> such. If we could identify a greater part of the bot traffic, we could also
> better isolate the human traffic and permit more accurate analyses.
>
> Who should follow the convention?
> Computer programs that access Wikimedia sites or the Wikimedia API for
> reading purposes* in a periodic, scheduled or automatically triggered way.
>
> Who should NOT follow the convention?
> Computer programs that follow the on-site ad-hoc commands of a human, like
> browsers. And well known spiders that are otherwise recognizable by their
> well known user-agent strings.
>
> How to follow the convention?
> The client's user-agent string should contain the word "WikimediaBot". The
> word can be anywhere within the user-agent string and is case-sensitive.

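For illustration, the rule in the quoted proposal (the token "WikimediaBot" may appear anywhere in the user-agent string and is case-sensitive) could be applied and checked roughly as below. This is a minimal sketch, not Wikimedia's actual filtering code; the function names and the user-agent layout in make_user_agent are hypothetical. A plain substring test is assumed, which also matches the token inside a longer word.

```python
# Minimal sketch of the proposed WikimediaBot convention, assuming a plain
# case-sensitive substring match. Function names are hypothetical.

TOKEN = "WikimediaBot"


def follows_convention(user_agent: str) -> bool:
    """Return True if the user-agent contains the exact token "WikimediaBot"."""
    # Position does not matter; case does ("wikimediabot" does not match).
    return TOKEN in user_agent


def make_user_agent(client: str, version: str, contact: str) -> str:
    """Build a hypothetical bot user-agent that carries the token at the end."""
    return f"{client}/{version} ({contact}) {TOKEN}"


if __name__ == "__main__":
    ua = make_user_agent("MyCrawler", "1.0", "https://example.org/contact")
    print(ua)
    print(follows_convention(ua))
    print(follows_convention("MyCrawler/1.0 wikimediabot"))
```

Because the match is case-sensitive, a client that writes "wikimediabot" or "WIKIMEDIABOT" would not be recognized under the convention as proposed.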
This is useless unless someone is going to start blocking bots that don't follow it. There is an existing policy, which is not being followed or enforced:

https://meta.wikimedia.org/wiki/User-Agent_policy

It is also extremely annoying that clients (e.g. Pywikibot) now need to add a Wikimedia-specific tag to their user-agent. A user-agent should be client-specific, not server-specific. Why not just "Bot", or "MediaWikiBot", which at least encompasses all sites that the client can communicate with?

--
John Vandenberg

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
