>It will take time for frameworks to implement an amended User-Agent policy.
>For example, pywikipedia (pywikibot compat) is not actively
>maintained.

That doesn't imply we shouldn't have a policy that anyone can refer to; these bots will simply not follow it until they get some maintainers.
>There was a task filed against Analytics for this, but Dan Andreescu
>removed Analytics (https://phabricator.wikimedia.org/T99373#1859170).

Sorry that the tagging is confusing. I think the Analytics tag was removed because this is a request for data, and our team doesn't do data retrieval. We normally tag Phabricator items with "Analytics" when they have actionables for our team. I am cc-ing Bryan, who has already done some analysis of bot requests to the API and can probably provide some data.

On Mon, Feb 1, 2016 at 6:39 AM, John Mark Vandenberg <[email protected]> wrote:
> Hi Marcel,
>
> It will take time for frameworks to implement an amended User-Agent policy.
> For example, pywikipedia (pywikibot compat) is not actively
> maintained. We don't know how much traffic is generated by compat.
> There was a task filed against Analytics for this, but Dan Andreescu
> removed Analytics (https://phabricator.wikimedia.org/T99373#1859170).
>
> There are a lot of clients that need to be upgraded or
> decommissioned for this 'add bot' strategy to be effective in the near
> future. See https://www.mediawiki.org/wiki/API:Client_code
>
> The all-important missing step is:
>
> 3. Create a plan to block clients that don't implement the (amended)
> User-Agent policy.
>
> Without that plan successfully implemented, you will not get quality
> data (i.e. using 'Netscape' in the U-A to guess 'human' would perform
> better).
>
> On Tue, Feb 2, 2016 at 1:24 AM, Marcel Ruiz Forns <[email protected]>
> wrote:
> > So, trying to join everyone's points of view, what about this:
> >
> > Use the existing https://meta.wikimedia.org/wiki/User-Agent_policy and
> > modify it to encourage adding the word "bot" (case-insensitive) to the
> > User-Agent string, so that it can be easily used to identify bots in the
> > analytics cluster (no regexps). And link that page from whatever other
> > pages we think necessary.
> >
> > Do some advertising and outreach and get some bot maintainers and maybe
> > some frameworks to implement the User-Agent policy. This would make the
> > existing policy less useless.
> >
> > Thanks all for the feedback!
> >
> > On Mon, Feb 1, 2016 at 3:16 PM, Marcel Ruiz Forns <[email protected]>
> > wrote:
> >>>
> >>> Clearly Wikipedia et al. use bot to refer to automated software that
> >>> edits the site, but it seems like you are using the term bot to refer
> >>> to all automated software, and it might be good to clarify.
> >>
> >>
> >> OK, in the documentation we can make that clear. And looking into that,
> >> I've seen that some bots, in the process of doing their "editing" work,
> >> can also generate pageviews. So we should also include them as a
> >> potential source of pageview traffic. Maybe we can reuse the existing
> >> User-Agent policy.
> >>
> >>
> >>> This makes a lot of sense. If I build a bot that crawls Wikipedia and
> >>> Facebook public pages, it really doesn't make sense that my bot has a
> >>> "wikimediaBot" user agent; just the word "Bot" should probably be
> >>> enough.
> >>
> >>
> >> Totally agree.
> >>
> >>
> >>> I guess a bigger question is why try to differentiate between "spiders"
> >>> and "bots" at all?
> >>
> >>
> >> I don't think we need to differentiate between "spiders" and "bots". The
> >> most important question we want to answer is: how much of the traffic we
> >> consider "human" today is actually "bot"? So, +1 "bot"
> >> (case-insensitive).
> >>
> >>
> >> On Fri, Jan 29, 2016 at 9:16 PM, John Mark Vandenberg
> >> <[email protected]> wrote:
> >>>
> >>> On 28 Jan 2016 11:28 pm, "Marcel Ruiz Forns" <[email protected]>
> >>> wrote:
> >>> >>
> >>> >> Why not just "Bot", or "MediaWikiBot", which at least encompasses
> >>> >> all sites that the client can communicate with.
> >>> >
> >>> > I personally agree with you, "MediaWikiBot" seems to have better
> >>> > semantics.
> >>>
> >>> For clients accessing the MediaWiki API, it is redundant.
> >>> All it does is identify bots that comply with this edict from
> >>> analytics.
> >>>
> >>> --
> >>> John Vandenberg
> >>
> >>
> >> --
> >> Marcel Ruiz Forns
> >> Analytics Developer
> >> Wikimedia Foundation
> >
> >
> > --
> > Marcel Ruiz Forns
> > Analytics Developer
> > Wikimedia Foundation
>
>
> --
> John Vandenberg
>
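The convention the thread converges on (a case-insensitive "bot" substring in the User-Agent header, checked without regexps) can be sketched as follows. This is an illustrative example only: the function name `is_bot_ua` and the sample User-Agent strings are made up for the sketch, not part of the actual policy.

```python
# Hypothetical sketch of the "bot" User-Agent convention discussed above.
# is_bot_ua and the example UA strings are illustrative, not official.

def is_bot_ua(user_agent: str) -> bool:
    """Classify a request as bot traffic if its User-Agent contains
    the word "bot", case-insensitively -- a plain substring test,
    no regexps needed."""
    return "bot" in user_agent.lower()

# A client UA that identifies itself and its operator, policy-style:
assert is_bot_ua(
    "ExampleWikiBot/1.0 (https://example.org/bot; [email protected])"
)
# A browser-like UA is not flagged:
assert not is_bot_ua("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
```

Worth noting: a bare substring test will also match User-Agent strings that merely contain "bot" inside another word; that is the trade-off of keeping the check regexp-free.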
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
