> In the past, the Analytics team also considered enforcing the convention by blocking those bots that don't follow it. And that is still an option to consider.

I would like to point out that this is probably the prerogative of the API team rather than Analytics.
> Another option for this thread would be: cancelling the convention and continuing to work on regexps.

I think that, regardless of our convention, we will always be doing regex detection of self-identified bots. Maybe I am missing some nuance here?
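To make that concrete, here is a rough sketch of the kind of check I mean (Python, illustrative only; the function name and example User-Agents are made up, and this is not the actual code we run on the analytics cluster):

    import re

    # Illustrative sketch only, not the production analytics implementation.
    # The proposed convention just makes the pattern a trivial, case-insensitive
    # match on the word "bot" anywhere in the User-Agent string.
    SELF_IDENTIFIED_BOT = re.compile(r"bot", re.IGNORECASE)

    def is_self_identified_bot(user_agent):
        """Return True if the client labels itself as a bot per the proposed convention."""
        return bool(user_agent) and SELF_IDENTIFIED_BOT.search(user_agent) is not None

    # Hypothetical examples:
    is_self_identified_bot("ExampleWikiBot/1.0 (https://example.org/bot; [email protected])")  # True
    is_self_identified_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")      # False

So even if maintainers adopt the convention, the analytics side still runs a regex over the User-Agent; the convention would only make that regex simpler and more reliable.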
On Mon, Feb 1, 2016 at 10:42 AM, Nuria Ruiz <[email protected]> wrote:

> >It will take time for frameworks to implement an amended User-Agent policy.
> >For example, pywikipedia (pywikibot compat) is not actively maintained.
>
> That doesn't imply we shouldn't have a policy that anyone can refer to; these bots will not follow it until they get some maintainers.
>
> >There was a task filed against Analytics for this, but Dan Andreescu removed Analytics (https://phabricator.wikimedia.org/T99373#1859170).
>
> Sorry that the tagging is confusing. I think the Analytics tag was removed because this is a request for data, and our team doesn't do data retrieval. We normally tag Phabricator items with "analytics" when they have actionables for our team.
> I am cc-ing Bryan, who has already done some analysis on bot requests to the API and can probably provide some data.
>
> On Mon, Feb 1, 2016 at 6:39 AM, John Mark Vandenberg <[email protected]> wrote:
>
>> Hi Marcel,
>>
>> It will take time for frameworks to implement an amended User-Agent policy.
>> For example, pywikipedia (pywikibot compat) is not actively maintained. We don't know how much traffic is generated by compat.
>> There was a task filed against Analytics for this, but Dan Andreescu removed Analytics (https://phabricator.wikimedia.org/T99373#1859170).
>>
>> There are a lot of clients that need to be upgraded or decommissioned for this 'add bot' strategy to be effective in the near future. See https://www.mediawiki.org/wiki/API:Client_code
>>
>> The all-important missing step is:
>>
>> 3. Create a plan to block clients that don't implement the (amended) User-Agent policy.
>>
>> Without that plan successfully implemented, you will not get quality data (i.e. using 'Netscape' in the U-A to guess 'human' would perform better).
>>
>> On Tue, Feb 2, 2016 at 1:24 AM, Marcel Ruiz Forns <[email protected]> wrote:
>> > So, trying to join everyone's points of view, what about:
>> >
>> > Using the existing https://meta.wikimedia.org/wiki/User-Agent_policy and modifying it to encourage adding the word "bot" (case-insensitive) to the User-Agent string, so that it can be easily used to identify bots in the analytics cluster (no regexps). And linking that page from whatever other pages we think necessary.
>> >
>> > Doing some advertising and outreach to get some bot maintainers, and maybe some frameworks, to implement the User-Agent policy. This would make the existing policy less useless.
>> >
>> > Thanks all for the feedback!
>> >
>> > On Mon, Feb 1, 2016 at 3:16 PM, Marcel Ruiz Forns <[email protected]> wrote:
>> >>
>> >>> Clearly Wikipedia et al. uses bot to refer to automated software that edits the site, but it seems like you are using the term bot to refer to all automated software, and it might be good to clarify.
>> >>
>> >> OK, in the documentation we can make that clear. And looking into that, I've seen that some bots, in the process of doing their "editing" work, can also generate pageviews. So we should also include them as a potential source of pageview traffic. Maybe we can reuse the existing User-Agent policy.
>> >>
>> >>> This makes a lot of sense. If I build a bot that crawls wikipedia and facebook public pages, it really doesn't make sense that my bot has a "wikimediaBot" user agent; just the word "Bot" should probably be enough.
>> >>
>> >> Totally agree.
>> >>
>> >>> I guess a bigger question is why try to differentiate between "spiders" and "bots" at all?
>> >>
>> >> I don't think we need to differentiate between "spiders" and "bots". The most important question we want to answer is: how much of the traffic we consider "human" today is actually "bot"? So, +1 "bot" (case-insensitive).
>> >>
>> >> On Fri, Jan 29, 2016 at 9:16 PM, John Mark Vandenberg <[email protected]> wrote:
>> >>>
>> >>> On 28 Jan 2016 11:28 pm, "Marcel Ruiz Forns" <[email protected]> wrote:
>> >>> >
>> >>> >> Why not just "Bot", or "MediaWikiBot", which at least encompasses all sites that the client can communicate with.
>> >>> >
>> >>> > I personally agree with you, "MediaWikiBot" seems to have better semantics.
>> >>>
>> >>> For clients accessing the MediaWiki API, it is redundant.
>> >>> All it does is identify bots that comply with this edict from Analytics.
>> >>>
>> >>> --
>> >>> John Vandenberg
>> >>
>> >> --
>> >> Marcel Ruiz Forns
>> >> Analytics Developer
>> >> Wikimedia Foundation
>> >
>> > --
>> > Marcel Ruiz Forns
>> > Analytics Developer
>> > Wikimedia Foundation
>>
>> --
>> John Vandenberg
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
