>It will take time for frameworks to implement an amended User-Agent policy.
>For example, pywikipedia (pywikibot compat) is not actively
>maintained.
That doesn't imply we shouldn't have a policy that anyone can refer to;
those bots just won't follow it until they get some maintainers.

>There was a task filed against Analytics for this, but Dan Andreescu
>removed Analytics (https://phabricator.wikimedia.org/T99373#1859170).

Sorry that the tagging is confusing. I think the Analytics tag was removed
because this is a request for data, and our team doesn't do data retrieval.
We normally tag Phabricator items with "analytics" when they have
actionables for our team.
I am cc-ing Bryan, who has already done some analysis on bot requests to
the API and can probably provide some data.
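For anyone skimming the thread below: the detection Marcel proposes — counting any request whose User-Agent contains the word "bot", matched case-insensitively, as bot traffic — is just a substring check, no regexps needed. A minimal sketch (the function name and example User-Agent strings are illustrative, not an existing analytics-cluster API):

```python
def is_bot(user_agent: str) -> bool:
    """Classify a request as bot traffic if its User-Agent contains
    the word "bot", case-insensitively (plain substring, no regexps)."""
    return "bot" in user_agent.lower()


# A client complying with the amended User-Agent policy might send
# something like this (hypothetical tool name and contact URL):
compliant_ua = "MyWikiTool/1.0 (https://example.org/mytool) bot"
assert is_bot(compliant_ua)

# A typical browser User-Agent would not match:
assert not is_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/44.0")
```

Note this also catches frameworks that already self-identify, e.g. anything containing "pywikibot", without requiring a Wikimedia-specific marker like "wikimediaBot".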




On Mon, Feb 1, 2016 at 6:39 AM, John Mark Vandenberg <[email protected]>
wrote:

> Hi Marcel,
>
> It will take time for frameworks to implement an amended User-Agent policy.
> For example, pywikipedia (pywikibot compat) is not actively
> maintained.  We don't know how much traffic is generated by compat.
> There was a task filed against Analytics for this, but Dan Andreescu
> removed Analytics (https://phabricator.wikimedia.org/T99373#1859170).
>
> There are a lot of clients that need to be upgraded or decommissioned
> for this 'add bot' strategy to be effective in the near
> future.  See https://www.mediawiki.org/wiki/API:Client_code
>
> The all-important missing step is
>
> 3. Create a plan to block clients that don't implement the (amended)
> User-Agent policy.
>
> Without that plan, successfully implemented, you will not get quality
> data (indeed, checking for 'Netscape' in the U-A to guess 'human' would
> perform better).
>
> On Tue, Feb 2, 2016 at 1:24 AM, Marcel Ruiz Forns <[email protected]>
> wrote:
> > So, trying to bring together everyone's points of view, what about this:
> >
> > Use the existing https://meta.wikimedia.org/wiki/User-Agent_policy and
> > modify it to encourage adding the word "bot" (case-insensitive) to the
> > User-Agent string, so that it can be easily used to identify bots in the
> > analytics cluster (no regexps). And link that page from whatever other
> > pages we think necessary.
> >
> > Do some advertising and outreach and get some bot maintainers and maybe
> some
> > frameworks to implement the User-Agent policy. This would make the
> existing
> > policy less useless.
> >
> > Thanks all for the feedback!
> >
> > On Mon, Feb 1, 2016 at 3:16 PM, Marcel Ruiz Forns <[email protected]>
> > wrote:
> >>>
> >>> Clearly Wikipedia et al. use "bot" to refer to automated software that
> >>> edits the site, but it seems like you are using the term to refer to
> >>> all automated software, and it might be good to clarify.
> >>
> >>
> >> OK, in the documentation we can make that clear. And looking into it,
> >> I've seen that some bots, in the process of doing their "editing" work,
> >> can also generate pageviews. So we should also include them as a
> >> potential source of pageview traffic. Maybe we can reuse the existing
> >> User-Agent policy.
> >>
> >>
> >>> This makes a lot of sense. If I build a bot that crawls Wikipedia and
> >>> Facebook public pages, it really doesn't make sense that my bot has a
> >>> "wikimediaBot" user agent; just the word "Bot" should probably be
> >>> enough.
> >>
> >>
> >> Totally agree.
> >>
> >>
> >>> I guess a bigger question is why try to differentiate between "spiders"
> >>> and "bots" at all?
> >>
> >>
> >> I don't think we need to differentiate between "spiders" and "bots". The
> >> most important question we want to answer is: how much of the traffic
> >> we consider "human" today is actually "bot"? So, +1 for "bot"
> >> (case-insensitive).
> >>
> >>
> >> On Fri, Jan 29, 2016 at 9:16 PM, John Mark Vandenberg
> >> <[email protected]> wrote:
> >>>
> >>> On 28 Jan 2016 11:28 pm, "Marcel Ruiz Forns" <[email protected]>
> >>> wrote:
> >>> >>
> >>> >> Why not just "Bot", or "MediaWikiBot" which at least encompasses all
> >>> >> sites that the client
> >>> >> can communicate with.
> >>> >
> >>> > I personally agree with you, "MediaWikiBot" seems to have better
> >>> > semantics.
> >>>
> >>> For clients accessing the MediaWiki API, it is redundant.
> >>> All it does is identify bots that comply with this edict from
> >>> Analytics.
> >>>
> >>> --
> >>> John Vandenberg
> >>>
> >>>
> >>> _______________________________________________
> >>> Analytics mailing list
> >>> [email protected]
> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>
> >>
> >>
> >>
> >> --
> >> Marcel Ruiz Forns
> >> Analytics Developer
> >> Wikimedia Foundation
> >
> >
> >
> >
> >
> >
>
>
>
> --
> John Vandenberg
>
>
