Thanks for the information, Oliver.

Hi John -- I just wanted to point out, in a friendly way, that your original email would have been just as effective if you had omitted the last line about this being "a waste of effort to build". We always like to get feedback and questions from the community, but the analytics team works hard to make good decisions and use donor money wisely. I'd love to see more constructive language on these lists.
Warmly,

-Toby

On Tue, Dec 22, 2015 at 12:30 AM, Oliver Keyes <[email protected]> wrote:

> On 21 December 2015 at 21:00, John Mark Vandenberg <[email protected]>
> wrote:
> > On Tue, Dec 22, 2015 at 12:23 PM, Madhumitha Viswanathan
> > <[email protected]> wrote:
> >>
> >>
> >> On Mon, Dec 21, 2015 at 5:15 PM, John Mark Vandenberg <[email protected]>
> >> wrote:
> >>>
> >>> On Tue, Dec 15, 2015 at 10:51 AM, Madhumitha Viswanathan
> >>> <[email protected]> wrote:
> >>> > +1 Oliver - User agents tagged with WikimediaBot are tagged as bot - I do
> >>> > agree that our documentation on this can be approved, I'll update the
> >>> > Webrequest and Pageview tables docs to reflect this.
> >>>
> >>> Where was this announced?
> >>> I don't believe pywikibot does this, or was notified that it should do
> >>> this...?
> >>>
> >> Apologies, it wasn't. Here is a task for it -
> >> https://phabricator.wikimedia.org/T108599, and it's in our pipeline to get
> >> done.
> >>
> >>>
> >>> Are accounts with the bot flag also tagged as bot?
> >>>
> >>
> >> I believe bot flags associated with accounts are not part of the webrequest
> >> data, so we don't look at it.
> >
> > There is a bot request parameter associated with many write actions,
> > and there is assert=bot available for all API requests since 1.23 (and
> > earlier with Extension:AssertEdit)
> > See https://www.mediawiki.org/wiki/API:Assert .
> >
> > Why cant those be used? They are validated data.
> >
>
> Because many "bot" requests go nowhere near the API, because almost no
> "pageviews" go near the API, because Assert is designed exclusively
> for logged-in API requests, which most API requests are not, because
> Assert is designed (primarily) for edits, which no pageviews are.
>
> > user-agent with 'WikimediaBot' is not validated data; anyone can
> > change the user-agent and it magically becomes a bot? That sounds
> > like a way to ensure this data is not reliable and a waste of effort
> > to build.
> >
>
> Anyone can change the user-agent and it magically becomes considered
> automated software, yes. This is absolutely no different from the
> moment, where anyone can change their user agent to say, the GoogleBot
> user agent and also becomes considered automated software. The vast
> vast vast majority of actual human users never do this, and those that
> do tend not to be interested in distorting our automata statistics but
> instead not providing a consistent user agent for privacy purposes, in
> which case they use browser extensions to roll between an array of
> actual human UAs. There's no real incentive to roll between automata
> UAs because some sites restrict the features you can or can't access
> (for example: not providing JavaScript) if they think you're a
> crawler. As of yesterday, I have been handling the raw webrequest logs
> for 2 solid years and in that time the number of obviously-human
> "automata" I've seen has been minuscule.
>
> > --
> > John Vandenberg
> >
> > _______________________________________________
> > Analytics mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/analytics
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
