Assuming this were public, I could use this data on seldom-edited wikis to find out which editors likely have old browser/OS versions with vulnerabilities that I could attack[1]. This gets easier with every dimension you add to the data.
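To make that concern concrete, here is a minimal sketch in Python on synthetic data. It shows how adding a dimension shrinks the groups, and how a minimum-distinct-editor cutoff then suppresses the small ones. The field names (`editor`, `wiki`, `browser`, `os`), the helper `released_groups`, and the data are all made up for illustration; only the 50-distinct-username cutoff comes from the proposal quoted further down. This is not how the checkuser-based files are actually produced.

```python
from collections import defaultdict

# Cutoff taken from the quoted proposal; everything else is hypothetical.
MIN_DISTINCT_EDITORS = 50


def released_groups(rows, dimensions, min_editors=MIN_DISTINCT_EDITORS):
    """Count distinct editors per combination of `dimensions` and keep
    only the combinations that reach `min_editors`."""
    editors_by_group = defaultdict(set)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        editors_by_group[key].add(row["editor"])
    return {
        key: len(editors)
        for key, editors in editors_by_group.items()
        if len(editors) >= min_editors
    }


# Synthetic data: 120 editors on one wiki, all on the same browser,
# spread evenly over three OS versions (40 editors each).
rows = [
    {"editor": f"user{i}", "wiki": "examplewiki", "browser": "BrowserA",
     "os": f"OS-{i % 3}"}
    for i in range(120)
]

coarse = released_groups(rows, ["wiki", "browser"])
fine = released_groups(rows, ["wiki", "browser", "os"])
print(coarse)  # the single (wiki, browser) group has 120 editors and is kept
print(fine)    # each (wiki, browser, os) group has 40 editors, so all are dropped
```

With two dimensions the one large group is released; with three, every group falls under the cutoff and nothing is released, which is the behaviour that limits the attack on small wikis.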
<re-reads> OK. The anonymization strategy of dropping records that represent < 50 distinct editors seems to address this concern. 50 editors is a lot, so this data wouldn't be terribly useful for under-active wikis. Then again, if you just want to get a sense of what the dominant browser/OS pairs are, those will likely represent > 50 unique editors on most projects.

1. Props to Matt Flaschen and Dan Andreescu for helping me work through the implications of that one.

On Tue, Mar 3, 2015 at 9:59 PM, Oliver Keyes <[email protected]> wrote:
> Yeah, makes sense.
>
> On 3 March 2015 at 20:38, Nuria Ruiz <[email protected]> wrote:
> > >>Agreed. Do we have a way of syncing files to Labs yet?
> > No need to sync if the file is available at an endpoint like
> > http://some-data-here
> >
> > On Tue, Mar 3, 2015 at 4:50 PM, Oliver Keyes <[email protected]> wrote:
> >>
> >> On 3 March 2015 at 19:35, Nuria Ruiz <[email protected]> wrote:
> >> > >>Erik has asked me to write an exploratory app for user-agent data. The
> >> > >>idea is to enable Product Managers and engineers to easily explore
> >> > >>what users use so they know what to support. I've thrown up an example
> >> > >>screenshot at http://ironholds.org/agents_example_screen.png
> >> >
> >> > I cannot speak to the community's interest in this data, but for
> >> > developers and PMs we should make sure we have a solid way to update
> >> > any data we put up. User agent data is outdated as soon as a new
> >> > version of Android or iOS is released, a new popular phone comes
> >> > along, or a popular browser ships a new auto-update. Not only that,
> >> > if we make changes to, say, redirect all iPad users to the desktop
> >> > site, we want to assess the effect of those changes as soon as
> >> > possible. A monthly update will be a must. Also, distinguishing
> >> > between browser percentages on the desktop site versus the mobile
> >> > site versus the apps is a must for this data to be really useful
> >> > for PMs and developers (especially for bug triage).
> >> >
> >>
> >> Yes! However, I am addressing a specific ad-hoc request. If there is a
> >> need for this (I agree there is) I hope Toby and Kevin can eke out the
> >> time on the Analytics Engineering schedule to work on it; y'all are a
> >> lot better at infrastructure work than me :).
> >>
> >> >
> >> > We have a couple of backlog items to make monthly reports in this
> >> > regard. A UI on top of them will be superb.
> >> >
> >>
> >> Agreed. Do we have a way of syncing files to Labs yet? That's the
> >> biggest blocker. The UI doesn't care what the file contains as long as
> >> it's a TSV with a header row - I've deliberately built it so that
> >> things like the download links are dynamic and can change.
> >>
> >> >
> >> > On Tue, Mar 3, 2015 at 1:05 PM, Oliver Keyes <[email protected]>
> >> > wrote:
> >> >>
> >> >> Hey all,
> >> >>
> >> >> (Sending this to the public list because it's more transparent and I'd
> >> >> like people who think this data is useful to be able to shout out)
> >> >>
> >> >> Erik has asked me to write an exploratory app for user-agent data. The
> >> >> idea is to enable Product Managers and engineers to easily explore
> >> >> what users use so they know what to support. I've thrown up an example
> >> >> screenshot at http://ironholds.org/agents_example_screen.png (I'd
> >> >> host it on Commons, inb4Dario, but I'm not sure of the copyright
> >> >> status of the UI)
> >> >>
> >> >> One side-effect of this is that we end up with files of common user
> >> >> agents, split between {readers,editors} and {mobile, desktop}, parsed
> >> >> and unparsed. I'd like to release these files. The reuse potential is
> >> >> twofold; researchers and engineers can use the parsed files to see
> >> >> what browser penetration looks like globally and what browsers should
> >> >> be supported at a top-10, and software engineers can use the unparsed
> >> >> files to improve detection rates.
> >> >>
> >> >> The privacy implications /should/ be minimal, because of how this data
> >> >> is gathered. The editor data is gathered from the checkuser table,
> >> >> globally, and automatically excludes any user agent used by fewer than
> >> >> 50 distinct usernames. The reader data is gathered from a month of
> >> >> 1:1000 sampled log files, and excludes any agent responsible for fewer
> >> >> than 500 pageviews in a 24 hour period (except, sampled. So,
> >> >> practically speaking, that's 500,000 pageviews)
> >> >>
> >> >> What do people think about making this a data release? Would people
> >> >> get value from the data, as well as the tool?
> >> >>
> >> >> --
> >> >> Oliver Keyes
> >> >> Research Analyst
> >> >> Wikimedia Foundation
> >> >>
> >> >> _______________________________________________
> >> >> Analytics mailing list
> >> >> [email protected]
> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>
> >> --
> >> Oliver Keyes
> >> Research Analyst
> >> Wikimedia Foundation
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
