Hi Yuvi,

Maybe you can draw some inspiration for the metadata from
http://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerCountryOverview2014Q2.htm

Cheers,
Erik

> -----Original Message-----
> From: [email protected] [mailto:analytics-
> [email protected]] On Behalf Of Yuvi Panda
> Sent: Monday, August 25, 2014 2:22
> To: A mailing list for the Analytics Team at WMF and everybody who has an
> interest in Wikipedia and analytics.
> Subject: [Analytics] Anonymizing and releasing 'edits per country' data for
> Wiki Projects
>
> Hello!
>
> I've been working for the last few days on
> https://github.com/Ironholds/WPDMZ, which currently generates raw data
> on 'number of non-bot edits per country', and I'd like to run some stats /
> make some graphs based on it. Since I'd like all my 'research' to be
> completely repeatable, I'd love it if we can make the 'raw data' (edits per
> country) publicly available on labsdb. I have most of the code written for it,
> *but* it needs anonymization.
>
> The biggest de-anonymization threats involve identifying which editors come
> from which countries, and can be executed in the following case:
>
> An editor is the only person editing from a country in a project where the
> country has low edit volume, and by a process of elimination / counting edits
> from a public source (like recentchanges), the individual editor can be
> connected to a particular country
>
> I propose the following Anonymization scheme:
>
> 1. No data for projects with less than a threshold of total *individual
>    editors* in the time period for which the data is released.
> 2. For countries that have less than a threshold % of 'individual editors' in
>    the time period, we just simply lump them in as 'other'.
>
> This removes most anonymization attacks I can think of. Thoughts? I can
> easily write up the code to generate these on a monthly basis and puppetize
> those to make the data publicly available. I think not just me, but lots of
> external researchers would benefit from such data.
>
> Thanks!
>
> --
> Yuvi Panda T
> http://yuvi.in/blog
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
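
For illustration, the two-rule scheme proposed in the quoted message could be sketched roughly as below. This is only a sketch under stated assumptions, not the WPDMZ code: the function name, the row layout (project, country, individual editors, edits) and both threshold values are hypothetical, since the thread does not fix any of them. It also assumes each editor is counted under exactly one country, so that per-country editor counts can be summed into a project total.

```python
from collections import defaultdict


def anonymize_edit_counts(rows, min_project_editors=100, min_country_share=0.05):
    """Apply the two proposed rules to per-country edit data.

    rows: iterable of (project, country, editor_count, edit_count) tuples.
    Both thresholds are placeholder values chosen for this sketch.
    Returns {project: {country_or_other: edit_count}}.
    """
    # Group the input rows by project.
    by_project = defaultdict(list)
    for project, country, editors, edits in rows:
        by_project[project].append((country, editors, edits))

    released = {}
    for project, entries in by_project.items():
        # Assumption: an editor appears under one country only.
        total_editors = sum(editors for _, editors, _ in entries)

        # Rule 1: release nothing for projects below the editor threshold.
        if total_editors < min_project_editors:
            continue

        counts = defaultdict(int)
        for country, editors, edits in entries:
            # Rule 2: countries below the share threshold go into 'other'.
            share = editors / total_editors
            key = country if share >= min_country_share else "other"
            counts[key] += edits
        released[project] = dict(counts)
    return released
```

Called with, say, `min_project_editors=50` on a project with 100 editors split 60 / 38 / 2 across three countries, the third country's edits would be reported only under 'other', and a project with 3 editors in total would be dropped entirely.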
