Hi Yuvi,

Maybe you can draw some inspiration for metadata from
http://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerCountryOverview2014Q2.htm

Cheers,
Erik

> -----Original Message-----
> From: [email protected] [mailto:analytics-
> [email protected]] On Behalf Of Yuvi Panda
> Sent: Monday, August 25, 2014 2:22
> To: A mailing list for the Analytics Team at WMF and everybody who has an
> interest in Wikipedia and analytics.
> Subject: [Analytics] Anonymizing and releasing 'edits per country' data for
> Wiki Projects
> 
> Hello!
> 
> I've been working for the last few days on
> https://github.com/Ironholds/WPDMZ, which currently generates raw data
> on 'number of non-bot edits per country', and I'd like to run some stats /
> make some graphs based on it. Since I'd like all my 'research' to be
> completely repeatable, I'd love it if we can make the 'raw data' (edits per
> country) publicly available on labsdb. I have most of the code written for it,
> *but* it needs anonymization.
> 
> The biggest de-anonymization threats involve identifying which editors come
> from which countries, and can be executed in the following case:
> 
> An editor is the only person editing from a country in a project where the
> country has low edit volume, and by a process of elimination / counting edits
> from a public source (like recentchanges), the individual editor can be
> connected to a particular country
> 
> I propose the following Anonymization scheme:
> 
> 1. No data for projects with fewer than a threshold number of total
> *individual editors* in the time period for which the data is released.
> 2. For countries that have less than a threshold % of 'individual editors'
> in the time period, we simply lump them in as 'other'.
> 
> This removes most de-anonymization attacks I can think of. Thoughts? I can
> easily write up the code to generate these on a monthly basis and puppetize
> those to make the data publicly available. I think not just me, but lots of
> external researchers would benefit from such data.
> 
> Thanks!
> 
> --
> Yuvi Panda T
> http://yuvi.in/blog
> 
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
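
The two-step scheme proposed in the quoted message could be sketched roughly as below. This is only an illustration, not the WPDMZ code: the function name, the input row layout (project, country, editor id, edit count) and both threshold values are assumptions made up for the example, since the message does not specify them.

```python
from collections import defaultdict


def anonymize_edit_counts(rows, min_project_editors=50, min_country_share=0.05):
    """Apply the two proposed rules to per-editor edit counts.

    rows: iterable of (project, country, editor_id, edit_count) tuples.
    Both thresholds are hypothetical defaults, not values from the proposal.
    Returns {project: {country_or_'other': total_edits}}.
    """
    editors_by_project = defaultdict(set)
    editors_by_country = defaultdict(set)  # keyed by (project, country)
    edits_by_country = defaultdict(int)    # keyed by (project, country)

    for project, country, editor, edits in rows:
        editors_by_project[project].add(editor)
        editors_by_country[(project, country)].add(editor)
        edits_by_country[(project, country)] += edits

    released = {}
    for (project, country), editors in editors_by_country.items():
        total_editors = len(editors_by_project[project])
        # Rule 1: release nothing for projects with too few individual editors.
        if total_editors < min_project_editors:
            continue
        # Rule 2: lump countries with a small share of editors into 'other'.
        share = len(editors) / total_editors
        label = country if share >= min_country_share else "other"
        bucket = released.setdefault(project, defaultdict(int))
        bucket[label] += edits_by_country[(project, country)]

    return {project: dict(counts) for project, counts in released.items()}
```

Note that both rules count distinct editors rather than edits, as in the proposal; only the released figures are edit counts.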

