Hey Nuria,

So, the goal is to have a UUID that is _distinct_ from the IP and user agent (that is, the UUID is generated independently of both) so that it can be used as a baseline for measuring accuracy. Think of the UUID in the ModuleStorage test datasets from way back. So the question isn't "can any individual user be de-aggregated" so much as "does a user_agent/IP hash make a good UUID, generally". If I'm understanding that page correctly, it's aimed more at the former problem.
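To make the comparison concrete, here is a rough Python sketch of the kind of check I have in mind. It is only an illustration under assumptions: the row layout `(ip, user_agent, uuid)`, the function names `fingerprint` and `evaluate`, the use of SHA-256, and the sample rows are all hypothetical and not taken from any existing table or schema. It counts how often one IP/UA hash covers several baseline UUIDs (collisions) and how often one baseline UUID is spread over several hashes (fragmentation).

```python
import hashlib
from collections import defaultdict


def fingerprint(ip, user_agent):
    """Hash the (IP, user agent) pair into a candidate identifier."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


def evaluate(rows):
    """Compare IP/UA fingerprints against independently generated UUIDs.

    rows: iterable of (ip, user_agent, uuid) tuples (hypothetical layout).
    Returns (collision_rate, fragmentation_rate):
      collision_rate     = share of fingerprints seen with more than one UUID
      fragmentation_rate = share of UUIDs seen with more than one fingerprint
    """
    uuids_per_fp = defaultdict(set)
    fps_per_uuid = defaultdict(set)
    for ip, user_agent, uuid in rows:
        fp = fingerprint(ip, user_agent)
        uuids_per_fp[fp].add(uuid)
        fps_per_uuid[uuid].add(fp)
    if not uuids_per_fp:
        return 0.0, 0.0  # no data, nothing to measure
    collisions = sum(1 for uuids in uuids_per_fp.values() if len(uuids) > 1)
    fragmented = sum(1 for fps in fps_per_uuid.values() if len(fps) > 1)
    return collisions / len(uuids_per_fp), fragmented / len(fps_per_uuid)


# Made-up sample rows, purely for illustration.
rows = [
    ("10.0.0.1", "Firefox", "uuid-a"),
    ("10.0.0.1", "Firefox", "uuid-b"),  # two users behind the same IP/UA
    ("10.0.0.2", "Chrome", "uuid-c"),
    ("10.0.0.3", "Chrome", "uuid-c"),   # one user whose IP changed
]
collision_rate, fragmentation_rate = evaluate(rows)
```

With these four made-up rows, one of the three fingerprints covers two UUIDs and one of the three UUIDs spans two fingerprints, so both rates come out at one third. Running the same thing over time-bucketed data would be one way to look at the decay question.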
On 3 January 2016 at 11:29, Nuria <[email protected]> wrote:
> Oliver,
>
> You might want to check our documentation in wikitech regarding identity
> reconstruction. I think it covers your point #1.
>
> https://wikitech.wikimedia.org/wiki/Analytics/Data/Preventing_identity_reconstruction
>
> Nuria
>
> On Jan 2, 2016, at 10:00 AM, Oliver Keyes <[email protected]> wrote:
>
> Hey y'all
>
> I'm working on a piece of research (largely recreational) on the old
> problem of fingerprinting users with minimal information - namely the
> combination of a user agent and an IP address. Basically I'm looking
> to put together a piece of work showing:
>
> 1. How sub-standard it is;
> 2. How fast it decays;
> 3. How the sub-standardness varies by (platform|location)
>
> This would be pretty doable with internal data; basically I'd need a
> schema with IP, user agent and a per-user UUID that's got a decent
> (>=24 hours) expiry time. My question: does anyone know of a table
> with recent data that meets these requirements? And, if not, anyone
> with EventLogging experience interested in working on the problem with
> me?
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics

--
Oliver Keyes
Count Logula
Wikimedia Foundation
