> So, the goal is to have a UUID _distinct_ from IP and user agent (that is,
> the IP and UA are not related to the UUID that's generated) so
> that that UUID can be used as a baseline for accuracy purposes.

I understand, but let me re-explain. My point was that, regarding #2
(decay), we already know that the IP + UA combination in many instances
decays very slowly, so the long tail is very significant. We do not need a
token to prove this.

I just wanted to mention research that has already been done, so that you
have it as a reference and we do not duplicate work.


> so much as "does a user_agent/ip hash make a good UUID, generally".

That depends on what "generally" means. On mobile the answer is most
definitely no. Again, you do not need a token to prove this, as mobile
providers sometimes use a small IP range for tens of thousands of customers.
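
As a rough illustration of the mobile case, here is a minimal sketch. The
addresses (from a documentation-only range), the user agent string and the
`fingerprint` helper are all made up for the example and are not taken from
our data or pipelines:

```python
import hashlib


def fingerprint(ip: str, user_agent: str) -> str:
    """Hypothetical IP + UA hash, used here only for illustration."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


# Made-up scenario: three distinct customers of one mobile carrier who share
# a single public IP and run the same stock browser.
ua = "Mozilla/5.0 (Linux; Android 5.1) ExampleBrowser/1.0"
customers = {
    "customer_a": ("203.0.113.7", ua),
    "customer_b": ("203.0.113.7", ua),
    "customer_c": ("203.0.113.7", ua),
}

fingerprints = {
    name: fingerprint(ip, agent) for name, (ip, agent) in customers.items()
}
distinct = set(fingerprints.values())

# All three customers collapse into one fingerprint.
print(len(customers), "customers ->", len(distinct), "distinct fingerprint(s)")
```

With inputs like these the hash cannot tell the three customers apart, which
is the failure I would expect on mobile.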




On Sun, Jan 3, 2016 at 9:48 AM, Oliver Keyes <[email protected]> wrote:

> Hey Nuria,
>
> So, the goal is to have a UUID _distinct_ from IP and user agent (that
> is, the IP and UA are not related to the UUID that's generated) so
> that that UUID can be used as a baseline for accuracy purposes. Think
> the UUID in the ModuleStorage test datasets from wayback. So it's not
> "can any individual user be de-aggregated" so much as "does a
> user_agent/ip hash make a good UUID, generally". If I'm understanding
> that page correctly, it's more aimed at the former problem.
>
> On 3 January 2016 at 11:29, Nuria <[email protected]> wrote:
> > Oliver,
> >
> > You might want to check our documentation in wikitech regarding identity
> > reconstruction. I think it covers your point #1.
> >
> > https://wikitech.wikimedia.org/wiki/Analytics/Data/Preventing_identity_reconstruction
> >
> > Nuria
> >
> >
> >
> > On Jan 2, 2016, at 10:00 AM, Oliver Keyes <[email protected]> wrote:
> >
> > Hey y'all
> >
> > I'm working on a piece of research (largely recreational) on the old
> > problem of fingerprinting users with minimal information - namely the
> > combination of a user agent and an IP address. Basically I'm looking
> > to put together a piece of work showing:
> >
> > 1. How sub-standard it is;
> > 2. How fast it decays;
> > 3. How the sub-standardness varies by (platform|location)
> >
> > This would be pretty doable with internal data; basically I'd need a
> > schema with IP, user agent and a per-user UUID that's got a decent
> > (>=24 hours) expiry time. My question: does anyone know of a table
> > with recent data that meets these requirements? And, if not, anyone
> > with EventLogging experience interested in working on the problem with
> > me?
> >
> > --
> > Oliver Keyes
> > Count Logula
> > Wikimedia Foundation
> >
> > _______________________________________________
> > Analytics mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
> >
>
>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>