>Agreed, on both points, but there is a big difference between
>"logically we can reason this is the case" and "we have proven that
>this is the case, and it impacts different groups in these
>proportions, and etc etc etc".

I see. Very true.

> which means even things we all know to be true (like the mobile point)
> need validation.

While I do not see anything wrong with documenting and quantifying it, it
is worth keeping in mind that mobile is a different case. Sharing one IP
across many users is common there because of mobile providers' use of NAT:
https://en.wikipedia.org/wiki/Network_address_translation

Take a look at:
http://stackoverflow.com/questions/10946624/finding-ip-address-for-iphone
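
To make the NAT point concrete, here is a minimal sketch of what happens to an
IP + UA hash when several users sit behind one shared address. Everything in
it is an assumption for illustration: the function names, the choice of
SHA-256, and the sample events (documentation-range IPs, made-up user agents
and tokens) are hypothetical and do not come from any real table or schema.

```python
import hashlib
from collections import defaultdict


def fingerprint(ip: str, user_agent: str) -> str:
    """Hash an (IP, user agent) pair into a pseudo-identifier."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


def users_per_fingerprint(events):
    """Map each fingerprint to the set of baseline tokens seen behind it.

    `events` is an iterable of (token, ip, user_agent) tuples, where the
    token is generated independently of the IP and the user agent.
    """
    seen = defaultdict(set)
    for token, ip, user_agent in events:
        seen[fingerprint(ip, user_agent)].add(token)
    return seen


# Made-up sample: three mobile users behind one NAT address, one desktop user.
events = [
    ("user-a", "203.0.113.7", "Mobile Safari (iPhone)"),
    ("user-b", "203.0.113.7", "Mobile Safari (iPhone)"),
    ("user-c", "203.0.113.7", "Mobile Safari (iPhone)"),
    ("user-d", "198.51.100.20", "Firefox (Linux)"),
]

groups = users_per_fingerprint(events)
# Count users whose fingerprint is shared with at least one other user.
collided = sum(len(tokens) for tokens in groups.values() if len(tokens) > 1)
print(f"{len(groups)} fingerprints for {len(events)} users; "
      f"{collided} users share a fingerprint")
```

With an independent token in the data, the same grouping could be run per
platform to quantify how often the hash merges distinct users.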

On Tue, Jan 5, 2016 at 10:31 AM, Oliver Keyes <[email protected]> wrote:

> On 5 January 2016 at 13:01, Nuria Ruiz <[email protected]> wrote:
> >>So, the goal is to have a UUID _distinct_ from IP and user agent (that is,
> >> the IP and UA are not related to the UUID that's generated) so
> >>that that UUID can be used as a baseline for accuracy purposes.
> >
> > I understand. But let me re-explain: my point was mentioning that regarding
> > #2 (decay) we already know that the IP + UA combo in many instances decays
> > real slowly, so the long tail is very significant and that we really do not
> > need a token to prove this fact.
> >
> > I just wanted to mention research that has already been done so you have it
> > also as a reference and we do not duplicate work.
> >
> >
> >>so much as "does a user_agent/ip hash make a good UUID, generally".
> > Depends on what "generally" means, in mobile the answer is most definitely
> > no. Again, you do not need a token to prove this fact, as mobile providers
> > use sometimes a short IP range for tens of thousands of customers.
> >
> >
> >
>
> Agreed, on both points, but there is a big difference between
> "logically we can reason this is the case" and "we have proven that
> this is the case, and it impacts different groups in these
> proportions, and etc etc etc". The goal is not just to provide a
> reference point for internal use but also to write it up for
> publication so it can be used more generally, which means even things
> we all know to be true (like the mobile point) need validation.
>
> >
> > On Sun, Jan 3, 2016 at 9:48 AM, Oliver Keyes <[email protected]> wrote:
> >>
> >> Hey Nuria,
> >>
> >> So, the goal is to have a UUID _distinct_ from IP and user agent (that
> >> is, the IP and UA are not related to the UUID that's generated) so
> >> that that UUID can be used as a baseline for accuracy purposes. Think
> >> the UUID in the ModuleStorage test datasets from wayback. So it's not
> >> "can any individual user be de-aggregated" so much as "does a
> >> user_agent/ip hash make a good UUID, generally". If I'm understanding
> >> that page correctly, it's more aimed at the former problem.
> >>
> >> On 3 January 2016 at 11:29, Nuria <[email protected]> wrote:
> >> > Oliver,
> >> >
> >> > You might want to check our documentation in wikitech regarding identity
> >> > reconstruction. I think it covers your point #1.
> >> >
> >> > https://wikitech.wikimedia.org/wiki/Analytics/Data/Preventing_identity_reconstruction
> >> >
> >> > Nuria
> >> >
> >> >
> >> >
> >> > On Jan 2, 2016, at 10:00 AM, Oliver Keyes <[email protected]> wrote:
> >> >
> >> > Hey y'all
> >> >
> >> > I'm working on a piece of research (largely recreational) on the old
> >> > problem of fingerprinting users with minimal information - namely the
> >> > combination of a user agent and an IP address. Basically I'm looking
> >> > to put together a piece of work showing:
> >> >
> >> > 1. How sub-standard it is;
> >> > 2. How fast it decays;
> >> > 3. How the sub-standardness varies by (platform|location)
> >> >
> >> > This would be pretty doable with internal data; basically I'd need a
> >> > schema with IP, user agent and a per-user UUID that's got a decent
> >> > (>=24 hours) expiry time. My question: does anyone know of a table
> >> > with recent data that meets these requirements? And, if not, anyone
> >> > with EventLogging experience interested in working on the problem with
> >> > me?
> >> >
> >> > --
> >> > Oliver Keyes
> >> > Count Logula
> >> > Wikimedia Foundation
> >> >
> >> > _______________________________________________
> >> > Analytics mailing list
> >> > [email protected]
> >> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >> >
> >>
> >>
> >>
> >> --
> >> Oliver Keyes
> >> Count Logula
> >> Wikimedia Foundation
> >
> >
> >
>
>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
