Thanks for your feedback, Ryan, and for the info on how Sync handles some of this.

On 5 May 2015 at 17:33, Ryan Kelly <[email protected]> wrote:
> One of the tricky-but-important questions we need to answer is:
>
> * how many users accessed more than one FxA service this month?

Do you mean *any* two services, or a specific pair, e.g. Sync and Hello?
(Or three, I suppose.)

> Right, so the simplest thing would be for each service to just emit a
> bunch of JSON log entries like this:
>
>   log.info({
>       service: "hello",
>       uid: "ABCDEF123456",
>       event: "call",
>       timestamp: 1430802399476,
>   })
>
>   log.info({
>       service: "readinglist",
>       uid: "ABCDEF123456",
>       event: "save_item",
>       timestamp: 1430803546071,
>   })

This looks fine to me. It's about as simple as we can make it, which
is good: we can always add fields later if necessary, rather than
adding extra cruft now.
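
As a rough sketch of how the "more than one service this month"
question could be answered from entries shaped like the ones above:
group the services seen per uid within a time window, then count the
uids with at least two. The function name, the `minServices` parameter
and the second uid are made up for illustration; only the entry fields
come from the examples quoted above.

```javascript
// Hypothetical helper: counts users seen in at least `minServices`
// distinct services with a timestamp in [start, end).
function countMultiServiceUsers(entries, start, end, minServices = 2) {
  const servicesByUid = new Map();
  for (const { service, uid, timestamp } of entries) {
    if (timestamp < start || timestamp >= end) continue;
    if (!servicesByUid.has(uid)) servicesByUid.set(uid, new Set());
    servicesByUid.get(uid).add(service);
  }
  let count = 0;
  for (const services of servicesByUid.values()) {
    if (services.size >= minServices) count += 1;
  }
  return count;
}

// The first two entries mirror the quoted examples; the third uid is
// invented so that there is a single-service user to exclude.
const entries = [
  { service: "hello", uid: "ABCDEF123456", event: "call", timestamp: 1430802399476 },
  { service: "readinglist", uid: "ABCDEF123456", event: "save_item", timestamp: 1430803546071 },
  { service: "hello", uid: "FEDCBA654321", event: "call", timestamp: 1430803600000 },
];

// May 2015 in UTC, as millisecond timestamps.
const mayStart = Date.UTC(2015, 4, 1);
const juneStart = Date.UTC(2015, 5, 1);
console.log(countMultiServiceUsers(entries, mayStart, juneStart)); // 1
```

Passing `minServices = 3` would cover the "or three" case, and
restricting `entries` to two named services beforehand would cover the
"specific pair" case.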

> Can we do it in a more privacy-conscious manner?
>
> > Perhaps we could
> > post-process these in the data pipeline into something else, or we can
> > log something locally which we could use to correlate that same user to
> > another service (but not back to the user him/herself). The idea of a
> > Metrics ID has been raised which is a one-way mapping from uid to
> > Metrics ID (am leaving out any implementation details for now).
>
> We have a tiny bit of prior art here, in the monthly-active-users
> counting for sync:
>
>   https://bugzilla.mozilla.org/show_bug.cgi?id=1136014
>
> For this, we wound up emitting metrics events that look like:
>
>   log.info({
>       uid: HMAC_SHA256(<secret key>, <uid>),
>       timestamp: 1430802399476,
>       ...other sync-specific metrics...
>   })
>
> In other words, we use HMAC to derive an opaque "metrics id" from the
> account uid.  This lets us count unique users of the service, but makes
> it harder to correlate the logs with a particular user record from FxA.
>
> If all the services used the same technique, we could do cross-service
> activity correlation.
>
> I'd be interested in people's thoughts on the usefulness of this
> obfuscation.

Here follow a number of questions; take them as a whole:

Given the questions we are looking at above (i.e. "How many ...?"),
is it safe to assume we won't be asked questions such as
"Who has ...?" In other words, are we always going to respond with an
aggregated number such as 6,000,001 rather than a list of users? If we
use the Metrics ID then we can't answer the "Who has ...?" questions
anyway, so are we sure we won't need to provide these kinds of
answers? And if we are asked to provide such answers, should we even
allow that (given that we want to protect the users' privacy)?

> > Of course, all services would need to know how to make that MetricsID if it
> > was logged at the edge, but if the uid was post-processed in the data
> > pipeline this could be done centrally.
>
> Yep.  If every service is able to do the uid -> metrics-id mapping at
> will, then does it really gain us anything?

Not really. I'm definitely a +1 on deriving the metrics-id in
post-processing, so that each edge can just log the uid as-is. I
believe Heka currently scrubs UIDs and emails from the fxa-auth-server
logs, so converting the uid to a metrics-id and then scrubbing the
original uid seems possible.
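
To make that concrete, here is a minimal sketch of what such a
post-processing step might look like, written with Node's built-in
`crypto` module rather than as an actual Heka filter. It applies the
HMAC-SHA256 idea quoted above to each entry and drops the raw uid. The
function names, the `metrics_id` field name and the example key are
assumptions for illustration, not anything the pipeline defines today.

```javascript
const crypto = require("crypto");

// Derive an opaque metrics id from the account uid with HMAC-SHA256.
// The secret key would live only in the central pipeline.
function toMetricsId(secretKey, uid) {
  return crypto.createHmac("sha256", secretKey).update(uid).digest("hex");
}

// Hypothetical pipeline step: return a copy of the entry with the raw
// uid removed and a metrics id (assumed field name) added in its place.
function scrubEntry(secretKey, entry) {
  const { uid, ...rest } = entry;
  return { ...rest, metrics_id: toMetricsId(secretKey, uid) };
}

// Placeholder key for the example only.
const key = "example-secret-not-for-production";
const raw = { service: "hello", uid: "ABCDEF123456", event: "call", timestamp: 1430802399476 };
const scrubbed = scrubEntry(key, raw);
console.log(scrubbed);
```

Because the same key is used for every service's entries, the same uid
always maps to the same metrics id, so cross-service counting still
works after the original uid has been scrubbed.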

> I'd love for people to weigh in with their gut reactions here, even if
> you don't have any comments on the technical details.
>
> We will of course have to be in compliance with Mozilla's terms, privacy
> policy, etc when collecting all these metrics.  But IMHO saying "we're
> compliant with the posted ToS!" is not much help if what we're doing
> just feels wrong to people.

I think you're right about the 'if it just feels wrong' point.
However, how do we actually go about measuring what we're doing
against the manifesto (et al.)? Is it just our gut feel that tells us
whether we're doing fine against it?

>
> So how can we make the gathering of these metrics feel as
> privacy-sensitive, as safe, as *right* as possible?

Apart from the Metrics ID, I don't have any other ideas at the moment.

Cheers,
Andy
_______________________________________________
Dev-fxacct mailing list
[email protected]
https://mail.mozilla.org/listinfo/dev-fxacct
