Hey folks -- we aren't considering changing any of the data that goes into
checkuser. That tool will be unchanged.

This discussion only concerns backend logging (EventLogging) and page view
analytics.

thanks,

-Toby


On Thu, Mar 27, 2014 at 3:30 PM, Dan Garry <[email protected]> wrote:

> Note: I speak in this thread as a volunteer checkuser, not as the product
> manager for platform. Not sending from my volunteer email address because I
> don't want to subscribe to this list on two separate email addresses. :-)
>
> The originally proposed spec (user agent to include device type and app
> version) would have been very disruptive to the workflow of checkusers,
> which relies in part on user agent data. Your proposed update (including a
> client-specific identifier with the user agent for CU), is sensible, and
> lets checkusers do their jobs of dealing with people abusing our sites
> without unnecessarily divulging personally identifying information to the
> checkusers.
>
> If any interested parties have improvements to propose, let's hear them!
>
> Thanks,
> Dan
>
>
>
>
>
>
> On 27 March 2014 14:30, James Alexander <[email protected]> wrote:
>
>> Speaking in my capacity as both a long-term volunteer checkuser (though
>> not currently because of my work requirements), a very active work-related
>> owner/user of checkuser in the LCA team [probably the most active within
>> staff], and a strong advocate for saving as little info as possible, I think
>> your proposed adjustment makes sense.
>>
>> That (assuming it's done for all logged actions as you suggest) seems
>> like it would fit in to the CU requirements well while saving as little
>> information as needed on readers.
>>
>> You say that the 2nd 'edit' user agent will be sent as a separate header,
>> I imagine that would still be recorded in the read logs then, is it just
>> that it wouldn't be saved long term after the logs are processed in some
>> way to remove other headers? [That would make sense to me, but if it's
>> going to be kept in the logs as long as the user agent in the first place I
>> don't know why we wouldn't just switch what was being sent 'as' the user
>> agent].
>>
>> James
>>
>> James Alexander
>> Legal and Community Advocacy
>> Wikimedia Foundation
>> (415) 839-6885 x6716 @jamesofur
>>
>>
>> On Thu, Mar 27, 2014 at 1:45 PM, Yuvi Panda <[email protected]> wrote:
>>
>>> Forking since I think there are two conversations - one about the
>>> format of UA for the mobile apps and one about CheckUser requirements
>>> for anything that does edits. Having them separate would be useful.
>>>
>>> For those who do not know what CheckUser means, I recommend reading
>>> https://en.wikipedia.org/wiki/Wikipedia:CheckUser.
>>>
>>> IP address and UA are two of the most important pieces of info
>>> CUs have in helping prevent abuse. IP is already sort of useless with
>>> mobile networks - a lot of providers do NAT and similar things that
>>> mean that we cannot come remotely close to reasonably assuming 1 IP = 1
>>> User, or anything remotely similar to that. UA provides more
>>> fingerprinting ability, but CU isn't the only thing that consumes UA -
>>> other parts of the infrastructure do as well.
>>>
>>> So what we need, is a way to preserve the ability to fingerprint only
>>> users making edits (no read actions!) for CU. I am sure that can be
>>> implemented without having to have a very fingerprintable UA, with
>>> simple hooks on both the App's side and on Extension:CheckUser.
>>>
>>> We could generate a simple fingerprint that's unique per device (and
>>> disconnected completely from every other device identifier) that we
>>> send only with edits (and other 'POST' actions) as a separate header.
>>> This can be processed by CU (perhaps with a hook that
>>> Extension:MobileApp can hook into) and then used by CheckUsers. This
>>> data will be treated with the same data retention / privacy policy
>>> that applies to CUs now, and regular UA data can be consumed by other
>>> consumers without too much fingerprinting concerns.
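
A minimal sketch of the idea above, assuming Python for illustration only (the real apps are Android and iOS clients). The header name X-CheckUser-Client-Id and both function names are hypothetical and do not come from the thread or from Extension:CheckUser. The identifier is a random value generated once per install, so it is not derived from any other device identifier, and it is attached only to POST requests.

```python
import uuid
from typing import Dict

# Coarse User-Agent sent with every request, in the proposed app format.
BASE_USER_AGENT = "WikipediaApp/32 Android/Phone/4.4"

# Hypothetical header name, used here for illustration only.
CLIENT_ID_HEADER = "X-CheckUser-Client-Id"


def new_install_id() -> str:
    """Create a random per-install identifier (32 hex characters).

    uuid4 is random, so the value is unrelated to hardware or account
    identifiers. A real app would generate it once and store it locally.
    """
    return uuid.uuid4().hex


def headers_for_request(method: str, install_id: str) -> Dict[str, str]:
    """Return request headers; only POST requests carry the identifier."""
    headers = {"User-Agent": BASE_USER_AGENT}
    if method.upper() == "POST":
        headers[CLIENT_ID_HEADER] = install_id
    return headers
```

With this shape, read requests (GET) expose only the coarse User-Agent, while edits carry the extra header that a CheckUser-side hook could record.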
>>>
>>> I talked to hoo and he said the CU hook shouldn't be too much of a
>>> problem, and the app side of the issue is rather simple too. Deskana
>>> (speaking solely as a volunteer CU) says that this solution is
>>> acceptable to him. Thoughts, other people?
>>>
>>> On Thu, Mar 27, 2014 at 10:43 PM, Oliver Keyes <[email protected]>
>>> wrote:
>>> > Repost, because filtering; there might be a point of confusion here
>>> that's
>>> > causing the problem. As I understand it, the user agent sanitisation is
>>> > expected to apply to EventLogging data, and data in the Analytics
>>> pipeline,
>>> > but not data streaming into MediaWiki proper - namely, the cu_changes
>>> table.
>>> > Nuria, is that the case?
>>> >
>>> >
>>> > On 27 March 2014 08:16, Nuria Ruiz <[email protected]> wrote:
>>> >>
>>> >> >Rather than having an ethical debate over it, we could always test
>>> the
>>> >> > actual usefulness with Science. That way we'd be able to see how
>>> much
>>> >> > granularity each additional component adds to the data.
>>> >> I kind of feel we are going backwards as we thoroughly discussed this
>>> >> point, technical info and references regarding entropy and user
>>> agents and
>>> >> fingerprinting can be found here:
>>> >> https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Mar 27, 2014 at 3:49 PM, Oliver Keyes <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> +1. I'm totally down for keeping less information around, but if it
>>> gets
>>> >>> in the way of people doing their job?
>>> >>>
>>> >>> Rather than having an ethical debate over it, we could always test
>>> the
>>> >>> actual usefulness with Science. That way we'd be able to see how much
>>> >>> granularity each additional component adds to the data.
>>> >>>
>>> >>>
>>> >>> On 27 March 2014 07:15, Aaron Halfaker <[email protected]>
>>> wrote:
>>> >>>>>
>>> >>>>> Including more information on the UA, while being covered by legal
>>> >>>>> under the new privacy policy, really goes against the wishes of the
>>> community
>>> >>>>> as they do not wish to be fingerprinted.
>>> >>>>
>>> >>>>
>>> >>>> I don't think that "the wishes of the community" have been
>>> established
>>> >>>> and the whole point of checkuser is that it allows for
>>> fingerprinting.
>>> >>>>
>>> >>>>
>>> >>>> On Thu, Mar 27, 2014 at 4:20 AM, Nuria Ruiz <[email protected]>
>>> wrote:
>>> >>>>>
>>> >>>>>
>>> >>>>> >As a checkuser, user agents are an important part of my workflow
>>> for
>>> >>>>> > identifying that multiple accounts are owned by the same person.
>>> >>>>> > So I'm going to have to argue for including more information in
>>> the
>>> >>>>> > user agent.
>>> >>>>>
>>> >>>>> Including more information on the UA, while being covered by legal
>>> >>>>> under the new privacy policy, really goes against the wishes of the
>>> community
>>> >>>>> as they do not wish to be fingerprinted.
>>> >>>>> See:
>>> >>>>>
>>> https://www.mediawiki.org/wiki/Talk:EventLogging/UserAgentSanitization or
>>> >>>>> https://meta.wikimedia.org/wiki/Talk:Privacy_policy
>>> >>>>> There has been plenty more discussions about this on analytics
>>> e-mail
>>> >>>>> list.
>>> >>>>>
>>> >>>>>
>>> >>>>> >Your proposed user agent would basically mean that every single
>>> person
>>> >>>>> > using the most up-to-date version of the app on a particular
>>> platform would
>>> >>>>> > be indistinguishable from each other. This would,
>>> unfortunately, lead to
>>> >>>>> > lots of innocent users getting blocked as sockpuppets.
>>> >>>>>
>>> >>>>> However, note that the UA " WikipediaApp/<version>
>>> >>>>> <OS>/<form-factor>/<version>" clearly satisfies the use case of
>>> the mobile
>>> >>>>> team. It provides as much information as they need from their user
>>> without
>>> >>>>> sending any private data.
>>> >>>>>
>>> >>>>> Can you please list what is your use case? Namely how are you
>>> >>>>> identifying "false" accounts. Perhaps relying on the user agent to
>>> do so is
>>> >>>>> not the best strategy going forward. Keep in mind that with the
>>> old privacy
>>> >>>>> policy UA data needed to be discarded after 90 days. With the new
>>> policy
>>> >>>>> there is more legal room but given community feedback analytics
>>> team is
>>> >>>>> planning on aggregating all UA information in the future. This
>>> means that UA
>>> >>>>> data will not be stored (or reported) per user or request but
>>> rather
>>> >>>>> aggregated (as in "4% of users use iPhone").
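
A rough sketch of what aggregated reporting could look like, assuming Python and the "WikipediaApp/<version> <OS>/<form-factor>/<version>" format proposed earlier in this thread. The function name os_share is hypothetical and this is not the analytics team's pipeline. Only per-OS percentages are returned, so nothing is kept per user or per request.

```python
from collections import Counter
from typing import Dict, Iterable


def os_share(user_agents: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of requests per OS, discarding everything else."""
    counts: Counter = Counter()
    for ua in user_agents:
        parts = ua.split(" ")
        # Expected shape: "WikipediaApp/<version> <OS>/<form-factor>/<version>"
        if len(parts) == 2 and parts[0].startswith("WikipediaApp/"):
            counts[parts[1].split("/")[0]] += 1
        else:
            # Anything that is not in the app format is lumped together.
            counts["other"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}
```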
>>> >>>>>
>>> >>>>> We recently gathered information from all teams about use cases
>>> >>>>> pertaining to UA data collection:
>>> >>>>>
>>> >>>>>
>>> https://office.wikimedia.org/wiki/Analytics/Internal/EventLogging/PrivateData#Use_Cases_for_User_Agent_collection
>>> .
>>> >>>>>
>>> >>>>> Let's talk about your use case and add it to the document that
>>> already
>>> >>>>> exists describing usages of user agent data, this document was
>>> sent out to
>>> >>>>> all teams a couple of months ago but there is no description of your
>>> use case
>>> >>>>> there:
>>> >>>>>
>>> >>>>>
>>> https://docs.google.com/a/wikimedia.org/document/d/1bp6qrvYi0Mh7l0s1psGnXEENWhmUfcKi1k1TbcozgeA/edit
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Wed, Mar 26, 2014 at 11:20 PM, Dan Garry <[email protected]>
>>> >>>>> wrote:
>>> >>>>>>
>>> >>>>>> Hey Yuvi,
>>> >>>>>>
>>> >>>>>> As a checkuser, user agents are an important part of my workflow
>>> for
>>> >>>>>> identifying that multiple accounts are owned by the same person.
>>> So I'm
>>> >>>>>> going to have to argue for including more information in the user
>>> agent.
>>> >>>>>> Your proposed user agent would basically mean that every single
>>> person using
>>> >>>>>> the most up-to-date version of the app on a particular platform
>>> would be
>>> >>>>>> indistinguishable from each other. This would, unfortunately,
>>> lead to lots
>>> >>>>>> of innocent users getting blocked as sockpuppets.
>>> >>>>>>
>>> >>>>>> Here's an example of a user agent from an iPhone using Safari:
>>> >>>>>> Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_1 like Mac OS X; zh-tw)
>>> >>>>>> AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8G4
>>> >>>>>> Safari/6533.18.5
>>> >>>>>>
>>> >>>>>> Look at all of that wonderful information! ;-) In general, the
>>> more
>>> >>>>>> information you can include without breaching the user's privacy,
>>> the
>>> >>>>>> better.
>>> >>>>>>
>>> >>>>>> I'd be happy to work with you on this.
>>> >>>>>>
>>> >>>>>> Thanks,
>>> >>>>>> Dan
>>> >>>>>>
>>> >>>>>> P.S. You may also want to consult with the legal team, to ensure
>>> that
>>> >>>>>> unacceptable levels of private information are not given out.
>>> They would
>>> >>>>>> also make a complement for me; I would likely be pulling in the
>>> direction of
>>> >>>>>> "MOAR INFORMATION!", whereas they would likely be pulling in the
>>> direction
>>> >>>>>> of "LESS INFORMATION!". :-)
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On 26 March 2014 15:00, Yuvi Panda <[email protected]> wrote:
>>> >>>>>>>
>>> >>>>>>> Add Analytics to cc, as I think they'll be interested as well :)
>>> >>>>>>>
>>> >>>>>>> On Thu, Mar 27, 2014 at 3:20 AM, Yuvi Panda <[email protected]
>>> >
>>> >>>>>>> wrote:
>>> >>>>>>> > Hello!
>>> >>>>>>> >
>>> >>>>>>> > We are getting closer to a general release of the Wikipedia
>>> Android
>>> >>>>>>> > and iOS apps, and I think we should standardize on a User-Agent
>>> >>>>>>> > format. The old app just appended an identifier in front of the
>>> >>>>>>> > phone's default UA[1] but I think we can do better, to avoid
>>> >>>>>>> > privacy
>>> >>>>>>> > concerns[2].
>>> >>>>>>> >
>>> >>>>>>> > How about:
>>> >>>>>>> >
>>> >>>>>>> > WikipediaApp/<version> <OS>/<form-factor>/<version>
>>> >>>>>>> >
>>> >>>>>>> > This gives us all the info we need (App version, OS, Form
>>> Factor
>>> >>>>>>> > (Tablet / Phone) and OS version) without giving away too much.
>>> It
>>> >>>>>>> > is
>>> >>>>>>> > also fairly simple to construct and parse.
>>> >>>>>>> >
>>> >>>>>>> > For the latest alpha, my Nexus 4 would generate
>>> >>>>>>> >
>>> >>>>>>> > WikipediaApp/32 Android/Phone/4.4
>>> >>>>>>> >
>>> >>>>>>> > While an iOS device might generate
>>> >>>>>>> >
>>> >>>>>>> > WikipediaApp/2.0 iOS/Phone/7.1
>>> >>>>>>> >
>>> >>>>>>> > form-factor would just be Phone|Tablet for now, and can be
>>> expanded
>>> >>>>>>> > later if necessary.
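
To illustrate the "simple to construct and parse" point, here is a minimal sketch assuming Python. The names AppUserAgent, build_user_agent and parse_user_agent are hypothetical, and the regular expression is one possible reading of the proposed format, with form-factor limited to Phone or Tablet as described above.

```python
import re
from typing import NamedTuple, Optional


class AppUserAgent(NamedTuple):
    app_version: str
    os_name: str
    form_factor: str
    os_version: str


# WikipediaApp/<version> <OS>/<form-factor>/<version>
_UA_PATTERN = re.compile(
    r"^WikipediaApp/(?P<app_version>\S+) "
    r"(?P<os_name>[^/\s]+)/(?P<form_factor>Phone|Tablet)/(?P<os_version>\S+)$"
)


def build_user_agent(ua: AppUserAgent) -> str:
    """Construct the User-Agent string from its four components."""
    return (
        f"WikipediaApp/{ua.app_version} "
        f"{ua.os_name}/{ua.form_factor}/{ua.os_version}"
    )


def parse_user_agent(value: str) -> Optional[AppUserAgent]:
    """Parse an app User-Agent; return None if it is not in this format."""
    match = _UA_PATTERN.match(value.strip())
    if match is None:
        return None
    return AppUserAgent(**match.groupdict())
```

For example, parse_user_agent("WikipediaApp/32 Android/Phone/4.4") yields app version 32, OS Android, form factor Phone and OS version 4.4, while a regular browser User-Agent returns None.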
>>> >>>>>>> >
>>> >>>>>>> > Thoughts?
>>> >>>>>>> >
>>> >>>>>>> > [1]: https://www.mediawiki.org/wiki/Mobile/User_agents#Apps
>>> >>>>>>> > [2]:
>>> >>>>>>> >
>>> https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>> >>>>>>> > --
>>> >>>>>>> > Yuvi Panda T
>>> >>>>>>> > http://yuvi.in/blog
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> Yuvi Panda T
>>> >>>>>>> http://yuvi.in/blog
>>> >>>>>>>
>>> >>>>>>> _______________________________________________
>>> >>>>>>> Analytics mailing list
>>> >>>>>>> [email protected]
>>> >>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Dan Garry
>>> >>>>>> Associate Product Manager for Platform
>>> >>>>>> Wikimedia Foundation
>>> >>>>>>
>>> >>>>>> _______________________________________________
>>> >>>>>> Analytics mailing list
>>> >>>>>> [email protected]
>>> >>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> _______________________________________________
>>> >>>>> Analytics mailing list
>>> >>>>> [email protected]
>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>> _______________________________________________
>>> >>>> Analytics mailing list
>>> >>>> [email protected]
>>> >>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Oliver Keyes
>>> >>> Research Analyst
>>> >>> Wikimedia Foundation
>>> >>>
>>> >>> _______________________________________________
>>> >>> Analytics mailing list
>>> >>> [email protected]
>>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> Analytics mailing list
>>> >> [email protected]
>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Oliver Keyes
>>> > Research Analyst
>>> > Wikimedia Foundation
>>>
>>>
>>>
>>> --
>>> Yuvi Panda T
>>> http://yuvi.in/blog
>>>
>>> _______________________________________________
>>> Mobile-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
>>>
>>
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
>
> --
> Dan Garry
> Associate Product Manager for Platform
> Wikimedia Foundation
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
_______________________________________________
Mobile-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mobile-l
