Repost, because filtering; there might be a point of confusion here that's
causing the problem. As I understand it, the user agent sanitisation is
expected to apply to EventLogging data, and data in the Analytics pipeline,
but not data streaming into MediaWiki proper - namely, the cu_changes
table. Nuria, is that the case?


On 27 March 2014 08:16, Nuria Ruiz <[email protected]> wrote:

> >Rather than having an ethical debate over it, we could always test the
> actual usefulness with Science. That way we'd be able to see how much
> granularity each additional component adds to the data.
> I kind of feel we are going backwards, as we thoroughly discussed this
> point. Technical info and references regarding entropy, user agents and
> fingerprinting can be found here:
> https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>
>
>
> On Thu, Mar 27, 2014 at 3:49 PM, Oliver Keyes <[email protected]> wrote:
>
>> +1. I'm totally down for keeping less information around, but if it gets
>> in the way of people doing their job?
>>
>> Rather than having an ethical debate over it, we could always test the
>> actual usefulness with Science. That way we'd be able to see how much
>> granularity each additional component adds to the data.
>>
>>
>> On 27 March 2014 07:15, Aaron Halfaker <[email protected]> wrote:
>>
>>>> Including more information on the UA, while being covered by legal under
>>>> the new privacy policy, really goes against the wishes of the community as
>>>> they do not wish to be fingerprinted.
>>>
>>>
>>> I don't think that "the wishes of the community" have been established
>>> and the whole point of checkuser is that it allows for fingerprinting.
>>>
>>>
>>> On Thu, Mar 27, 2014 at 4:20 AM, Nuria Ruiz <[email protected]> wrote:
>>>
>>>>
>>>> >As a checkuser, user agents are an important part of my workflow for
>>>> identifying that multiple accounts are owned by the same person.
>>>> > So I'm going to have to argue for including more information in the
>>>> user agent.
>>>>
>>>> Including more information on the UA, while being covered by legal
>>>> under the new privacy policy, really goes against the wishes of the
>>>> community as they do not wish to be fingerprinted.
>>>> See:
>>>> https://www.mediawiki.org/wiki/Talk:EventLogging/UserAgentSanitization or
>>>> https://meta.wikimedia.org/wiki/Talk:Privacy_policy
>>>> There have been plenty more discussions about this on the analytics e-mail
>>>> list.
>>>>
>>>>
>>>> >Your proposed user agent would basically mean that every single
>>>> person using the most up-to-date version of the app on a particular
>>>> platform would be indistinguishable from each other. This would,
>>>> unfortunately, lead to lots of innocent users getting blocked as
>>>> sockpuppets.
>>>>
>>>> However, note that the UA "WikipediaApp/<version>
>>>> <OS>/<form-factor>/<version>" clearly satisfies the use case of the mobile
>>>> team. It provides as much information as they need from their users without
>>>> sending any private data.
>>>>
>>>> Can you please describe your use case? Namely, how are you
>>>> identifying "false" accounts? Perhaps relying on the user agent to do so is
>>>> not the best strategy going forward. Keep in mind that under the old privacy
>>>> policy UA data needed to be discarded after 90 days. With the new policy
>>>> there is more legal room, but given community feedback the analytics team is
>>>> planning on aggregating all UA information in the future. This means
>>>> that UA data will not be stored (or reported) per user or request but
>>>> rather aggregated (as in "4% of users use iPhone").
>>>>
>>>> We recently gathered information from all teams about their use cases
>>>> for UA data collection:
>>>>
>>>> https://office.wikimedia.org/wiki/Analytics/Internal/EventLogging/PrivateData#Use_Cases_for_User_Agent_collection
>>>>
>>>> Let's talk about your use case and add it to the existing document
>>>> describing usages of user agent data. This document was sent out to
>>>> all teams a couple of months ago, but there is no description of your use
>>>> case there:
>>>>
>>>> https://docs.google.com/a/wikimedia.org/document/d/1bp6qrvYi0Mh7l0s1psGnXEENWhmUfcKi1k1TbcozgeA/edit
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 26, 2014 at 11:20 PM, Dan Garry <[email protected]> wrote:
>>>>
>>>>> Hey Yuvi,
>>>>>
>>>>> As a checkuser, user agents are an important part of my workflow for
>>>>> identifying that multiple accounts are owned by the same person. So I'm
>>>>> going to have to argue for including more information in the user agent.
>>>>> Your proposed user agent would basically mean that every single person
>>>>> using the most up-to-date version of the app on a particular platform
>>>>> would be indistinguishable from each other. This would, unfortunately,
>>>>> lead to lots of innocent users getting blocked as sockpuppets.
>>>>>
>>>>> Here's an example of a user agent from an iPhone using Safari:
>>>>> Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_1 like Mac OS X; zh-tw)
>>>>> AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8G4
>>>>> Safari/6533.18.5
>>>>>
>>>>> Look at all of that wonderful information! ;-) In general, the more
>>>>> information you can include without breaching the user's privacy, the
>>>>> better.
>>>>>
>>>>> I'd be happy to work with you on this.
>>>>>
>>>>> Thanks,
>>>>> Dan
>>>>>
>>>>> P.S. You may also want to consult with the legal team, to ensure that
>>>>> unacceptable levels of private information are not given out. They would
>>>>> also make a good complement to me; I would likely be pulling in the
>>>>> direction of "MOAR INFORMATION!", whereas they would likely be pulling in
>>>>> the direction of "LESS INFORMATION!". :-)
>>>>>
>>>>>
>>>>> On 26 March 2014 15:00, Yuvi Panda <[email protected]> wrote:
>>>>>
>>>>>> Add Analytics to cc, as I think they'll be interested as well :)
>>>>>>
>>>>>> On Thu, Mar 27, 2014 at 3:20 AM, Yuvi Panda <[email protected]>
>>>>>> wrote:
>>>>>> > Hello!
>>>>>> >
>>>>>> > We are getting closer to a general release of the Wikipedia Android
>>>>>> > and iOS apps, and I think we should standardize on a User-Agent
>>>>>> > format. The old app just appended an identifier in front of the
>>>>>> > phone's default UA[1] but I think we can do better, to avoid privacy
>>>>>> > concerns[2].
>>>>>> >
>>>>>> > How about:
>>>>>> >
>>>>>> > WikipediaApp/<version> <OS>/<form-factor>/<version>
>>>>>> >
>>>>>> > This gives us all the info we need (App version, OS, Form Factor
>>>>>> > (Tablet / Phone) and OS version) without giving away too much. It is
>>>>>> > also fairly simple to construct and parse.
>>>>>> >
>>>>>> > For the latest alpha, my Nexus 4 would generate
>>>>>> >
>>>>>> > WikipediaApp/32 Android/Phone/4.4
>>>>>> >
>>>>>> > While an iOS device might generate
>>>>>> >
>>>>>> > WikipediaApp/2.0 iOS/Phone/7.1
>>>>>> >
>>>>>> > form-factor would just be Phone|Tablet for now, and can be expanded
>>>>>> > later if necessary.
>>>>>> >
>>>>>> > Thoughts?
>>>>>> >
>>>>>> > [1]: https://www.mediawiki.org/wiki/Mobile/User_agents#Apps
>>>>>> > [2]: https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>>>>> > --
>>>>>> > Yuvi Panda T
>>>>>> > http://yuvi.in/blog
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Yuvi Panda T
>>>>>> http://yuvi.in/blog
>>>>>>
>>>>>> _______________________________________________
>>>>>> Analytics mailing list
>>>>>> [email protected]
>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dan Garry
>>>>> Associate Product Manager for Platform
>>>>> Wikimedia Foundation
>>>>>
>>>>
>>>
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>
>
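
For reference, here is a minimal sketch in Python of how the format proposed
in the quoted thread, WikipediaApp/<version> <OS>/<form-factor>/<version>,
could be constructed and parsed. The names AppUserAgent, build_user_agent and
parse_user_agent are made up for illustration and are not taken from either
app. The sketch also assumes that versions and OS names contain no spaces or
slashes, and that form-factor is limited to Phone or Tablet as in the proposal.

```python
import re
from typing import NamedTuple, Optional


class AppUserAgent(NamedTuple):
    """Hypothetical container for the four fields in the proposed format."""
    app_version: str
    os_name: str
    form_factor: str
    os_version: str


# Assumption: no field contains whitespace or "/", and form-factor is
# restricted to the two values named in the proposal.
_UA_PATTERN = re.compile(
    r"WikipediaApp/(?P<app_version>[^\s/]+) "
    r"(?P<os_name>[^\s/]+)/(?P<form_factor>Phone|Tablet)/(?P<os_version>[^\s/]+)"
)


def build_user_agent(app_version: str, os_name: str,
                     form_factor: str, os_version: str) -> str:
    """Assemble a User-Agent string in the proposed format."""
    if form_factor not in ("Phone", "Tablet"):
        raise ValueError(f"unsupported form factor: {form_factor!r}")
    return f"WikipediaApp/{app_version} {os_name}/{form_factor}/{os_version}"


def parse_user_agent(ua: str) -> Optional[AppUserAgent]:
    """Return the parsed fields, or None if the string is not in this format."""
    match = _UA_PATTERN.fullmatch(ua)
    if match is None:
        return None
    return AppUserAgent(**match.groupdict())


if __name__ == "__main__":
    print(build_user_agent("32", "Android", "Phone", "4.4"))
    print(parse_user_agent("WikipediaApp/2.0 iOS/Phone/7.1"))
```

A full browser UA such as the Safari example quoted in the thread does not
match the pattern, so parse_user_agent returns None for it rather than
guessing at the fields.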


-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation
_______________________________________________
Mobile-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mobile-l
