Reposting, because of filtering: there might be a point of confusion here that's causing the problem. As I understand it, user agent sanitisation is expected to apply to EventLogging data and to data in the Analytics pipeline, but not to data streaming into MediaWiki proper, namely the cu_changes table. Nuria, is that the case?
On 27 March 2014 08:16, Nuria Ruiz <[email protected]> wrote:

> >Rather than having an ethical debate over it, we could always test the
> actual usefulness with Science. That way we'd be able to see how much
> granularity each additional component adds to the data.
> I kind of feel we are going backwards, as we thoroughly discussed this
> point; technical info and references regarding entropy, user agents and
> fingerprinting can be found here:
> https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>
> On Thu, Mar 27, 2014 at 3:49 PM, Oliver Keyes <[email protected]> wrote:
>
>> +1. I'm totally down for keeping less information around, but if it gets
>> in the way of people doing their job?
>>
>> Rather than having an ethical debate over it, we could always test the
>> actual usefulness with Science. That way we'd be able to see how much
>> granularity each additional component adds to the data.
>>
>> On 27 March 2014 07:15, Aaron Halfaker <[email protected]> wrote:
>>
>>> Including more information on the UA, while being covered by legal under
>>>> the new privacy policy, really goes against the wishes of the community as
>>>> they do not wish to be fingerprinted.
>>>
>>> I don't think that "the wishes of the community" have been established,
>>> and the whole point of checkuser is that it allows for fingerprinting.
>>>
>>> On Thu, Mar 27, 2014 at 4:20 AM, Nuria Ruiz <[email protected]> wrote:
>>>
>>>> >As a checkuser, user agents are an important part of my workflow for
>>>> identifying that multiple accounts are owned by the same person.
>>>> > So I'm going to have to argue for including more information in the
>>>> user agent.
>>>>
>>>> Including more information on the UA, while being covered by legal
>>>> under the new privacy policy, really goes against the wishes of the
>>>> community as they do not wish to be fingerprinted.
>>>> See:
>>>> https://www.mediawiki.org/wiki/Talk:EventLogging/UserAgentSanitization or
>>>> https://meta.wikimedia.org/wiki/Talk:Privacy_policy
>>>> There have been plenty more discussions about this on the analytics
>>>> e-mail list.
>>>>
>>>> >Your proposed user agent would basically mean that every single
>>>> person using the most up-to-date version of the app on a particular
>>>> platform would be indistinguishable from each other. This would,
>>>> unfortunately, lead to lots of innocent users getting blocked as
>>>> sockpuppets.
>>>>
>>>> However, note that the UA "WikipediaApp/<version>
>>>> <OS>/<form-factor>/<version>" clearly satisfies the use case of the mobile
>>>> team. It provides as much information as they need from their users
>>>> without sending any private data.
>>>>
>>>> Can you please describe your use case? Namely, how are you
>>>> identifying "false" accounts? Perhaps relying on the user agent to do so is
>>>> not the best strategy going forward. Keep in mind that under the old
>>>> privacy policy UA data needed to be discarded after 90 days. With the new
>>>> policy there is more legal room, but given community feedback the
>>>> analytics team is planning on aggregating all UA information in the
>>>> future. This means that UA data will not be stored (or reported) per user
>>>> or request but rather aggregated (as in "4% of users use iPhone").
>>>>
>>>> We recently gathered information from all teams on use cases
>>>> pertaining to UA data collection:
>>>>
>>>> https://office.wikimedia.org/wiki/Analytics/Internal/EventLogging/PrivateData#Use_Cases_for_User_Agent_collection
>>>>
>>>> Let's talk about your use case and add it to the existing document
>>>> describing usages of user agent data. This document was sent out to
>>>> all teams a couple of months ago, but there is no description of your
>>>> use case there:
>>>>
>>>> https://docs.google.com/a/wikimedia.org/document/d/1bp6qrvYi0Mh7l0s1psGnXEENWhmUfcKi1k1TbcozgeA/edit
>>>>
>>>> On Wed, Mar 26, 2014 at 11:20 PM, Dan Garry <[email protected]> wrote:
>>>>
>>>>> Hey Yuvi,
>>>>>
>>>>> As a checkuser, user agents are an important part of my workflow for
>>>>> identifying that multiple accounts are owned by the same person. So I'm
>>>>> going to have to argue for including more information in the user agent.
>>>>> Your proposed user agent would basically mean that every single person
>>>>> using the most up-to-date version of the app on a particular platform
>>>>> would be indistinguishable from each other. This would, unfortunately,
>>>>> lead to lots of innocent users getting blocked as sockpuppets.
>>>>>
>>>>> Here's an example of a user agent from an iPhone using Safari:
>>>>> Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_1 like Mac OS X; zh-tw)
>>>>> AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8G4
>>>>> Safari/6533.18.5
>>>>>
>>>>> Look at all of that wonderful information! ;-) In general, the more
>>>>> information you can include without breaching the user's privacy, the
>>>>> better.
>>>>>
>>>>> I'd be happy to work with you on this.
>>>>>
>>>>> Thanks,
>>>>> Dan
>>>>>
>>>>> P.S. You may also want to consult with the legal team, to ensure that
>>>>> unacceptable levels of private information are not given out. They
>>>>> would also make a complement for me; I would likely be pulling in the
>>>>> direction of "MOAR INFORMATION!", whereas they would likely be pulling
>>>>> in the direction of "LESS INFORMATION!".
>>>>> :-)
>>>>>
>>>>> On 26 March 2014 15:00, Yuvi Panda <[email protected]> wrote:
>>>>>
>>>>>> Adding Analytics to cc, as I think they'll be interested as well :)
>>>>>>
>>>>>> On Thu, Mar 27, 2014 at 3:20 AM, Yuvi Panda <[email protected]>
>>>>>> wrote:
>>>>>> > Hello!
>>>>>> >
>>>>>> > We are getting closer to a general release of the Wikipedia Android
>>>>>> > and iOS apps, and I think we should standardize on a User-Agent
>>>>>> > format. The old app just prepended an identifier to the phone's
>>>>>> > default UA[1], but I think we can do better, to avoid privacy
>>>>>> > concerns[2].
>>>>>> >
>>>>>> > How about:
>>>>>> >
>>>>>> > WikipediaApp/<version> <OS>/<form-factor>/<version>
>>>>>> >
>>>>>> > This gives us all the info we need (app version, OS, form factor
>>>>>> > (Tablet / Phone) and OS version) without giving away too much. It is
>>>>>> > also fairly simple to construct and parse.
>>>>>> >
>>>>>> > For the latest alpha, my Nexus 4 would generate
>>>>>> >
>>>>>> > WikipediaApp/32 Android/Phone/4.4
>>>>>> >
>>>>>> > while an iOS device might generate
>>>>>> >
>>>>>> > WikipediaApp/2.0 iOS/Phone/7.1
>>>>>> >
>>>>>> > <form-factor> would just be Phone|Tablet for now, and can be expanded
>>>>>> > later if necessary.
>>>>>> >
>>>>>> > Thoughts?
>>>>>> >
>>>>>> > [1]: https://www.mediawiki.org/wiki/Mobile/User_agents#Apps
>>>>>> > [2]: https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>>>>> > --
>>>>>> > Yuvi Panda T
>>>>>> > http://yuvi.in/blog
>>>>>>
>>>>>> --
>>>>>> Yuvi Panda T
>>>>>> http://yuvi.in/blog
>>>>>>
>>>>>> _______________________________________________
>>>>>> Analytics mailing list
>>>>>> [email protected]
>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>
>>>>> --
>>>>> Dan Garry
>>>>> Associate Product Manager for Platform
>>>>> Wikimedia Foundation
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation

-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation
_______________________________________________
Mobile-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mobile-l
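Yuvi's proposed format, WikipediaApp/<version> <OS>/<form-factor>/<version>, is described in the thread as fairly simple to construct and parse. The sketch below shows both directions in Python under stated assumptions: the names `AppUserAgent`, `build_user_agent` and `parse_user_agent` are hypothetical (not taken from the apps' code), version fields are assumed to be any run of characters without spaces or slashes, and `<form-factor>` is restricted to Phone|Tablet, as the proposal says "for now". It is an illustration, not the apps' actual implementation.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the proposed format:
#   WikipediaApp/<version> <OS>/<form-factor>/<version>
# Assumption: version and OS fields contain no spaces or slashes.
_UA_PATTERN = re.compile(
    r"^WikipediaApp/(?P<app_version>[^\s/]+) "
    r"(?P<os>[^\s/]+)/(?P<form_factor>Phone|Tablet)/(?P<os_version>[^\s/]+)$"
)


class AppUserAgent(NamedTuple):
    app_version: str
    os: str
    form_factor: str  # "Phone" or "Tablet" in the proposal
    os_version: str


def build_user_agent(ua: AppUserAgent) -> str:
    """Construct a UA string in the proposed format."""
    if ua.form_factor not in ("Phone", "Tablet"):
        raise ValueError(f"unsupported form factor: {ua.form_factor!r}")
    return f"WikipediaApp/{ua.app_version} {ua.os}/{ua.form_factor}/{ua.os_version}"


def parse_user_agent(header: str) -> Optional[AppUserAgent]:
    """Parse a UA string; return None if it does not match the format exactly."""
    match = _UA_PATTERN.match(header)
    if match is None:
        return None
    # Group names mirror the NamedTuple field names.
    return AppUserAgent(**match.groupdict())


if __name__ == "__main__":
    android = AppUserAgent("32", "Android", "Phone", "4.4")
    print(build_user_agent(android))  # WikipediaApp/32 Android/Phone/4.4
    print(parse_user_agent("WikipediaApp/2.0 iOS/Phone/7.1"))
    # AppUserAgent(app_version='2.0', os='iOS', form_factor='Phone', os_version='7.1')
    print(parse_user_agent("WkipediaApp/2.0 iOS/Phone/7.1"))  # None (misspelled prefix)
```

Because the pattern is anchored at both ends, a string carrying extra components (for example a full browser UA appended after the app identifier) is rejected rather than partially matched. Whether that strictness suits the analytics pipeline is a design choice the thread does not settle.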
