Hey folks -- we aren't considering changing any of the data that goes into checkuser. That tool will be unchanged.
This discussion only concerns backend logging (EventLogging) and page view analytics.

Thanks,
-Toby

On Thu, Mar 27, 2014 at 3:30 PM, Dan Garry <[email protected]> wrote:

> Note: I speak in this thread as a volunteer checkuser, not as the product
> manager for platform. Not sending from my volunteer email address because I
> don't want to subscribe to this list on two separate email addresses. :-)
>
> The originally proposed spec (user agent to include device type and app
> version) would have been very disruptive to the workflow of checkusers,
> which relies in part on user agent data. Your proposed update (including a
> client-specific identifier with the user agent for CU) is sensible, and
> lets checkusers do their jobs of dealing with people abusing our sites
> without unnecessarily divulging personally identifying information to the
> checkusers.
>
> If any interested parties have improvements to propose, let's hear them!
>
> Thanks,
> Dan
>
> On 27 March 2014 14:30, James Alexander <[email protected]> wrote:
>
>> Speaking in my capacity as a long-term volunteer checkuser (though
>> not currently, because of my work requirements), a very active work-related
>> owner/user of checkuser in the LCA team [probably the most active within
>> staff], and a strong advocate for saving as little info as possible, I think
>> your proposed adjustment makes sense.
>>
>> That (assuming it's done for all logged actions as you suggest) seems
>> like it would fit in with the CU requirements well while saving as little
>> information as needed on readers.
>>
>> You say that the second 'edit' user agent will be sent as a separate header.
>> I imagine that would still be recorded in the read logs, then; is it just
>> that it wouldn't be saved long term after the logs are processed in some
>> way to remove other headers?
>> [That would make sense to me, but if it's going to be kept in the logs as
>> long as the user agent in the first place, I don't know why we wouldn't
>> just switch what was being sent 'as' the user agent].
>>
>> James
>>
>> James Alexander
>> Legal and Community Advocacy
>> Wikimedia Foundation
>> (415) 839-6885 x6716 @jamesofur
>>
>> On Thu, Mar 27, 2014 at 1:45 PM, Yuvi Panda <[email protected]> wrote:
>>
>>> Forking since I think there are two conversations - one about the
>>> format of UA for the mobile apps and one about CheckUser requirements
>>> for anything that does edits. Having them separate would be useful.
>>>
>>> For those who do not know what CheckUser means, I recommend reading
>>> https://en.wikipedia.org/wiki/Wikipedia:CheckUser.
>>>
>>> IP address and UA are two of the most important pieces of info
>>> CUs have in helping prevent abuse. IP is already sort of useless with
>>> mobile networks - a lot of providers do NAT and similar things that
>>> mean that we cannot come remotely close to reasonably assuming 1 IP = 1
>>> User, or anything similar to that. UA provides more
>>> fingerprinting ability, but CU isn't the only thing that consumes UA -
>>> other parts of the infrastructure do as well.
>>>
>>> So what we need is a way to preserve the ability to fingerprint only
>>> users making edits (no read actions!) for CU. I am sure that can be
>>> implemented without having to have a very fingerprintable UA, with
>>> simple hooks on both the App's side and on Extension:CheckUser.
>>>
>>> We could generate a simple fingerprint that's unique per device (and
>>> disconnected completely from every other device identifier) that we
>>> send only with edits (and other 'POST' actions) as a separate header.
>>> This can be processed by CU (perhaps with a hook that
>>> Extension:MobileApp can hook into) and then used by CheckUsers.
>>> This data will be treated with the same data retention / privacy policy
>>> that applies to CUs now, and regular UA data can be consumed by other
>>> consumers without too many fingerprinting concerns.
>>>
>>> I talked to hoo and he said the CU hook shouldn't be too much of a
>>> problem, and the app side of the issue is rather simple too. Deskana
>>> (speaking solely as a volunteer CU) says that this solution is
>>> acceptable to him. Thoughts, other people?
>>>
>>> On Thu, Mar 27, 2014 at 10:43 PM, Oliver Keyes <[email protected]> wrote:
>>> > Repost, because filtering; there might be a point of confusion here
>>> > that's causing the problem. As I understand it, the user agent
>>> > sanitisation is expected to apply to EventLogging data, and data in the
>>> > Analytics pipeline, but not data streaming into MediaWiki proper -
>>> > namely, the cu_changes table.
>>> > Nuria, is that the case?
>>> >
>>> > On 27 March 2014 08:16, Nuria Ruiz <[email protected]> wrote:
>>> >>
>>> >> > Rather than having an ethical debate over it, we could always test
>>> >> > the actual usefulness with Science. That way we'd be able to see how
>>> >> > much granularity each additional component adds to the data.
>>> >>
>>> >> I kind of feel we are going backwards, as we thoroughly discussed this
>>> >> point; technical info and references regarding entropy, user agents and
>>> >> fingerprinting can be found here:
>>> >> https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>> >>
>>> >> On Thu, Mar 27, 2014 at 3:49 PM, Oliver Keyes <[email protected]> wrote:
>>> >>>
>>> >>> +1. I'm totally down for keeping less information around, but if it
>>> >>> gets in the way of people doing their job?
>>> >>>
>>> >>> Rather than having an ethical debate over it, we could always test the
>>> >>> actual usefulness with Science. That way we'd be able to see how much
>>> >>> granularity each additional component adds to the data.
>>> >>>
>>> >>> On 27 March 2014 07:15, Aaron Halfaker <[email protected]> wrote:
>>> >>>>>
>>> >>>>> Including more information on the UA, while being covered by legal
>>> >>>>> under the new privacy policy, really goes against the wishes of the
>>> >>>>> community as they do not wish to be fingerprinted.
>>> >>>>
>>> >>>> I don't think that "the wishes of the community" have been established
>>> >>>> and the whole point of checkuser is that it allows for fingerprinting.
>>> >>>>
>>> >>>> On Thu, Mar 27, 2014 at 4:20 AM, Nuria Ruiz <[email protected]> wrote:
>>> >>>>>
>>> >>>>> > As a checkuser, user agents are an important part of my workflow for
>>> >>>>> > identifying that multiple accounts are owned by the same person.
>>> >>>>> > So I'm going to have to argue for including more information in the
>>> >>>>> > user agent.
>>> >>>>>
>>> >>>>> Including more information on the UA, while being covered by legal
>>> >>>>> under the new privacy policy, really goes against the wishes of the
>>> >>>>> community as they do not wish to be fingerprinted.
>>> >>>>> See:
>>> >>>>> https://www.mediawiki.org/wiki/Talk:EventLogging/UserAgentSanitization or
>>> >>>>> https://meta.wikimedia.org/wiki/Talk:Privacy_policy
>>> >>>>> There have been plenty more discussions about this on the analytics
>>> >>>>> e-mail list.
>>> >>>>>
>>> >>>>> > Your proposed user agent would basically mean that every single
>>> >>>>> > person using the most up-to-date version of the app on a particular
>>> >>>>> > platform would be indistinguishable from each other. This would,
>>> >>>>> > unfortunately, lead to lots of innocent users getting blocked as
>>> >>>>> > sockpuppets.
>>> >>>>>
>>> >>>>> However, note that the UA "WikipediaApp/<version>
>>> >>>>> <OS>/<form-factor>/<version>" clearly satisfies the use case of the
>>> >>>>> mobile team.
>>> >>>>> It provides as much information as they need from their users without
>>> >>>>> sending any private data.
>>> >>>>>
>>> >>>>> Can you please describe your use case? Namely, how are you
>>> >>>>> identifying "false" accounts? Perhaps relying on the user agent to do
>>> >>>>> so is not the best strategy going forward. Keep in mind that with the
>>> >>>>> old privacy policy UA data needed to be discarded after 90 days. With
>>> >>>>> the new policy there is more legal room, but given community feedback
>>> >>>>> the analytics team is planning on aggregating all UA information in
>>> >>>>> the future. This means that UA data will not be stored (or reported)
>>> >>>>> per user or request but rather aggregated (as in "4% of users use
>>> >>>>> iPhone").
>>> >>>>>
>>> >>>>> We recently gathered information from all teams on use cases
>>> >>>>> pertaining to UA data collection:
>>> >>>>>
>>> >>>>> https://office.wikimedia.org/wiki/Analytics/Internal/EventLogging/PrivateData#Use_Cases_for_User_Agent_collection
>>> >>>>>
>>> >>>>> Let's talk about your use case and add it to the document that already
>>> >>>>> exists describing usages of user agent data. This document was sent
>>> >>>>> out to all teams a couple of months ago, but there is no description
>>> >>>>> of your use case there:
>>> >>>>>
>>> >>>>> https://docs.google.com/a/wikimedia.org/document/d/1bp6qrvYi0Mh7l0s1psGnXEENWhmUfcKi1k1TbcozgeA/edit
>>> >>>>>
>>> >>>>> On Wed, Mar 26, 2014 at 11:20 PM, Dan Garry <[email protected]> wrote:
>>> >>>>>>
>>> >>>>>> Hey Yuvi,
>>> >>>>>>
>>> >>>>>> As a checkuser, user agents are an important part of my workflow for
>>> >>>>>> identifying that multiple accounts are owned by the same person. So
>>> >>>>>> I'm going to have to argue for including more information in the user
>>> >>>>>> agent.
>>> >>>>>> Your proposed user agent would basically mean that every single
>>> >>>>>> person using the most up-to-date version of the app on a particular
>>> >>>>>> platform would be indistinguishable from each other. This would,
>>> >>>>>> unfortunately, lead to lots of innocent users getting blocked as
>>> >>>>>> sockpuppets.
>>> >>>>>>
>>> >>>>>> Here's an example of a user agent from an iPhone using Safari:
>>> >>>>>> Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_1 like Mac OS X; zh-tw)
>>> >>>>>> AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8G4
>>> >>>>>> Safari/6533.18.5
>>> >>>>>>
>>> >>>>>> Look at all of that wonderful information! ;-) In general, the more
>>> >>>>>> information you can include without breaching the user's privacy, the
>>> >>>>>> better.
>>> >>>>>>
>>> >>>>>> I'd be happy to work with you on this.
>>> >>>>>>
>>> >>>>>> Thanks,
>>> >>>>>> Dan
>>> >>>>>>
>>> >>>>>> P.S. You may also want to consult with the legal team, to ensure that
>>> >>>>>> unacceptable levels of private information are not given out. They
>>> >>>>>> would also be a good complement to me; I would likely be pulling in
>>> >>>>>> the direction of "MOAR INFORMATION!", whereas they would likely be
>>> >>>>>> pulling in the direction of "LESS INFORMATION!". :-)
>>> >>>>>>
>>> >>>>>> On 26 March 2014 15:00, Yuvi Panda <[email protected]> wrote:
>>> >>>>>>>
>>> >>>>>>> Adding Analytics to cc, as I think they'll be interested as well :)
>>> >>>>>>>
>>> >>>>>>> On Thu, Mar 27, 2014 at 3:20 AM, Yuvi Panda <[email protected]> wrote:
>>> >>>>>>> > Hello!
>>> >>>>>>> >
>>> >>>>>>> > We are getting closer to a general release of the Wikipedia Android
>>> >>>>>>> > and iOS apps, and I think we should standardize on a User-Agent
>>> >>>>>>> > format.
>>> >>>>>>> > The old app just added an identifier in front of the
>>> >>>>>>> > phone's default UA[1] but I think we can do better, to avoid
>>> >>>>>>> > privacy concerns[2].
>>> >>>>>>> >
>>> >>>>>>> > How about:
>>> >>>>>>> >
>>> >>>>>>> > WikipediaApp/<version> <OS>/<form-factor>/<version>
>>> >>>>>>> >
>>> >>>>>>> > This gives us all the info we need (App version, OS, Form Factor
>>> >>>>>>> > (Tablet / Phone) and OS version) without giving away too much. It
>>> >>>>>>> > is also fairly simple to construct and parse.
>>> >>>>>>> >
>>> >>>>>>> > For the latest alpha, my Nexus 4 would generate
>>> >>>>>>> >
>>> >>>>>>> > WikipediaApp/32 Android/Phone/4.4
>>> >>>>>>> >
>>> >>>>>>> > While an iOS device might generate
>>> >>>>>>> >
>>> >>>>>>> > WikipediaApp/2.0 iOS/Phone/7.1
>>> >>>>>>> >
>>> >>>>>>> > form-factor would just be Phone|Tablet for now, and can be expanded
>>> >>>>>>> > later if necessary.
>>> >>>>>>> >
>>> >>>>>>> > Thoughts?
>>> >>>>>>> >
>>> >>>>>>> > [1]: https://www.mediawiki.org/wiki/Mobile/User_agents#Apps
>>> >>>>>>> > [2]: https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>> >>>>>>> > --
>>> >>>>>>> > Yuvi Panda T
>>> >>>>>>> > http://yuvi.in/blog
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> Yuvi Panda T
>>> >>>>>>> http://yuvi.in/blog
>>> >>>>>>>
>>> >>>>>>> _______________________________________________
>>> >>>>>>> Analytics mailing list
>>> >>>>>>> [email protected]
>>> >>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Dan Garry
>>> >>>>>> Associate Product Manager for Platform
>>> >>>>>> Wikimedia Foundation
>>> >>>
>>> >>> --
>>> >>> Oliver Keyes
>>> >>> Research Analyst
>>> >>> Wikimedia Foundation
>>> >
>>> > --
>>> > Oliver Keyes
>>> > Research Analyst
>>> > Wikimedia Foundation
>>>
>>> --
>>> Yuvi Panda T
>>> http://yuvi.in/blog
>>>
>>> _______________________________________________
>>> Mobile-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
>
> --
> Dan Garry
> Associate Product Manager for Platform
> Wikimedia Foundation
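
For anyone who wants something concrete to poke at, the edit-only fingerprint idea from the forked thread (a per-device value, unrelated to any other device identifier, sent as a separate header only with edits and other POST actions) could look roughly like the sketch below. This is only an illustration under assumptions: the header name X-Edit-Device-Id, the DeviceIdStore class and the request_headers function are all made up for this example, and nothing here reflects what the apps or Extension:CheckUser actually do.

```python
import uuid
from typing import Dict, Optional

# Hypothetical header name; the thread does not specify one.
EDIT_ID_HEADER = "X-Edit-Device-Id"


class DeviceIdStore:
    """In-memory stand-in for app-private storage (an assumption)."""

    def __init__(self) -> None:
        self._value: Optional[str] = None

    def get_or_create(self) -> str:
        # A random UUID is not derived from hardware IDs, so it is
        # disconnected from every other device identifier.
        if self._value is None:
            self._value = uuid.uuid4().hex
        return self._value


def request_headers(method: str, user_agent: str,
                    store: DeviceIdStore) -> Dict[str, str]:
    """Build headers; attach the device fingerprint only to POSTs."""
    headers = {"User-Agent": user_agent}
    if method.upper() == "POST":
        headers[EDIT_ID_HEADER] = store.get_or_create()
    return headers
```

Read requests carry only the coarse user agent, so the analytics side never sees the per-device value; only POSTs (edits) would expose it to whatever hook processes it for CU.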
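
Since the original proposal describes the format WikipediaApp/<version> <OS>/<form-factor>/<version> as "fairly simple to construct and parse", here is a minimal sketch of both directions. It assumes versions and OS names contain no spaces or slashes and that form-factor is exactly Phone or Tablet, as stated in the proposal; the names AppUserAgent, build_user_agent and parse_user_agent are hypothetical and not taken from any existing code.

```python
import re
from typing import NamedTuple, Optional


class AppUserAgent(NamedTuple):
    app_version: str
    os_name: str
    form_factor: str  # "Phone" or "Tablet" for now
    os_version: str


# Matches: WikipediaApp/<version> <OS>/<form-factor>/<version>
_UA_RE = re.compile(
    r"^WikipediaApp/(?P<app_version>[^/\s]+) "
    r"(?P<os_name>[^/\s]+)/(?P<form_factor>Phone|Tablet)/"
    r"(?P<os_version>[^/\s]+)$"
)


def build_user_agent(ua: AppUserAgent) -> str:
    """Render the four fields in the proposed format."""
    return (f"WikipediaApp/{ua.app_version} "
            f"{ua.os_name}/{ua.form_factor}/{ua.os_version}")


def parse_user_agent(value: str) -> Optional[AppUserAgent]:
    """Return the parsed fields, or None if the string is not in the format."""
    match = _UA_RE.match(value.strip())
    if match is None:
        return None
    return AppUserAgent(**match.groupdict())
```

With this sketch, parse_user_agent("WikipediaApp/32 Android/Phone/4.4") yields app version 32, OS Android, form factor Phone and OS version 4.4, while a browser user agent such as the Safari example quoted in the thread yields None.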
