Hey Andrew, that’s a great question. I asked Legal to review the implications of publicly releasing a snapshot of this data, and I’ll post the outcome of the audit on this list. FWIW, the data in question will be aggregated from the logs of raw HTTP requests that WMF passively receives. This is the same type of data the Analytics Team used for the presentation on readership trends at the December Monthly Metrics meeting [1]. The format of the logs and the data they contain are described here [2].
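For illustration only, here is a minimal Python sketch of the kind of aggregation this reply describes. It discards requests whose user agent looks automated, then counts (article A in the referer, article B requested) pairs. The record layout (hypothetical `(uri, referer, user_agent)` tuples), the `/wiki/<Title>` URL pattern, and the keyword-based bot filter are all assumptions. This is not the actual log-parsing script.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple
from urllib.parse import unquote, urlparse

# Hypothetical bot filter: a crude keyword check on the user agent. The real
# pipeline's filtering of bots and automated requests is not specified here.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_automated(user_agent: str) -> bool:
    """Return True if the user agent looks like a bot (assumed heuristic)."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def article_from_url(url: str, prefix: str = "/wiki/") -> Optional[str]:
    """Extract the article title from a /wiki/<Title> URL, else None."""
    path = urlparse(url).path
    if not path.startswith(prefix):
        return None
    return unquote(path[len(prefix):]) or None


def count_clickthroughs(requests: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count (A, B) pairs: requests for article B whose referer is article A.

    Each record is a hypothetical (uri, referer, user_agent) tuple. The user
    agent is used only to drop automated traffic and is not kept in the output.
    """
    pairs: Counter = Counter()
    for uri, referer, user_agent in requests:
        if is_automated(user_agent):
            continue
        target = article_from_url(uri)
        source = article_from_url(referer)
        if target and source:
            pairs[(source, target)] += 1
    return pairs


sample_log = [
    ("https://en.wikipedia.org/wiki/Kyoto", "https://en.wikipedia.org/wiki/Japan", "Mozilla/5.0"),
    ("https://en.wikipedia.org/wiki/Kyoto", "https://en.wikipedia.org/wiki/Japan", "Mozilla/5.0"),
    ("https://en.wikipedia.org/wiki/Kyoto", "https://en.wikipedia.org/wiki/Japan", "ExampleBot/1.0"),
    ("https://en.wikipedia.org/wiki/Tokyo", "https://www.google.com/", "Mozilla/5.0"),
]
print(count_clickthroughs(sample_log))  # Counter({('Japan', 'Kyoto'): 2})
```

The sketch matches the referer's `/wiki/` path instead of running a raw substring search over the whole referer string. The output contains only article pairs and counts. No IP address or user agent is retained.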
Personally identifiable information (such as IP addresses or user agents) will be used only to filter out bots and automated requests. Clickthrough data will be obtained by parsing the referer string of each HTTP request and counting occurrences of specific strings (such as an article title). In other words, we will count and aggregate requests for article B whose referer contains article A. I’ll work with Ellery to release the code of the log-parsing script so it can be publicly reviewed before we move forward.

Hope this addresses your concerns,
Dario

[1] https://meta.wikimedia.org/w/index.php?title=File:2014_Readership_Update,_WMF_Metrics_Meeting,_December.pdf&page=10
[2] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive

> On Jan 12, 2015, at 1:27 PM, Andrew Gray <[email protected]> wrote:
>
> Hi all,
>
> I'm curious about the privacy implications as well. I can't think of
> specific problems with this data, *but* it's information that I didn't
> think we'd ever been logging. We've historically been quite hands-off
> with any kind of reader information, other than raw hit counts, and
> there might well be some community discomfort at discovering it's been
> both tracked and released, even if completely anonymised.
>
> Andrew.
>
> On 12 January 2015 at 20:08, Toby Negrin <[email protected]> wrote:
>> Thanks Amir -- feel free to have your friend reach out to this list
>> directly.
>>
>> As Ellery said, we're figuring out if there are any privacy implications in
>> releasing this dataset.
>>
>> -Toby
>>
>> On Mon, Jan 12, 2015 at 12:05 PM, Amir E. Aharoni
>> <[email protected]> wrote:
>>>
>>> I am asking for a real-life friend who is doing some research.
>>> It's not for any particular project of mine, but I can easily imagine
>>> that it can be useful for a lot of editors and product managers as I
>>> wrote in the opening post.
>>>
>>> (And I cannot think of any privacy problems if the data is not tied to any
>>> particular people, but maybe I'm naive.)
>>>
>>> --
>>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>>> http://aharoni.wordpress.com
>>> “We're living in pieces,
>>> I want to live in peace.” – T. Moore
>>>
>>> 2015-01-12 22:00 GMT+02:00 Toby Negrin <[email protected]>:
>>>>
>>>> Hi Amir --
>>>>
>>>> Would you like to see these datasets released publicly or was there a
>>>> specific project you were interested in using them for?
>>>>
>>>> thanks,
>>>>
>>>> -Toby
>>>>
>>>> On Mon, Jan 12, 2015 at 5:44 AM, Amir E. Aharoni
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Are there metrics about which links in each article are the most
>>>>> clicked?
>>>>>
>>>>> I can think there's a lot to be learned from it:
>>>>> * Data-driven suggestions for manual of style about linking (too many
>>>>> and too few links are a perennial topic of argument)
>>>>> * How do people traverse between topics.
>>>>> * Which terms in the article may need a short explanation in parentheses
>>>>> rather than just a link.
>>>>> * How far down into the article do people bother to read.
>>>>>
>>>>> Anyway, I can think that accessibility to such data can optimize both
>>>>> readership and editing.
>>>>>
>>>>> And maybe this can be just taken right from the logs, without any
>>>>> additional EventLogging.
>>>>>
>>>>> _______________________________________________
>>>>> Analytics mailing list
>>>>> analytics@lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>
> --
> - Andrew Gray
> [email protected]
