Hey Andrew,

That’s a great question. I asked Legal to review the implications of publicly 
releasing a snapshot of this data, and I’ll post the outcome of the audit on 
this list. FWIW, the data in question will be aggregated from the logs of raw 
HTTP requests that WMF passively receives. This is the same type of data we 
previously used for the presentation on readership trends that the Analytics 
Team gave at Monthly Metrics in December [1]. The format of the logs and the 
data they contain are described here [2].

Personally identifiable information (such as IP addresses or User Agents) will 
not be used other than for the purpose of filtering bots and automated 
requests: clickthrough data will be obtained by parsing and counting specific 
string occurrences (such as an article title) in the referer string of an HTTP 
request. In other words, we will count and aggregate requests for article B 
whose referer string contains the title of article A. I’ll work 
with Ellery to release the code of the log parsing script so it can be publicly 
reviewed before we move forward.
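
To make the counting step concrete, here is a minimal sketch in Python. It is 
not the actual log parsing script, which has not been released yet. It assumes 
that bot filtering has already happened upstream, that each log record has 
been reduced to a (requested URL, referer) pair, and that article URLs have 
the form https://<host>/wiki/<Title>. The names title_from_url and 
count_clickthroughs are hypothetical.

```python
from collections import Counter
from urllib.parse import unquote, urlsplit

# Assumption: article pages live under /wiki/<Title>.
WIKI_PATH_PREFIX = "/wiki/"


def title_from_url(url):
    """Return the article title in a /wiki/<Title> URL, or None if absent."""
    path = urlsplit(url).path
    if not path.startswith(WIKI_PATH_PREFIX):
        return None
    title = unquote(path[len(WIKI_PATH_PREFIX):])
    return title or None


def count_clickthroughs(requests):
    """Count (referring article, requested article) pairs.

    `requests` is an iterable of (requested_url, referer) tuples. Only the
    two URLs are read; no IP address or User Agent is needed at this stage.
    """
    counts = Counter()
    for requested_url, referer in requests:
        source = title_from_url(referer or "")
        target = title_from_url(requested_url)
        # Skip requests with no article referer (external sites, direct
        # visits) and reloads of the same page.
        if source and target and source != target:
            counts[(source, target)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("https://en.wikipedia.org/wiki/B", "https://en.wikipedia.org/wiki/A"),
        ("https://en.wikipedia.org/wiki/B", "https://en.wikipedia.org/wiki/A"),
        ("https://en.wikipedia.org/wiki/B", "https://www.google.com/"),
    ]
    for (source, target), n in count_clickthroughs(sample).items():
        print(source, "->", target, n)
```

The output holds only article pairs and their counts, so the aggregated 
result contains nothing about individual requests.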

Hope this addresses your concerns,

Dario

[1] https://meta.wikimedia.org/w/index.php?title=File:2014_Readership_Update,_WMF_Metrics_Meeting,_December.pdf&page=10
[2] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive

> On Jan 12, 2015, at 1:27 PM, Andrew Gray <[email protected]> wrote:
> 
> Hi all,
> 
> I'm curious about the privacy implications as well. I can't think of
> specific problems with this data, *but* it's information that I didn't
> think we'd ever been logging. We've historically been quite hands-off
> with any kind of reader information, other than raw hit counts, and
> there might well be some community discomfort at discovering it's been
> both tracked and released, even if completely anonymised.
> 
> Andrew.
> 
> On 12 January 2015 at 20:08, Toby Negrin <[email protected]> wrote:
>> Thanks Amir -- feel free to have your friend reach out to this list
>> directly.
>> 
>> As Ellery said, we're figuring out if there are any privacy implications in
>> releasing this dataset.
>> 
>> -Toby
>> 
>> On Mon, Jan 12, 2015 at 12:05 PM, Amir E. Aharoni
>> <[email protected]> wrote:
>>> 
>>> I am asking for a real-life friend who is doing some research. It's not
>>> for any particular project of mine, but I can easily imagine that it can be
>>> useful for a lot of editors and product managers as I wrote in the opening
>>> post.
>>> 
>>> (And I cannot think of any privacy problems if the data is not tied to any
>>> particular people, but maybe I'm naive.)
>>> 
>>> 
>>> --
>>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>>> http://aharoni.wordpress.com
>>> ‪“We're living in pieces,
>>> I want to live in peace.” – T. Moore‬
>>> 
>>> 2015-01-12 22:00 GMT+02:00 Toby Negrin <[email protected]>:
>>>> 
>>>> Hi Amir --
>>>> 
>>>> Would you like to see these datasets released publicly or was there a
>>>> specific project you were interested in using them for?
>>>> 
>>>> thanks,
>>>> 
>>>> -Toby
>>>> 
>>>> On Mon, Jan 12, 2015 at 5:44 AM, Amir E. Aharoni
>>>> <[email protected]> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Are there metrics about which links in each article are the most
>>>>> clicked?
>>>>> 
>>>>> I can think there's a lot to be learned from it:
>>>>> * Data-driven suggestions for the manual of style about linking (too many
>>>>> and too few links are a perennial topic of argument)
>>>>> * How do people traverse between topics.
>>>>> * Which terms in the article may need a short explanation in parentheses
>>>>> rather than just a link.
>>>>> * How far down into the article do people bother to read.
>>>>> 
>>>>> Anyway, I can think that accessibility to such data can optimize both
>>>>> readership and editing.
>>>>> 
>>>>> And maybe this can be just taken right from the logs, without any
>>>>> additional EventLogging.
>>>>> 
>>>>> --
>>>>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>>>>> http://aharoni.wordpress.com
>>>>> ‪“We're living in pieces,
>>>>> I want to live in peace.” – T. Moore‬
>>>>> 
>>>>> _______________________________________________
>>>>> Analytics mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
> -- 
> - Andrew Gray
>  [email protected]
> 
