Hi Daniel, I think your request can probably be worked out in a way that
sufficiently protects private information. The request from Michal, at
least as I understand its current form, has a very different scope. Thanks
for following up.

Pine

On Wed, Mar 23, 2016 at 7:29 AM, Nuria Ruiz <[email protected]> wrote:

> >In my understanding these fields do not include any "personal
> information" as per the WMF privacy policy. Please correct me if I'm wrong
> here.
> This is correct for data requested here:
> https://phabricator.wikimedia.org/T128132
>
> On Wed, Mar 23, 2016 at 1:23 AM, Daniel Berger <[email protected]>
> wrote:
>
>> Hi everyone,
>>
>> As the one who requested data for performance research and testing, I'm
>> happy to participate in the discussion.
>>
>> The second request, by Michal, might not be about performance. I believe
>> Michal hasn't provided any details yet. I thought I could help Michal by
>> pointing out similarities to my request, but I now see that the two
>> requests might be quite different.
>>
>> My goal is to compile a dataset that does not include any private data.
>> My request essentially asks for a higher-resolution version of the
>> publicly available pagecounts data. It is also an update to a dataset
>> that was made public in 2007 [1].
>>
>> Specifically, the dataset would hold the same fields as the pagecounts
>> data, but at a higher sampling rate: 1:10 instead of hourly.
>> In addition to the pagecounts fields, the public 2007 dataset has one
>> additional field, "save_flag", which indicates whether the request changed
>> a web page. Compiling save_flag requires access to three other webrequest
>> fields, as Tim Starling pointed out in his email [2]. Tim helped compile
>> the 2007 dataset.
>>
>> In my understanding these fields do not include any "personal
>> information" as per the WMF privacy policy. Please correct me if I'm wrong
>> here.
>>
>>
>> I would also like to point out that I'm asking for this dataset to be made
>> public (as opposed to being given only to my research group). If helpful,
>> I'd be willing to host this dataset on my institution's web server, or in
>> a public AWS S3 bucket to facilitate access by the community.
>>
>> I made a few updates to clarify these points in the Phabricator task,
>> where you can find further information:
>> https://phabricator.wikimedia.org/T128132
>> The comments on that page discuss how we could restrict the scope to the
>> English Wikipedia only, and to individual WMF caching servers, to reduce
>> the dataset size.
>>
>>
>> Let me know what you think.
>>
>> Best,
>> Daniel
>>
>> [1] http://www.wikibench.eu/?page_id=60
>> [2] http://thread.gmane.org/gmane.org.wikimedia.analytics/3405/focus=3408
>>
>>
>>
>> On 03/22/2016 08:55 PM, Pine W wrote:
>>
>> Hi Dan,
>>
>> Agreed, I think it makes sense to consider a subject-specific request for
>> pages that are within the scope of epidemiology, such as influenza, where
>> we have reason to think that there could be public health benefits in
>> analyzing the data and there are reasonable safeguards to protect user
>> anonymity.
>>
>> A request for 1 month of the private data requested here, which appears
>> to be for all pages on all projects, is far too broadly scoped. Also, in
>> general, my instinct would be to deny external requests for WMF private
>> data for purposes of performance testing. It seems to me that the risks far
>> outweigh the benefits to Wikimedia, and that processing requests like these
>> would be a suboptimal use of WMF staff time.
>>
>> Pine
>>
>> On Tue, Mar 22, 2016 at 12:44 PM, Dan Andreescu <[email protected]> wrote:
>>
>>> Pine, there are actually two separate requests and they shouldn't be
>>> mixed. The performance-related one is research as far as I understand, and
>>> for the other one we have no details yet. I welcome a public discussion of
>>> either, and of course would respect any opinions held by the analytics
>>> community at large. We have every intention to be good stewards of this
>>> data and, for what it's worth, I'm very skeptical of allowing access to
>>> private data, unless for obviously beneficial purposes like flu
>>> forecasting, etc.
>>>
>>> On Tue, Mar 22, 2016 at 1:37 PM, Pine W <[email protected]> wrote:
>>>
>>>> I'd appreciate a clarification about the purpose of this request if
>>>> Wikimedia private data is involved. If I am understanding correctly, the
>>>> purpose of this request is access to Wikimedia private data to assist
>>>> with third-party performance testing. If that is the case, I believe that
>>>> the request for access to private data should simply be denied.
>>>>
>>>> Pine
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics