Pine, have you considered asking Milowent who they work with on the IP
data? I really, really doubt that there is some sort of shady back-alley
data dealing going down here. - Jonathan

On Thu, Oct 16, 2014 at 9:52 PM, Pine W <[email protected]> wrote:

> Thanks Toby.
>
> I understand that IPs are not an especially accurate way to look at unique
> visitors, but for the purposes of the Signpost's traffic report and the Top
> 25 I feel that they are reasonable approximations of ways to filter out
> what appear to be automated requests.
>
> I am ok with holding those logs for 30 days, although I am a little
> surprised to hear that this is happening. However, what worries me a bit
> more is the idea that a staff member can be accessing those logs without
> that access being recorded. This might be something that you wish to
> investigate further.
>
> I am not interested in getting this staff person into trouble. The
> information that they are providing is useful to the Signpost and certainly
> seems to be sanitized to a reasonable degree. However, it does concern me
> that they can access these logs without someone knowing about it. It seems
> to me that this sort of activity should be proactively disclosed to people
> in WMF who conduct legal and security reviews, and I hope you will consider
> what sort of security features are appropriate to make sure that occasions
> when anyone accesses the raw logs are recorded in a robust manner. I worry
> that if this one staffer can access logs without the higher-ups knowing
> about it, it is possible that someone who intends to do unethical
> activities with WMF's data could also access the logs without being noticed.
>
> Thanks,
>
> Pine
>
>
> On Thu, Oct 16, 2014 at 9:31 PM, Toby Negrin <[email protected]>
> wrote:
>
>> Hi Pine --
>>
>> Thanks for this -- it's a challenging topic but one that the Analytics
>> team takes very seriously.
>>
>> I'm not familiar with the IP address review that's referenced in the
>> link. I don't know who the staffer might be. We don't currently calculate
>> unique visitors to anything in Analytics and IP address is not a
>> particularly accurate way to assess unique visitors regardless (due to
>> proxies/NATs/etc).
>>
>> We do store IPs as part of page requests in our raw logs which are
>> deleted every 30 days. This data is kept on a system where access is
>> limited and controlled by the operations team. We're in line with the
>> privacy policy on this.
>>
>> To be clear, we are currently considering mechanisms to count unique
>> "requests" -- we rely on Comscore for this data and for several reasons,
>> primarily related to mobile usage, it's not sufficient to understand our
>> usage patterns. We are putting together some proposals to do this in as
>> limited a way as possible and in a way that's respectful to our users. We'll share
>> this with the community when we feel we understand the use cases and
>> trade-offs well enough to discuss in an informed manner.
>>
>> -Toby
>>
>>
>>
>> We do store the IP address associated with varnish requests as part of
>> the log. This data is
>>
>>
>>
>> On Thu, Oct 16, 2014 at 8:50 PM, Pine W <[email protected]> wrote:
>>
>>> Hi again Analytics,
>>>
>>> I was under the impression that no records are kept of which IPs access
>>> which articles on Wikipedia when no edits are made, but it appears that
>>> such records are in fact kept [1].
>>>
>>> Is this proper? This practice appears to be permissible under the
>>> Privacy Policy which states that "We use IP addresses for research and
>>> analytics; to better personalize content, notices, and settings for you; to
>>> fight spam, identity theft, malware, and other kinds of abuse; and to
>>> provide better mobile and other applications."
>>>
>>> It is possible that this information is relevant for determining the
>>> number of unique visitors that Wikipedia gets and that this information is
>>> always properly filtered before it gets to the Signpost. However, given
>>> recent discussions which I thought said that Wikipedia was not instrumented
>>> to track unique visitors, I am surprised to learn that this already seems
>>> to be happening and that the situation has been this way for some time, so
>>> I would appreciate clarification.
>>>
>>> I want to emphasize that this question is about clarifying the practice
>>> of tracking likely unique visitors by IP. This question is not intended to
>>> start flame wars, get people into trouble, or limit the Signpost's access
>>> to properly filtered information if there has been a determination that
>>> WMF's retention of the raw data is appropriate. There might be appropriate
>>> secondary questions about making sure that access to the raw IP access data
>>> is carefully contained and secured.
>>>
>>> Thank you very much,
>>>
>>> Pine
>>>
>>> [1]
>>> https://en.wikipedia.org/w/index.php?title=User_talk%3ASerendipodous&diff=629934257&oldid=629932288
>>>
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>


-- 
Jonathan T. Morgan
Learning Strategist
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
[email protected]
