Pine, have you considered asking Milowent who they work with on the IP data? I really, really doubt that there is some sort of shady back-alley data dealing going down here. - Jonathan
On Thu, Oct 16, 2014 at 9:52 PM, Pine W <[email protected]> wrote:

> Thanks Toby.
>
> I understand that IPs are not an especially accurate way to look at unique
> visitors, but for the purposes of the Signpost's traffic report and the Top
> 25 I feel that they are reasonable approximations of ways to filter out
> what appear to be automated requests.
>
> I am ok with holding those logs for 30 days, although I am a little
> surprised to hear that this is happening. However, what worries me a bit
> more is the idea that a staff member can be accessing those logs without
> that access being recorded. This might be something that you wish to
> investigate further.
>
> I am not interested in getting this staff person into trouble. The
> information that they are providing is useful to the Signpost and certainly
> seems to be sanitized to a reasonable degree. However, it does concern me
> that they can access these logs without someone knowing about it, it seems
> to me that this sort of activity should be proactively disclosed to people
> in WMF who conduct legal and security reviews, and I hope you will consider
> what sort of security features are appropriate to make sure that occasions
> when anyone accesses the raw logs are recorded in a robust manner. I worry
> that if this one staffer can access logs without the higher-ups knowing
> about it, it is possible that someone who intends to do unethical
> activities with WMF's data could also access the logs without being noticed.
>
> Thanks,
>
> Pine
>
>
> On Thu, Oct 16, 2014 at 9:31 PM, Toby Negrin <[email protected]>
> wrote:
>
>> Hi Pine --
>>
>> Thanks for this -- it's a challenging topic but one that the Analytics
>> team takes very seriously.
>>
>> I'm not familiar with the IP address review that's referenced in the
>> link. I don't know who the staffer might be. We don't currently calculate
>> unique visitors to anything in Analytics and IP address is not a
>> particularly accurate way to assess unique visitors regardless (due to
>> proxies/NATs/etc).
>>
>> We do store IPs as part of page requests in our raw logs which are
>> deleted every 30 days. This data is kept on a system where access is
>> limited and controlled by the operations team. We're in line with the
>> privacy policy on this.
>>
>> To be clear, we are currently considering mechanisms to count unique
>> "requests" -- we rely on Comscore for this data and for several reasons,
>> primarily related to mobile usage, it's not sufficient to understand our
>> usage patterns. We are putting together some proposals to do this in as
>> limited way as possible and that's respectful to our users. We'll share
>> this with the community when we feel we understand the use cases and
>> trade-offs well enough to discuss in an informed manner.
>>
>> -Toby
>>
>>
>>
>> We do store the IP address associated with varnish requests as part of
>> the log. This data is
>>
>>
>>
>> On Thu, Oct 16, 2014 at 8:50 PM, Pine W <[email protected]> wrote:
>>
>>> Hi again Analytics,
>>>
>>> I was under the impression that no records are kept of which IPs access
>>> which articles on Wikipedia when no edits are made, but it appears that
>>> such records are in fact kept [1].
>>>
>>> Is this proper? This practice appears to be permissible under the
>>> Privacy Policy which states that "We use IP addresses for research and
>>> analytics; to better personalize content, notices, and settings for you; to
>>> fight spam, identity theft, malware, and other kinds of abuse; and to
>>> provide better mobile and other applications."
>>>
>>> It is possible that this information is relevant for determining the
>>> number of unique visitors that Wikipedia gets and that this information is
>>> always properly filtered before it gets to the Signpost. However, given
>>> recent discussions which I thought said that Wikipedia was not instrumented
>>> to track unique visitors, I am surprised to learn that this already seems
>>> to be happening and that the situation has been this way for some time, so
>>> I would appreciate clarification.
>>>
>>> I want to emphasize that this question is about clarifying the practice
>>> of tracking likely unique visitors by IP. This question is not intended to
>>> start flame wars, get people into trouble, or limit the Signpost's access
>>> to properly filtered information if there has been a determination that
>>> WMF's retention of the raw data is appropriate. There might be appropriate
>>> secondary questions about making sure that access to the raw IP access data
>>> is carefully contained and secured.
>>>
>>> Thank you very much,
>>>
>>> Pine
>>>
>>> [1]
>>> https://en.wikipedia.org/w/index.php?title=User_talk%3ASerendipodous&diff=629934257&oldid=629932288
>>>
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics

--
Jonathan T. Morgan
Learning Strategist
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
[email protected]
