Daniel,

Signing an NDA is not enough to get access to the data; you also need to be part of a formal research collaboration with our research team. They already have a number of those and are not likely to accept any more soon, but you can contact them in that regard: https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations
Thanks,

Nuria

On Mon, Jul 24, 2017 at 6:37 AM, Daniel Oberski <[email protected]> wrote:

> Dear list,
>
> I'm posting a recent conversation with Dan below, as well as a few
> follow-up questions.
>
> Dan was kind enough to point out this list. I apologize that the post is
> "backward" (in email-thread format) due to my ignorance, will use this
> list from now on.
>
> Thanks, Daniel
>
> ----
>
> Hi Dan
>
> Thanks for getting back to me so quickly!
>
> >Thanks for writing. In general these questions are best asked on our
> >public list, so other people can see and benefit from any answers:
> >https://lists.wikimedia.org/mailman/listinfo/analytics
>
> Thanks, I've joined this list and will ask subsequent questions there.
>
> >* pairs of pages: we have two datasets that are mentioned in this task
> >https://phabricator.wikimedia.org/T158972 which should be very
> >interesting for this purpose. They aren't being updated right now, and
> >the task is to do just that. We'll probably get to that within the next
> >3 months, but a bunch of us are on paternity leave this summer, so
> >things are a little slower than normal
>
> This seems close to what I need. From the descriptions I gather the
> linkage is by session. Is there also a linkage by ip (with IP's removed
> of course)?
>
> >* country data for pageviews: for privacy reasons we only allow access
> >to this with an NDA. We have good data on it, but you need to sign this
> >NDA and use our cluster to access it, being careful about what you
> >report about it to the world at large. Here's information on that:
> >https://wikitech.wikimedia.org/wiki/Volunteer_NDA
>
> I've read this and am happy to sign an NDA. I understand it is best to be
> as specific as possible about the reasoning, intentions with the data,
> and permissions required. For me to figure this out it would be useful to
> know the relevant parts of the database schema, and perhaps a hint as to
> which data might be most interesting there. Would you be able to point me
> towards that?
>
> >Hope that helps, and feel free to write back to the public list in the
> >future.
>
> Definitely, very helpful and thank you!
>
> Best, Daniel
>
> On Wed, Jul 19, 2017 at 9:51 AM, Oberski, D.L. (Daniel) <[email protected]>
> wrote:
>
> Dear Dan,
>
> My name is Daniel Oberski, I'm an associate professor of data science
> methodology in the department of statistics at Utrecht University in the
> Netherlands.
>
> I've been using your incredibly useful pageviews API to study
> correlations between the amount of interest people show in a topic
> (pageviews) with other data such as political party preference over time.
> That has yielded some interesting results (which I have yet to write up).
>
> However, to do a better study it would be very helpful to have slightly
> more information than is in the API. Specifically, it would be very
> useful to be able to query, for each _pair_ of pages, how many people (or
> IP's) viewed _both_ of those pages. That way I can find out which pages
> are really indicative of interest in a specific common topic, rather than
> just correlated by accident. In addition, I've found it hard to figure
> out pageviews for specific pages by country rather than language.
>
> My question is, would you happen to know if is there any way to obtain
> this information? (does not necessarily have to be through the API.) Or
> do you know if there are people to whom I might talk about this?
>
> Thanks for reading (to) the end and best regards,
>
> Daniel
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
