I'll review Daniel's email and will get back to him/you on this list in the next day or so.
Leila

--
Leila Zia
Senior Research Scientist
Wikimedia Foundation

On Mon, Jul 24, 2017 at 7:59 AM, Nuria Ruiz <[email protected]> wrote:
> Daniel,
>
> Signing an NDA is not enough to get access to the data; you also need to
> be part of a formal research collaboration with our research team. They
> have a number of those and are not likely to accept any more soon, but
> you can contact them in that regard:
> https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations
>
> Thanks,
>
> Nuria
>
> On Mon, Jul 24, 2017 at 6:37 AM, Daniel Oberski <[email protected]> wrote:
>>
>> Dear list,
>>
>> I'm posting a recent conversation with Dan below, as well as a few
>> follow-up questions.
>>
>> Dan was kind enough to point out this list. I apologize that the post is
>> "backward" (in email-thread format) due to my ignorance; I will use this
>> list from now on.
>>
>> Thanks, Daniel
>>
>> ----
>>
>> Hi Dan
>>
>> Thanks for getting back to me so quickly!
>>
>> >Thanks for writing. In general these questions are best asked on our
>> >public list, so other people can see and benefit from any answers:
>> >https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>> Thanks, I've joined this list and will ask subsequent questions there.
>>
>> >* pairs of pages: we have two datasets that are mentioned in this task
>> >https://phabricator.wikimedia.org/T158972 which should be very
>> >interesting for this purpose. They aren't being updated right now, and
>> >the task is to do just that. We'll probably get to that within the next
>> >3 months, but a bunch of us are on paternity leave this summer, so
>> >things are a little slower than normal
>>
>> This seems close to what I need. From the descriptions I gather the
>> linkage is by session. Is there also a linkage by IP (with IPs removed,
>> of course)?
>>
>> >* country data for pageviews: for privacy reasons we only allow access
>> >to this with an NDA.
>> >We have good data on it, but you need to sign this NDA and use our
>> >cluster to access it, being careful about what you report about it to
>> >the world at large. Here's information on that:
>> >https://wikitech.wikimedia.org/wiki/Volunteer_NDA
>>
>> I've read this and am happy to sign an NDA. I understand it is best to be
>> as specific as possible about the reasoning, intentions with the data, and
>> permissions required. For me to figure this out it would be useful to know
>> the relevant parts of the database schema, and perhaps a hint as to which
>> data might be most interesting there. Would you be able to point me
>> towards that?
>>
>> >Hope that helps, and feel free to write back to the public list in the
>> >future.
>>
>> Definitely, very helpful and thank you!
>>
>> Best, Daniel
>>
>> On Wed, Jul 19, 2017 at 9:51 AM, Oberski, D.L. (Daniel)
>> <[email protected]> wrote:
>> Dear Dan,
>>
>> My name is Daniel Oberski, I'm an associate professor of data science
>> methodology in the department of statistics at Utrecht University in the
>> Netherlands.
>>
>> I've been using your incredibly useful pageviews API to study correlations
>> between the amount of interest people show in a topic (pageviews) and
>> other data such as political party preference over time. That has yielded
>> some interesting results (which I have yet to write up).
>>
>> However, to do a better study it would be very helpful to have slightly
>> more information than is in the API. Specifically, it would be very useful
>> to be able to query, for each _pair_ of pages, how many people (or IPs)
>> viewed _both_ of those pages. That way I can find out which pages are
>> really indicative of interest in a specific common topic, rather than just
>> correlated by accident. In addition, I've found it hard to figure out
>> pageviews for specific pages by country rather than language.
>>
>> My question is, would you happen to know if there is any way to obtain
>> this information? (It does not necessarily have to be through the API.)
>> Or do you know if there are people to whom I might talk about this?
>>
>> Thanks for reading (to) the end and best regards,
>>
>> Daniel
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
