Dear list, 

I'm posting a recent conversation with Dan below, as well as a few follow-up 
questions. 

Dan was kind enough to point out this list. I apologize that the post is
"backward" (in email-thread format) due to my ignorance; I will use this
list from now on.

Thanks, Daniel


---- 

Hi Dan


Thanks for getting back to me so quickly! 

>Thanks for writing.  In general these questions are best asked on our public
>list, so other people can see and benefit from any answers:
>https://lists.wikimedia.org/mailman/listinfo/analytics

Thanks, I've joined this list and will ask subsequent questions there. 

>* pairs of pages: we have two datasets that are mentioned in this task
>https://phabricator.wikimedia.org/T158972 which should be very interesting
>for this purpose.  They aren't being updated right now, and the task is to
>do just that.  We'll probably get to that within the next 3 months, but a
>bunch of us are on paternity leave this summer, so things are a little
>slower than normal

This seems close to what I need. From the descriptions I gather the linkage
is by session. Is there also a linkage by IP (with the IPs themselves
removed, of course)?

>* country data for pageviews: for privacy reasons we only allow access to
>this with an NDA.  We have good data on it, but you need to sign this NDA
>and use our cluster to access it, being careful about what you report about
>it to the world at large.  Here's information on that:
>https://wikitech.wikimedia.org/wiki/Volunteer_NDA

I've read this and am happy to sign an NDA. I understand it is best to be as
specific as possible about the reasoning, intentions with the data, and
permissions required. For me to figure this out it would be useful to know
the relevant parts of the database schema, and perhaps a hint as to which
data might be most interesting there. Would you be able to point me towards
that?

>Hope that helps, and feel free to write back to the public list in the future.

Definitely, very helpful and thank you!

Best, Daniel


On Wed, Jul 19, 2017 at 9:51 AM, Oberski, D.L. (Daniel) <[email protected]> wrote:
Dear Dan,


My name is Daniel Oberski, and I'm an associate professor of data science
methodology in the department of statistics at Utrecht University in the
Netherlands.

I've been using your incredibly useful pageviews API to study correlations
between the amount of interest people show in a topic (pageviews) and other
data, such as political party preference over time. That has yielded some
interesting results (which I have yet to write up).

However, to do a better study it would be very helpful to have slightly more
information than is in the API. Specifically, it would be very useful to be
able to query, for each _pair_ of pages, how many people (or IPs) viewed
_both_ of those pages. That way I can find out which pages are really
indicative of interest in a specific common topic, rather than just
correlated by accident. In addition, I've found it hard to figure out
pageviews for specific pages by country rather than by language.

My question is, would you happen to know if there is any way to obtain this
information? (It does not necessarily have to be through the API.) Or do you
know if there are people to whom I might talk about this?

Thanks for reading (to) the end and best regards,

Daniel



_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
