Dear Wikimedia analytics team,

We are three master's students from Vrije Universiteit Amsterdam (VU) and the 
University of Amsterdam (UvA) working on a large-scale data engineering project 
on detecting DDoS attacks on Wikipedia. We analyse page views and traffic and 
try to distinguish DDoS attacks from, for example, trending topics.

For this project, we need a lot of data. We found two sources of public data, 
Pageview complete (https://dumps.wikimedia.org/other/pageview_complete/) and 
the filtered version thereof (https://dumps.wikimedia.org/other/pageviews/). 
While these dumps are already quite useful, we also found a dataset with even 
more information 
(https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly). 
In particular, it contains the country a pageview came from and the referer, 
both of which could be very useful for our project.

According to the above page, this dataset has been private since 2018. We 
would like to ask whether we could get access to this dataset for our research, 
or to any other extended version of the public dumps that would enable us to 
do more in-depth research. We have our own cluster, so we could work on a copy 
of the data. Moreover, we would like to share our project and all our results 
with you to help contribute to your security measures.

Best regards,
Charel Felten, Gilles Magalhaes and Aleksander Janczewski
