Hi Lars,

You have a couple of options:
1. Download the data in lossless compressed form: https://dumps.wikimedia.org/other/pagecounts-ez/
   The format is clever and doesn't lose granularity, and it should be a lot quicker than pagecounts-raw. (This is basically what stats.grok.se did with the data as well, so downloading this way should be equivalent.) See the R sketch after the quoted message below.

2. Work on Toolforge, a virtual cloud that's on the same network as the data, so getting the data is a lot faster and you can use our compute resources (free, of course): https://wikitech.wikimedia.org/wiki/Portal:Toolforge

If you decide to go with the second option, the IRC channel where they support folks like you is #wikimedia-cloud, and you can always find me there as milimetric.

On Tue, Feb 20, 2018 at 12:51 PM, Lars Hillebrand <[email protected]> wrote:

> Dear Analytics Team,
>
> I am an M.Sc. student at Copenhagen Business School. For my Master's thesis
> I would like to use page view data from certain Wikipedia articles. I found
> out that in July 2015 a new API was created which delivers this data.
> However, for my project I have to use data from before 2015.
> In my further search I found out that the old page view data still exists
> (https://dumps.wikimedia.org/other/pagecounts-raw/) and that until March
> 2017 it could be queried using stats.grok.se. Unfortunately, this site no
> longer exists, which is why I cannot filter and query the raw data in .gz
> format on the webpage.
>
> Are there any possibilities to get the page view data for certain articles
> from before July 2015?
>
> Thanks a lot and best regards,
>
> Lars Hillebrand
>
> PS: I am conducting my research in R, and for the post-2015 data the
> package “pageviews” works great.
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
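Since you're working in R, here is a minimal sketch of option 1: download one month of the merged pagecounts-ez data and pull out the lines for a single article. The merged/ path, the file name pattern, the "en.z" project code for English Wikipedia, and the article name are all assumptions taken from the directory listing and format README, so double-check them against https://dumps.wikimedia.org/other/pagecounts-ez/ before relying on this:

    # A sketch, not tested against the live files -- verify the names first.
    url  <- "https://dumps.wikimedia.org/other/pagecounts-ez/merged/pagecounts-2014-01-views-ge-5.bz2"
    dest <- basename(url)
    if (!file.exists(dest)) download.file(url, dest, mode = "wb")

    # Stream the bzip2 file in chunks. Each line looks roughly like
    # "<project> <article> <monthly_total> <encoded hourly counts>";
    # "en.z" should be the pagecounts-ez code for English Wikipedia.
    con  <- bzfile(dest, open = "r")
    hits <- character(0)
    while (length(batch <- readLines(con, n = 500000)) > 0) {
      hits <- c(hits, batch[startsWith(batch, "en.z Copenhagen_Business_School ")])
    }
    close(con)
    hits  # decode the hourly strings per the README in the same directory

These files are big (hundreds of MB compressed), so streaming in chunks like this keeps memory flat; once you have the matching lines, the per-hour encoding is documented in the README next to the files.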
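For the post-2015 side you've already found the right tool, but for completeness, something like this is all it takes with the pageviews package (the article and dates are just examples; the REST API has no data before July 2015):

    library(pageviews)

    cbs <- article_pageviews(
      project = "en.wikipedia",
      article = "Copenhagen_Business_School",
      start   = "2015070100",  # earliest timestamp the REST API covers
      end     = "2015123100"
    )
    head(cbs)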
