One clarification point I'd make is that while the data is lossless for 30M
articles, it is 100% lossy for redirects, old page names, or pages created
after September 2013, correct?
On Wed, Feb 21, 2018 at 2:26 PM, Dan Andreescu <dandree...@wikimedia.org> wrote:
> Hi Lars,
> You have a couple of options:
> 1. download the data in lossless compressed form:
> https://dumps.wikimedia.org/other/pagecounts-ez/ The format is clever and
> doesn't lose granularity, should be a lot quicker than pagecounts-raw (this
> is basically what stats.grok.se did with the data as well, so downloading
> this way should be equivalent)
> 2. work on Toolforge, a virtual cloud that's on the same network as the
> data, so getting the data is a lot faster and you can use our compute
> resources (free, of course): https://wikitech.wiki
> If you decide to go with the second option, the IRC channel where they
> support folks like you is #wikimedia-cloud and you can always find me there
> as milimetric.
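For anyone who ends up filtering the hourly pagecounts-raw files by hand, here is a minimal Python sketch of that step. It assumes each line has four space-separated fields (project code, URL-encoded page title, view count, bytes transferred), which is how I understand the pagecounts-raw format. The function names and the file name in the usage note are made up for illustration, and it does nothing about redirects or renamed pages.

```python
import gzip
from collections import Counter
from urllib.parse import unquote


def count_views(lines, project, titles):
    """Sum view counts per title over the lines of one hourly file.

    Assumes lines look like: "<project> <url-encoded title> <views> <bytes>".
    """
    wanted = set(titles)
    totals = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed layout
        proj, raw_title, views, _size = parts
        if proj != project:
            continue
        title = unquote(raw_title)  # titles are percent-encoded in the dumps
        if title in wanted and views.isdigit():
            totals[title] += int(views)
    return totals


def count_views_in_file(path, project, titles):
    """Same as count_views, reading directly from a .gz file on disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return count_views(fh, project, titles)
```

Usage would be something like count_views_in_file("pagecounts-20130901-000000.gz", "en", ["Copenhagen"]) for each downloaded hourly file (the file name here is only an example), adding up the returned counters to get daily or monthly totals.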
> On Tue, Feb 20, 2018 at 12:51 PM, Lars Hillebrand <
> larshillebr...@icloud.com> wrote:
>> Dear Analytics Team,
>> I am an M.Sc. student at Copenhagen Business School. For my Master's thesis
>> I would like to use page views data from certain Wikipedia articles. I
>> found out that in July 2015 a new API was created which delivers this data.
>> However, for my project I have to use data from before 2015.
>> In my further search I found out that the old page views data exists (
>> https://dumps.wikimedia.org/other/pagecounts-raw/) and until March 2017
>> it could be queried by using stats.grok.se. Unfortunately, this site
>> no longer exists, which is why I cannot filter and query the raw data
>> in .gz format on the webpage.
>> Are there any possibilities to get the page views data for certain
>> articles from before July 2015?
>> Thanks a lot and best regards,
>> Lars Hillebrand
>> PS: I am conducting my research in R, and for the post-2015 data the
>> package “pageviews” works great.
>> Analytics mailing list
Lead Data Engineer
379 West Broadway
New York, NY 10012