Thanks Dan! We should try to keep this kind of information up to date in
the actual documentation; I just added your remarks to the page about
pagecounts-raw
<https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-raw>,
where the pagecounts-ez alternative had not been mentioned yet.

On Wed, Feb 21, 2018 at 11:26 AM, Dan Andreescu <[email protected]>
wrote:

> Hi Lars,
>
> You have a couple of options:
>
> 1. download the data in lossless compressed form,
> https://dumps.wikimedia.org/other/pagecounts-ez/  The format is clever and
> doesn't lose granularity, should be a lot quicker than pagecounts-raw (this
> is basically what stats.grok.se did with the data as well, so downloading
> this way should be equivalent)
> 2. work on Toolforge, a virtual cloud that's on the same network as the
> data, so getting the data is a lot faster and you can use our compute
> resources (free, of course):
> https://wikitech.wikimedia.org/wiki/Portal:Toolforge
>
 More specifically there is https://wikitech.wikimedia.org/wiki/PAWS . (And
I assume "getting the data" meant transferring the files from
dumps.wikimedia.org, correct?)
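
For anyone who does end up working from the hourly pagecounts-raw files
directly, the per-article filtering Lars asks about can be done locally. The
following is a minimal Python sketch, not an official tool. It assumes each
line of an hourly file holds four space-separated fields (project code, page
title, view count, bytes transferred); the function names count_views and
count_views_in_file are made up for illustration.

```python
import gzip
from collections import Counter
from typing import Iterable, Set


def count_views(lines: Iterable[str], project: str, titles: Set[str]) -> Counter:
    """Sum view counts per title for one project code (e.g. "en").

    Assumed line layout: "<project> <title> <views> <bytes>".
    """
    totals: Counter = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed layout
        proj, title, views, _size = parts
        if proj != project or title not in titles:
            continue
        try:
            totals[title] += int(views)
        except ValueError:
            continue  # skip lines whose view count is not an integer
    return totals


def count_views_in_file(path: str, project: str, titles: Set[str]) -> Counter:
    """Apply count_views to one gzip-compressed hourly file on disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return count_views(handle, project, titles)
```

Calling count_views_in_file once per downloaded hourly file and adding the
returned counters together gives daily or monthly totals. As far as I know,
titles in these files use underscores instead of spaces and may be
percent-encoded, so the titles passed in need to be in that form.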

>
> If you decide to go with the second option, the IRC channel where they
> support folks like you is #wikimedia-cloud and you can always find me there
> as milimetric.
>
>
> On Tue, Feb 20, 2018 at 12:51 PM, Lars Hillebrand <
> [email protected]> wrote:
>
>> Dear Analytics Team,
>>
>> I am an M.Sc. student at Copenhagen Business School. For my Master's thesis
>> I would like to use page views data from certain Wikipedia articles. I
>> found out that in July 2015 a new API was created which delivers this data.
>> However, for my project I have to use data from before 2015.
>> In my further search I found out that the old page views data exists (
>> https://dumps.wikimedia.org/other/pagecounts-raw/) and until March 2017
>> it could be queried by using stats.grok.se. Unfortunately, this site
>> no longer exists, which is why I cannot filter and query the raw data
>> in .gz format on the webpage.
>>
>> Is there any way to get the page views data for certain
>> articles from before July 2015?
>>
>> Thanks a lot and best regards,
>>
>> Lars Hillebrand
>>
>> PS: I am conducting my research in R and for the post 2015 data the
>> package “pageviews” works great.
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>


-- 
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB