Thanks Dan! We should try to get this kind of information into the actual documentation. I just added your remarks to the page about pagecounts-raw <https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-raw>, which had not yet mentioned the pagecounts-ez alternative.
On Wed, Feb 21, 2018 at 11:26 AM, Dan Andreescu <[email protected]> wrote:

> Hi Lars,
>
> You have a couple of options:
>
> 1. download the data in lossless compressed form,
> https://dumps.wikimedia.org/other/pagecounts-ez/ The format is clever and doesn't lose
> granularity, should be a lot quicker than pagecounts-raw (this is basically
> what stats.grok.se did with the data as well, so downloading this way
> should be equivalent)
> 2. work on Toolforge, a virtual cloud that's on the same network as the
> data, so getting the data is a lot faster and you can use our compute
> resources (free, of course):
> https://wikitech.wikimedia.org/wiki/Portal:Toolforge
>

More specifically, there is https://wikitech.wikimedia.org/wiki/PAWS . (And I assume "getting the data" meant transferring the files from dumps.wikimedia.org, correct?)

>
> If you decide to go with the second option, the IRC channel where they
> support folks like you is #wikimedia-cloud and you can always find me there
> as milimetric.
>
>
> On Tue, Feb 20, 2018 at 12:51 PM, Lars Hillebrand <[email protected]> wrote:
>
>> Dear Analytics Team,
>>
>> I am an M.Sc. student at Copenhagen Business School. For my Master Thesis
>> I would like to use page views data from certain Wikipedia articles. I
>> found out that in July 2015 a new API was created which delivers this data.
>> However, for my project I have to use data from before 2015.
>> In my further search I found out that the old page views data exists
>> (https://dumps.wikimedia.org/other/pagecounts-raw/) and until March 2017
>> it could be queried by using stats.grok.se. Unfortunately, this site
>> no longer exists, which is why I cannot filter and query the raw data
>> in .gz format on the webpage.
>>
>> Are there any possibilities to get the page views data for certain
>> articles from before July 2017?
>>
>> Thanks a lot and best regards,
>>
>> Lars Hillebrand
>>
>> PS: I am conducting my research in R and for the post 2015 data the
>> package “pageviews” works great.
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics

--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
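For the filtering step Lars mentions (pulling a few articles out of the hourly pagecounts-raw .gz files), here is a minimal Python sketch. It assumes the pagecounts-raw line layout `project page_title views bytes`, separated by single spaces (an illustrative line would be `en Some_Article 42 1234567`, with `en` meaning English Wikipedia). The function name `filter_pagecounts` is hypothetical. It handles one hourly file at a time, matches titles as exact strings (differently encoded variants of the same title would have to be listed separately), and does not read the pagecounts-ez format, which encodes counts differently.

```python
import gzip
from collections import Counter
from typing import Iterable


def filter_pagecounts(path: str, project: str, titles: Iterable[str]) -> Counter:
    """Sum view counts for selected titles in one hourly pagecounts-raw file.

    Assumes each line reads "project page_title views bytes",
    separated by single spaces (hypothetical helper, not an official tool).
    """
    wanted = set(titles)
    counts: Counter = Counter()
    # The dumps are gzip-compressed text; decode leniently in case some
    # requested titles are not valid UTF-8.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            parts = line.rstrip("\n").split(" ")
            if len(parts) != 4:
                continue  # skip lines that do not match the assumed layout
            proj, title, views, _size = parts
            if proj == project and title in wanted and views.isdigit():
                # Accumulate in case a title appears on more than one line.
                counts[title] += int(views)
    return counts
```

To cover a longer period, call it once per downloaded hourly file and add the resulting `Counter` objects together; streaming line by line keeps memory use small even though each file lists every requested page for that hour.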
