Like dumps on article-day level? That would already be super awesome, much better than the current state.
Best, Peter

On 22.02.2018 22:23, "Dan Andreescu" <dandree...@wikimedia.org> wrote:

> Peter, the data you mention here is quite large, and storage is cheap but
> not free. For now, we don't have capacity to serve that kind of timespan
> from the API, but we will work to improve the dumps version so it's more
> comprehensive.
>
> On Thu, Feb 22, 2018 at 4:12 PM, Peter Meissner <retep.meiss...@gmail.com>
> wrote:
>
>> Dear List-eners,
>>
>> I write in to argue the case for a Wikipedia effort to make something
>> like stats.grok.se (page views per day per article from 2007 onwards)
>> available again.
>>
>> I am the author of the first R package that provided easy access to
>> pageview counts by accessing the stats.grok.se service and translating
>> it into neat little R data frames.
>>
>> Since stats.grok.se is gone, somebody writes in once a month - mostly
>> from academia - asking about the status of page view data for the time
>> before late 2015 - counts, per article, per day. To underline this further:
>> the R pageviews package written by one of your former colleagues has over
>> 7000 downloads within 2 years, while my package has 14000 within 4 years
>> (which are conservative numbers because they stem from one particular CRAN
>> mirror only).
>>
>> I made some efforts to reconstruct the service that stats.grok.se was
>> providing, but it's not a trivial endeavour as far as I can see (BIG
>> data, demanding some computing time, storage resources and bandwidth,
>> and some thinking about how to re-arrange and aggregate the data so it can
>> be queried and served efficiently - not to mention that the data is raw,
>> meaning it needs some proper cleaning up before use; hosting will also
>> need some resources, ...) - and so my efforts have gone nowhere.
>>
>> Would it not be nice if Wikipedia could jump in and support research by
>> going the whole mile and making those page counts available?
>>
>> In regard to the prioritizing - I am sure you have a long backlog - I
>> would argue that this is something that really is a multiplier. It
>> enables a lot of people to start researching. Daily page counts are not
>> that fancy, but without them people are simply blocked. They cannot start
>> because they can't even get a basic idea of the general article
>> popularity for a given day.
>>
>> Best Peter
>>
>> PS.: I would be willing to put in some time to help you folks in any way
>> I can.
>>
>> 2018-02-22 21:56 GMT+01:00 Dan Andreescu <dandree...@wikimedia.org>:
>>
>>>> My view had been informed by the documentation at
>>>> https://dumps.wikimedia.org/other/pagecounts-ez/:
>>>>
>>>>> Hourly page views per article for around 30 million article titles
>>>>> (Sept 2013) in around 800+ Wikimedia wikis. Repackaged (with extreme
>>>>> shrinkage, without losing granularity), corrected, reformatted. Daily
>>>>> files and two monthly files (see notes below).
>>>>
>>>> Regarding the claim that pagecounts-ez has data back to when wikimedia
>>>> started tracking pageviews, I'll point out another error in the
>>>> documentation that may have led to that view. The documentation claims that
>>>> data is available from 2007 onward:
>>>>
>>>> From 2007 to May 2015: derived from Domas' pagecount/projectcount files
>>>>
>>>> However, if you check out the actual files (
>>>> https://dumps.wikimedia.org/other/pagecounts-ez/merged/), you'll see
>>>> that the pagecounts only go back to late 2011.
>>>
>>> Ah, yes, but the projectcount files go back to 2007-12; that's where
>>> that confusion comes from. We should clarify or generate the old data. I'm
>>> not sure whether this is easy, but I think it's fairly straightforward and
>>> I've opened a task for it: https://phabricator.wikimedia.org/T188041
>>> (we have a lot of work in our backlog, though, so we probably won't be able
>>> to get to this for a bit).
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics