Peter: Please do submit a Phabricator task with your request; it will be easier to follow up on there than via e-mail. Our backlog: https://phabricator.wikimedia.org/tag/analytics/
I assume you know that per-article views have been available since 2015; one way to see them: https://tools.wmflabs.org/pageviews/

Per-project views have been available since early on, either as downloadable files or in programmatic form: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Legacy_Pagecounts

Thanks,
Nuria

On Thu, Feb 22, 2018 at 1:44 PM, Peter Meissner <[email protected]> wrote:

> Like dumps on article-day level? That would already be super awesome, much
> better than the current state.
>
> Best, Peter
>
> On 22.02.2018 at 22:23, "Dan Andreescu" <[email protected]> wrote:
>
>> Peter, the data you mention here is quite large, and storage is cheap but
>> not free. For now, we don't have capacity to serve that kind of timespan
>> from the API, but we will work to improve the dumps version so it's more
>> comprehensive.
>>
>> On Thu, Feb 22, 2018 at 4:12 PM, Peter Meissner <[email protected]> wrote:
>>
>>> Dear List-eners,
>>>
>>> I write in to argue the case for a Wikipedia effort to make something
>>> like stats.grok.se (page views per day per article from 2007 onwards)
>>> available again.
>>>
>>> I am the author of the first R package that provided easy access to
>>> pageview counts by accessing the stats.grok.se service and translating
>>> it into neat little R data frames.
>>>
>>> Since stats.grok.se is gone, somebody writes in once a month - mostly
>>> from academia - asking about the status of page view data for the time
>>> before late 2015 - counts, per article, per day. To underline this further:
>>> the R pageviews package written by one of your former colleagues has over
>>> 7000 downloads within 2 years while my package has 14000 within 4 years
>>> (which are conservative numbers because they stem from one particular CRAN
>>> mirror only).
>>>
>>> I made some efforts to reconstruct the service that stats.grok.se was
>>> providing but, well, it's not a trivial endeavour as far as I can see (BIG
>>> data, demanding some computing time, storage resources and bandwidth,
>>> and some thinking about how to re-arrange and aggregate the data so it can
>>> be queried and served efficiently - not to mention that the data is raw,
>>> meaning it needs some proper cleaning up before use; also, hosting will
>>> need some resources, ...) - and so my efforts have gone nowhere.
>>>
>>> Would it not be nice if Wikipedia could jump in and support research by
>>> going the whole mile and making those page counts available?
>>>
>>> In regard to the prioritizing - I am sure you have a long backlog - I
>>> would argue that this is something that really is a multiplier. It
>>> enables a lot of people to start researching. Daily page counts are not
>>> that fancy, but without them people are simply blocked. They cannot start
>>> because they can't even get a basic idea of the general article
>>> popularity for a given day.
>>>
>>> Best, Peter
>>>
>>> PS: I would be willing to put in some time to help you folks in any way
>>> I can.
>>>
>>> 2018-02-22 21:56 GMT+01:00 Dan Andreescu <[email protected]>:
>>>
>>>>> My view had been informed by the documentation at
>>>>> https://dumps.wikimedia.org/other/pagecounts-ez/:
>>>>>
>>>>>> Hourly page views per article for around 30 million article titles
>>>>>> (Sept 2013) in around 800+ Wikimedia wikis. Repackaged (with extreme
>>>>>> shrinkage, without losing granularity), corrected, reformatted. Daily
>>>>>> files and two monthly files (see notes below).
>>>>>
>>>>> Regarding the claim that pagecounts-ez has data back to when Wikimedia
>>>>> started tracking pageviews, I'll point out another error in the
>>>>> documentation that may have led to that view.
>>>>> The documentation claims that data is available from 2007 onward:
>>>>>
>>>>>> From 2007 to May 2015: derived from Domas' pagecount/projectcount
>>>>>> files
>>>>>
>>>>> However, if you check out the actual files
>>>>> (https://dumps.wikimedia.org/other/pagecounts-ez/merged/), you'll see
>>>>> that the pagecounts only go back to late 2011.
>>>>
>>>> Ah, yes, but the projectcount files go back to 2007-12; that's where
>>>> that confusion comes from. We should clarify or generate the old data. I'm
>>>> not sure whether this is easy, but I think it's fairly straightforward and
>>>> I've opened a task for it: https://phabricator.wikimedia.org/T188041
>>>> (we have a lot of work in our backlog, though, so we probably won't be able
>>>> to get to this for a bit).
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
