Peter:

Do submit a Phabricator task with your request; it'll be easier to follow
up on it than it is via e-mail. Our backlog:
https://phabricator.wikimedia.org/tag/analytics/

I assume you know that per-article views have been available since 2015; one
way to see them: https://tools.wmflabs.org/pageviews/
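The pageviews tool reads from the public Wikimedia REST API, which can also be queried directly. A minimal sketch, assuming the per-article endpoint layout `/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}`; the helper names and the User-Agent string are hypothetical:

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed base path of the per-article pageviews endpoint (data from mid-2015 on).
API_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article URL; start and end are YYYYMMDD strings."""
    # Titles use underscores for spaces and are fully percent-encoded,
    # so a "/" inside a title (e.g. "AC/DC") becomes "%2F".
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([API_BASE, project, access, agent, title,
                     granularity, start, end])


def fetch_views(url):
    """Fetch the URL and map YYYYMMDD -> view count (needs network access)."""
    # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
    req = Request(url, headers={"User-Agent": "pageviews-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        data = json.load(resp)
    # Each item carries a "timestamp" like "2016010100" and a "views" count.
    return {item["timestamp"][:8]: item["views"] for item in data["items"]}
```

For example, `fetch_views(per_article_url("en.wikipedia", "Albert Einstein", "20160101", "20160131"))` would return daily views for January 2016. Dates before mid-2015 are not covered by this endpoint.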

Per-project views are available from early on, either as downloadable
files or programmatically:
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Legacy_Pagecounts
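The legacy per-project counts can likewise be requested over HTTP. A minimal sketch, assuming the `legacy/pagecounts/aggregate` path and the parameter values below match the AQS page linked above (check it before relying on them); `legacy_project_url` is a hypothetical helper:

```python
# Assumed base path of the legacy per-project pagecounts endpoint.
LEGACY_BASE = ("https://wikimedia.org/api/rest_v1/metrics/legacy/"
               "pagecounts/aggregate")

# Assumed allowed values; verify against the AQS documentation.
ACCESS_SITES = {"all-sites", "desktop-site", "mobile-site"}
GRANULARITIES = {"hourly", "daily", "monthly"}


def legacy_project_url(project, start, end,
                       access_site="all-sites", granularity="monthly"):
    """Build a legacy aggregate URL; start and end are YYYYMMDDHH strings."""
    if access_site not in ACCESS_SITES:
        raise ValueError(f"unknown access site: {access_site}")
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")
    # Reject timestamps that are not exactly ten digits.
    for stamp in (start, end):
        if len(stamp) != 10 or not stamp.isdigit():
            raise ValueError(f"expected YYYYMMDDHH, got {stamp!r}")
    return "/".join([LEGACY_BASE, project, access_site,
                     granularity, start, end])
```

These are per-project totals only; they do not break the counts down by article.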

Thanks,

Nuria

On Thu, Feb 22, 2018 at 1:44 PM, Peter Meissner <retep.meiss...@gmail.com>
wrote:

> Like dumps at the article-day level? That would already be super awesome,
> much better than the current state.
>
> Best, Peter
>
> On 22.02.2018 22:23, "Dan Andreescu" <dandree...@wikimedia.org> wrote:
>
>> Peter, the data you mention here is quite large, and storage is cheap but
>> not free.  For now, we don't have capacity to serve that kind of timespan
>> from the API, but we will work to improve the dumps version so it's more
>> comprehensive.
>>
>> On Thu, Feb 22, 2018 at 4:12 PM, Peter Meissner <retep.meiss...@gmail.com>
>> wrote:
>>
>>> Dear List-eners,
>>>
>>>
>>> I am writing to argue the case for a Wikipedia effort to make something
>>> like stats.grok.se (page views per day per article from 2007 onwards)
>>> available again.
>>>
>>>
>>> I am the author of the first R package that provided easy access to
>>> pageview counts by querying the stats.grok.se service and translating
>>> the results into neat little R data frames.
>>>
>>> Since stats.grok.se went offline, somebody writes in about once a month,
>>> mostly from academia, asking about the status of page view data (counts
>>> per article per day) for the period before late 2015. To underline this
>>> further: the R pageviews package written by one of your former colleagues
>>> has over 7000 downloads within 2 years, while my package has 14000 within
>>> 4 years (conservative numbers, because they stem from a single CRAN
>>> mirror only).
>>>
>>> I made some efforts to reconstruct the service that stats.grok.se was
>>> providing, but it's not a trivial endeavour as far as I can see. It is BIG
>>> data, demanding computing time, storage and bandwidth, plus some thinking
>>> about how to rearrange and aggregate the data so it can be queried and
>>> served efficiently. Not to mention that the data is raw, meaning it needs
>>> proper cleaning up before use, and hosting will need resources too. So my
>>> efforts have gone nowhere.
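As a rough illustration of the aggregation step described above, the sketch below rolls hourly raw pagecount lines up into per-article daily totals. It assumes the classic raw format of one `project title views bytes` line per page and hourly files named like `pagecounts-YYYYMMDD-HHMMSS.gz`; the helper names are hypothetical, and the title normalization (percent-decoding, spaces to underscores) is an assumed cleaning step, not the method stats.grok.se used.

```python
import gzip
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator
from urllib.parse import unquote

# Assumed hourly file naming: pagecounts-YYYYMMDD-HHMMSS.gz
FILENAME_RE = re.compile(r"pagecounts-(\d{8})-\d{6}")


def day_of(filename: str) -> str:
    """Extract the YYYYMMDD day from an hourly file name."""
    match = FILENAME_RE.search(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match.group(1)


def read_lines(path: str) -> Iterator[str]:
    """Stream lines from a gzipped hourly file, tolerating bad bytes."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        yield from fh


def add_hour(lines: Iterable[str], counts: Counter, project: str) -> None:
    """Add one hour of 'project title views bytes' lines into counts."""
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4 or parts[0] != project:
            continue  # skip malformed lines and other projects
        try:
            views = int(parts[2])
        except ValueError:
            continue
        # Hypothetical cleaning: decode %XX escapes, use underscores.
        title = unquote(parts[1]).replace(" ", "_")
        counts[title] += views


def daily_counts(hours: Dict[str, Iterable[str]],
                 project: str) -> Dict[str, Counter]:
    """Map {file name: lines} to {YYYYMMDD: Counter(title -> views)}."""
    per_day: Dict[str, Counter] = defaultdict(Counter)
    for name, lines in hours.items():
        add_hour(lines, per_day[day_of(name)], project)
    return dict(per_day)
```

With real files one could pass `{path: read_lines(path) for path in paths}`; at the scale described here, the per-day counters would have to be written out (for example one file per day) rather than held in memory.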
>>>
>>>
>>> Would it not be nice if Wikipedia could jump in and support research by
>>> going the extra mile and making those page counts available?
>>>
>>> Regarding prioritization (I am sure you have a long backlog), I would
>>> argue that this is really a multiplier: it enables a lot of people to
>>> start researching. Daily page counts are not that fancy, but without them
>>> people are simply blocked. They cannot start because they can't even get a
>>> basic idea of an article's general popularity on a given day.
>>>
>>>
>>> Best, Peter
>>>
>>>
>>>
>>> PS: I would be willing to put in some time to help you folks in any way
>>> I can.
>>>
>>>
>>> 2018-02-22 21:56 GMT+01:00 Dan Andreescu <dandree...@wikimedia.org>:
>>>
>>>>> My view had been informed by the documentation at
>>>>> https://dumps.wikimedia.org/other/pagecounts-ez/:
>>>>>
>>>>>> Hourly page views per article for around 30 million article titles
>>>>>> (Sept 2013) in around 800+ Wikimedia wikis. Repackaged (with extreme
>>>>>> shrinkage, without losing granularity), corrected, reformatted. Daily
>>>>>> files and two monthly files (see notes below).
>>>>>
>>>>>
>>>>> Regarding the claim that pagecounts-ez has data back to when Wikimedia
>>>>> started tracking pageviews, I'll point out another error in the
>>>>> documentation that may have led to that view. The documentation claims
>>>>> that data is available from 2007 onward:
>>>>>
>>>>>> From 2007 to May 2015: derived from Domas' pagecount/projectcount
>>>>>> files
>>>>>
>>>>>
>>>>> However, if you check out the actual files
>>>>> (https://dumps.wikimedia.org/other/pagecounts-ez/merged/), you'll see
>>>>> that the pagecounts only go back to late 2011.
>>>>>
>>>>
>>>> Ah, yes, but the projectcount files go back to 2007-12; that's where the
>>>> confusion comes from. We should clarify the documentation or generate the
>>>> old data. I'm not sure whether this is easy, but I think it's fairly
>>>> straightforward, and I've opened a task for it:
>>>> https://phabricator.wikimedia.org/T188041 (we have a lot of work in our
>>>> backlog, though, so we probably won't be able to get to this for a while).
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> Analytics@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>