Thanks, Scott. I failed to find that task and incorrectly assumed we had
declined it. My fault; we'll see about loading that data, then.

And yes, Peter, per-article dumps are already there, but they're split
across pagecounts-raw from 2008-2011 and pagecounts-ez after that. The
conversation before you posted was that we would try to get pagecounts-ez
to include all available history on a per-article level, since
pagecounts-ez is the most convenient and fastest way to get to this data.
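Until that happens, anyone working from the raw hourly files has to roll
them up into per-article daily counts themselves. Below is a minimal sketch
of that aggregation. It assumes (not confirmed in this thread) that each
pagecounts-raw file is a gzipped text file named like
pagecounts-YYYYMMDD-HHMMSS.gz, with one space-separated
"project title count bytes" line per page, and that "en" is the project
code for English Wikipedia. The function names are hypothetical, titles are
compared as-is (no URL-decoding or normalization), and no cleaning of the
raw data is attempted.

```python
import gzip
import re
from collections import defaultdict
from pathlib import Path

# Assumed hourly file name pattern, e.g. "pagecounts-20110101-000000.gz";
# the captured group is the day (YYYYMMDD) the hour belongs to.
FILENAME_RE = re.compile(r"pagecounts-(\d{8})-\d{6}\.gz$")


def parse_line(line):
    """Parse an assumed 'project title count bytes' line; None if malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    project, title, count, _bytes = parts
    try:
        return project, title, int(count)
    except ValueError:
        return None


def aggregate_lines(day, lines, totals, project="en", titles=None):
    """Add one hourly file's counts into totals[(day, title)].

    totals should be a defaultdict(int); titles optionally restricts
    the aggregation to a set of article titles.
    """
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip malformed lines rather than failing
        proj, title, count = parsed
        if proj != project:
            continue
        if titles is not None and title not in titles:
            continue
        totals[(day, title)] += count
    return totals


def aggregate_files(paths, project="en", titles=None):
    """Sum hourly files into {(day, title): daily_count}."""
    totals = defaultdict(int)
    for path in paths:
        match = FILENAME_RE.search(Path(path).name)
        if match is None:
            continue  # not an hourly pagecounts file under our assumption
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            aggregate_lines(match.group(1), fh, totals, project, titles)
    return dict(totals)
```

For example, two hourly lines for "en Main_Page" with counts 10 and 5 on
20110101 add up to a daily total of 15 for ("20110101", "Main_Page").
Filtering by project and title while streaming keeps memory bounded by the
number of (day, title) pairs kept, not by the size of the raw files.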

On Thu, Feb 22, 2018 at 6:31 PM, Scott Hale <computermacgy...@gmail.com> wrote:

> FYI, there is a Phabricator task to load legacy pagecounts by article
> into AQS:
> https://phabricator.wikimedia.org/T173720
>
> That task arose from a discussion on this mailing list mid-last year:
> https://www.mail-archive.com/analytics@lists.wikimedia.org/msg04349.html
> https://www.mail-archive.com/analytics@lists.wikimedia.org/msg04350.html
>
> Cheers,
> Scott
>
>
>
> On Thu, Feb 22, 2018 at 11:25 PM, Nuria Ruiz <nu...@wikimedia.org> wrote:
>
>> Peter:
>>
>> Do submit a Phabricator task with your request; it'll be easier to
>> follow up on it there than via e-mail. Our backlog:
>> https://phabricator.wikimedia.org/tag/analytics/
>>
>> I assume you know that per-article views are available from 2015 onward;
>> one way to see them: https://tools.wmflabs.org/pageviews/
>>
>> Per-project views are available from early on, either as downloadable
>> files or programmatically:
>> https://wikitech.wikimedia.org/wiki/Analytics/AQS/Legacy_Pagecounts
>>
>> Thanks,
>>
>> Nuria
>>
>> On Thu, Feb 22, 2018 at 1:44 PM, Peter Meissner <retep.meiss...@gmail.com> wrote:
>>
>>> Like dumps at the article-day level? That would already be super awesome,
>>> much better than the current state.
>>>
>>> Best, Peter
>>>
>>> On 22.02.2018 22:23, "Dan Andreescu" <dandree...@wikimedia.org> wrote:
>>>
>>>> Peter, the data you mention here is quite large, and storage is cheap
>>>> but not free.  For now, we don't have capacity to serve that kind of
>>>> timespan from the API, but we will work to improve the dumps version so
>>>> it's more comprehensive.
>>>>
>>>> On Thu, Feb 22, 2018 at 4:12 PM, Peter Meissner <retep.meiss...@gmail.com> wrote:
>>>>
>>>>> Dear List-eners,
>>>>>
>>>>>
>>>>> I am writing to argue the case for a Wikipedia effort to make something
>>>>> like stats.grok.se (page views per day per article from 2007 onwards)
>>>>> available again.
>>>>>
>>>>>
>>>>> I am the author of the first R package that provided easy access to
>>>>> pageview counts by querying the stats.grok.se service and translating
>>>>> the results into neat little R data frames.
>>>>>
>>>>> Since stats.grok.se went away, somebody writes in once a month, mostly
>>>>> from academia, asking about the status of page view data for the time
>>>>> before late 2015: counts, per article, per day. To underline this
>>>>> further: the R pageviews package written by one of your former
>>>>> colleagues has over 7000 downloads within 2 years, while my package has
>>>>> 14000 within 4 years (and these are conservative numbers, because they
>>>>> stem from one particular CRAN mirror only).
>>>>>
>>>>> I made some efforts to reconstruct the service that stats.grok.se was
>>>>> providing, but as far as I can see it's not a trivial endeavour. It is
>>>>> BIG data, demanding computing time, storage and bandwidth, plus some
>>>>> thinking about how to re-arrange and aggregate the data so it can be
>>>>> queried and served efficiently. Not to mention that the data is raw,
>>>>> meaning it needs proper cleaning before use, and hosting will need
>>>>> resources too. So my efforts have gone nowhere.
>>>>>
>>>>>
>>>>> Would it not be nice if Wikipedia could jump in and support research
>>>>> by going the whole mile and making those page counts available?
>>>>>
>>>>> Regarding prioritization (I am sure you have a long backlog), I would
>>>>> argue that this really is a multiplier: it enables a lot of people to
>>>>> start researching. Daily page counts are not that fancy, but without
>>>>> them people are simply blocked. They cannot start because they can't
>>>>> even get a basic idea of an article's general popularity on a given day.
>>>>>
>>>>>
>>>>> Best Peter
>>>>>
>>>>>
>>>>>
>>>>> PS.: I would be willing to put in some time to help you folks in any
>>>>> way I can.
>>>>>
>>>>>
>>>>> 2018-02-22 21:56 GMT+01:00 Dan Andreescu <dandree...@wikimedia.org>:
>>>>>
>>>>>> My view had been informed by the documentation at
>>>>>>> https://dumps.wikimedia.org/other/pagecounts-ez/:
>>>>>>>
>>>>>>>> Hourly page views per article for around 30 million article titles
>>>>>>>> (Sept 2013) in around 800+ Wikimedia wikis. Repackaged (with extreme
>>>>>>>> shrinkage, without losing granularity), corrected, reformatted. Daily
>>>>>>>> files and two monthly files (see notes below).
>>>>>>>
>>>>>>>
>>>>>>> Regarding the claim that pagecounts-ez has data back to when
>>>>>>> Wikimedia started tracking pageviews, I'll point out another error in
>>>>>>> the documentation that may have led to that view. The documentation
>>>>>>> claims that data is available from 2007 onward:
>>>>>>>
>>>>>>>> From 2007 to May 2015: derived from Domas' pagecount/projectcount files
>>>>>>>
>>>>>>>
>>>>>>> However, if you check out the actual files (
>>>>>>> https://dumps.wikimedia.org/other/pagecounts-ez/merged/), you'll
>>>>>>> see that the pagecounts only go back to late 2011.
>>>>>>>
>>>>>>
>>>>>> Ah, yes, but the projectcount files go back to 2007-12; that's where
>>>>>> the confusion comes from. We should clarify the documentation or
>>>>>> generate the old data. I'm not sure whether this is easy, but I think
>>>>>> it's fairly straightforward, and I've opened a task for it:
>>>>>> https://phabricator.wikimedia.org/T188041
>>>>>> (we have a lot of work in our backlog, though, so we probably won't be
>>>>>> able to get to this for a bit).
>>>>>>
>>>>>> _______________________________________________
>>>>>> Analytics mailing list
>>>>>> Analytics@lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>
>
> --
> Dr Scott A. Hale
> http://scott.hale.us
> computermacgy...@gmail.com