Dear List-eners,

I write in to argue the case for a Wikipedia effort to make something like (page views per day per article from 2007 onwards) available.

I am the author of the first R package that provided easy access to
pageview counts by accessing the service and translating its output
into neat little R data frames.

Since the service is gone, somebody writes in once a month, mostly from
academia, asking about the status of page view data for the time before
late 2015: counts, per article, per day. To underline this further: the R
pageviews package written by one of your former colleagues has over 7000
downloads within 2 years, while my package has 14000 within 4 years (which
are conservative numbers because they stem from one particular CRAN mirror).

I made some efforts to reconstruct the service, but it's not a trivial
endeavour as far as I can see: it is BIG data, demanding computing time,
storage and bandwidth, plus some thinking about how to re-arrange and
aggregate the data so it can be queried and served efficiently. On top of
that, the data is raw, meaning it needs proper cleaning up before use, and
hosting will need resources too. So my efforts have gone nowhere.

Would it not be nice if Wikipedia could jump in and support research by
going the whole mile and making those page counts available?

In regard to prioritizing (I am sure you have a long backlog), I would
argue that this really is a multiplier: it enables a lot of people to
start researching. Daily page counts are not that fancy, but without them
people are simply blocked. They cannot start because they can't even get a
basic idea of the general popularity of an article on a given day.

Best, Peter

PS: I would be willing to put in some time to help you folks in any way I can.

2018-02-22 21:56 GMT+01:00 Dan Andreescu:

> My view had been informed by the documentation at
>> Hourly page views per article for around 30 million article titles (Sept
>>> 2013) in around 800+ Wikimedia wikis. Repackaged (with extreme shrinkage,
>>> without losing granularity), corrected, reformatted. Daily files and two
>>> monthly files (see notes below).
>> Regarding the claim that pagecounts-ez has data back to when wikimedia
>> started tracking pageviews, I'll point out another error in the
>> documentation that may have led to that view. The documentation claims that
>> data is available from 2007 onward:
>>  From 2007 to May 2015: derived from Domas' pagecount/projectcount files
>> However, if you check out the actual files (
>> ther/pagecounts-ez/merged/), you'll see that the pagecounts only go back
>> to late 2011.
> Ah, yes, but the projectcount files go back to 2007-12, that's where that
> confusion comes from, we should clarify or generate the old data.  I'm not
> sure whether this is easy, but I think it's fairly straightforward and I've
> opened a task for it: (we have
> a lot of work in our backlog, though, so we probably won't be able to get
> to this for a bit).
> _______________________________________________
> Analytics mailing list