Hi, I have a batch tool which collects article titles for any category and its 
subcategories (up to an arbitrary depth), then collects the page views for 
those articles for any given month and prints a sorted list. For optimal 
results the parsed category subtree often needs manual pruning (so weird 
subcategories can be blacklisted) or category depth should be kept modest.

 

Here's an example with top category 'WikiProject_Islands': http://ow.ly/QgahV

Tool is at http://ow.ly/QgbLO

But again it's a batch tool. One would have to download a file with monthly 
pageview totals from https://dumps.wikimedia.org/other/pagecounts-ez/merged/, 
or ask me to run an occasional ad hoc query.
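
As a rough illustration of working with such a file, the sketch below pulls monthly totals for a set of titles. It rests on an assumption that should be checked against the comments at the top of the downloaded file: that each data line is whitespace-separated, with the project code first, the page title second, and the monthly total third. The function name and the sample lines are hypothetical.

```python
def monthly_totals(lines, project, wanted):
    """Sum monthly totals for the wanted titles of one project.

    ASSUMED line layout (verify against the file's own header comments):
        <project code> <page title> <monthly total> [further fields]
    Lines starting with '#' are treated as comments and skipped.
    """
    totals = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3 or fields[0] != project:
            continue
        title = fields[1]
        if title not in wanted:
            continue
        try:
            count = int(fields[2])
        except ValueError:
            continue  # skip lines whose third field is not a number
        totals[title] = totals.get(title, 0) + count
    return totals

# Made-up sample lines in the assumed layout.
SAMPLE_LINES = [
    "# comment line\n",
    "en.z Space_Needle 4500 extra\n",
    "en.z Bitter_Lake,_Seattle 120\n",
    "de.z Space_Needle 300\n",
    "en.z Unrelated_Page 99\n",
]

if __name__ == "__main__":
    print(monthly_totals(SAMPLE_LINES, "en.z", {"Space_Needle", "Bitter_Lake,_Seattle"}))
```

If the downloaded file is bz2-compressed, the standard library's `bz2.open(path, "rt", encoding="utf-8", errors="replace")` returns a file object that can be passed as `lines`, so the file is streamed rather than loaded into memory.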

 

Erik

 

From: [email protected] 
[mailto:[email protected]] On Behalf Of Raymond Leonard
Sent: Wednesday, July 29, 2015 21:00
To: [email protected]; A mailing list for the Analytics Team at WMF and everybody who 
has an interest in Wikipedia and analytics.
Cc: Felipe Hoffa
Subject: Re: [Analytics] New to list; please direct me to the tool(s) that I 
can use to determine per-page views per category/WikiProject

 

Michael,

Thanks! Your solution solves a different problem than I was looking for (list 
of page views of each article over time per a WikiProject), but having a 
comprehensive list of the pages under WikiProject_Seattle will undoubtedly be 
useful as well. Heck, I am tempted to install ActivePerl on my PC (used to have 
it on an earlier PC & also had access to a couple of Macs) to write a Perl 
script to convert the results into a .csv file to load it into Excel or the 
Libre Office equivalent.
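
The conversion itself would be short in most scripting languages. Here is a sketch in Python rather than Perl; the record layout (a list of dictionaries with "title" and "views" keys) and the numbers are hypothetical, since the service's actual response format is not described in this thread.

```python
import csv
import io

def records_to_csv(records, fieldnames):
    """Render a list of dictionaries as CSV text with a header row.

    Keys not listed in fieldnames are ignored, missing keys become empty
    cells, and titles containing commas are quoted by the csv module.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Hypothetical records; the view counts are invented for the example.
SAMPLE_RECORDS = [
    {"title": "Bitter_Lake,_Seattle", "views": 120},
    {"title": "Space_Needle", "views": 4500, "note": "ignored"},
]

if __name__ == "__main__":
    print(records_to_csv(SAMPLE_RECORDS, ["title", "views"]))
```

Using the csv module instead of joining strings by hand matters here because article titles such as Bitter_Lake,_Seattle contain commas.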

Kudos to you & the UW iSchool.


Yours,
Peaceray <https://en.wikipedia.org/wiki/User:Peaceray> 
Cascadia Wikimedians User Group <http://cascadia.wiki> 
[email protected] (redirects to)
[email protected]

 

On Wed, Jul 29, 2015 at 11:11 AM, Michael Gilbert <[email protected]> wrote:

Peaceray,

Though it should be considered research-ready at best (i.e., potentially 
unstable, and not ready for production-level, always-on tools), I've created a 
service that syncs and provides details on current WikiProjects.  For example:

https://alahele.ischool.uw.edu:8997/api/getProjects

This returns all current WikiProjects, defined as all pages in the Wikipedia 
namespace (4) starting with WikiProject_*, as well as pages in the 
Active_WikiProjects category in the Wikipedia namespace, so that projects like 
WikiProject Seattle as well as Department of Fun are recorded.

To get the pages under the scope of those projects, try:

https://alahele.ischool.uw.edu:8997/api/getProjectPages?project=WikiProject_Seattle

This currently returns 6,961 pages, including all pages under the project 
category as well as all sub-category pages to a depth of 2. Something like 
this could be paired with stats.grok.se or the upcoming pageview API to get 
page view data on each of the Seattle-related articles (e.g., 
http://stats.grok.se/en/201507/Bitter_Lake,_Seattle). Or, if you're looking 
for activity information, you can get it from a separate request, below (which 
would return edits to the Space Needle page in the Article and Article Talk 
namespaces, grouped by page, editor, and week, after March 5th, 2014):

https://alahele.ischool.uw.edu:8997/api/getEdits?page=Space_Needle&namespace=0|1&group=page|user|date&sd=20140305
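
For scripting against these endpoints, the request URLs can be assembled rather than typed by hand. The sketch below only builds the URLs quoted in this message (plus the stats.grok.se pattern) and makes no network calls; the helper names are hypothetical, and nothing is assumed about the format of the responses.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the examples in this message.
API_BASE = "https://alahele.ischool.uw.edu:8997/api"

def api_url(endpoint, **params):
    """Build a request URL, leaving '|' unescaped as in the examples above."""
    query = urlencode(params, safe="|")
    return f"{API_BASE}/{endpoint}?{query}" if query else f"{API_BASE}/{endpoint}"

def grok_url(title, yyyymm, lang="en"):
    """Build a stats.grok.se page URL for one article and one month."""
    return f"http://stats.grok.se/{lang}/{yyyymm}/{quote(title, safe=',_')}"

if __name__ == "__main__":
    print(api_url("getProjects"))
    print(api_url("getProjectPages", project="WikiProject_Seattle"))
    print(api_url("getEdits", page="Space_Needle", namespace="0|1",
                  group="page|user|date", sd="20140305"))
    print(grok_url("Bitter_Lake,_Seattle", "201507"))
```

Pairing the two is then a matter of looping over the titles returned for a project and calling `grok_url` for each one.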

(Rough) documentation and all the bits for the above are at: 
https://github.com/mdgilbert/wiki-tools (for instance, the syncProjects.py 
script is what collects the project and project-pages data). 
(Also rough) documentation and the code for the node.js server which provides 
the data is at: https://github.com/mdgilbert/node-reflex

Any comments, suggestions, requests, etc. are always welcome. Cheers,

Michael Gilbert
Human Centered Design & Engineering, University of Washington

 

On 07/29/2015 09:59 AM, Raymond Leonard wrote:

Hi Dan,

I did discover the TreeViews tool a couple of days ago on tools.wmflabs.org:

http://tools.wmflabs.org/glamtools/treeviews/?q={%22rows%22%3A[{%22title%22%3A%22WikiProject%20Seattle%20articles%22}]}
 

However, for Category:WikiProject Seattle articles, it only brings back the 
articles 10 Things I Hate About You 
<http://en.wikipedia.org/wiki/10_Things_I_Hate_About_You>  through Ballard 
Carnegie Library <http://en.wikipedia.org/wiki/Ballard_Carnegie_Library> , 
which is a little over 1,100 articles, whereas there are 6,882 in the category 
alone (as of the time of this email), let alone subcategories. It may be that 
there is a limit on the number of articles for which the tool can pull monthly 
page views, but it does not state that.

Do you know who developed TreeViews & how I can contact her/him/them?


Yours,

Peaceray <https://en.wikipedia.org/wiki/User:Peaceray> 
Cascadia Wikimedians User Group <http://cascadia.wiki> 
[email protected] (redirects to)
[email protected]

 

On Wed, Jul 29, 2015 at 8:28 AM, Dan Andreescu <[email protected]> wrote:

Hi Raymond.  Currently we don't have any WMF-hosted tools that will let you get 
this information easily.  We have committed to deliver a Pageview API by the 
end of this quarter [1].  The first version will not have per-category totals, 
but it will have per-article totals.  Until then, there are community-built 
tools such as: 

 

http://stats.grok.se (not updated for a while)

https://www.vitribyte.com/ (great dashboarding features but the future of the 
project is not determined yet)

 

Google BigQuery has also ingested our hourly pageview dumps; I've cc-ed Felipe 
Hoffa so he can provide details on that.

 

The main problem with the solutions above is that they're based on an outdated 
pageview definition that's been having more and more problems lately.  The 
Pageview API we are shipping at the end of this quarter will be based on higher 
quality data that makes an effort to detect spiders and normalize page titles 
across different access methods (API requests from mobile apps, different 
accents, etc).  Preliminary tests show that this data does not have the 
anomalies we've seen in the old data.

 

[1] if you're interested in following along or helping with this project, you 
can find it by searching for {slug} in our backlog 
<https://phabricator.wikimedia.org/tag/analytics-backlog/>  and kanban 
<https://phabricator.wikimedia.org/tag/analytics-kanban/>  task boards.

 

On Sun, Jul 26, 2015 at 3:21 PM, Raymond Leonard 
<[email protected]> wrote:

Hello,

I am new to this list. I am looking to rejuvenate a semi-active WikiProject & 
am looking for a tool or tools that will list the frequency of individual 
per-page views for a given category/WikiProject. The time period could be 
preset to a period of time or specifiable --- my guess is that this may depend 
upon the particular tool(s).

We wish to use this as one of the inputs to determining the importance of an 
article to the WikiProject.

 

Please feel free to email me directly if you wish to avoid adding traffic to 
the mailing list.

Yours,

Peaceray <https://en.wikipedia.org/wiki/User:Peaceray> 

Cascadia Wikimedians User Group <http://cascadia.wiki> 

[email protected] (redirects to)

[email protected]

 

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics

 

