+ Analytics, our public analytics-related mailing list [1]

Hi Jeff,

Let me give it a try:

* Re pageviews: a lot has changed since the Kaggle contest days you
refer to. :) I highly recommend you check out
https://dumps.wikimedia.org/other/pagecounts-ez/ where our hourly
pageviews per article live. In case you need it, abbreviations used in
the file names are documented. [2]

* Can you expand more on what you are trying to do? The short answer for
your category-related question is that you have to parse XML dumps,
but we may have some good pointers to save you from that. If
you tell us more, we're more likely to be able to help.

* And, if you decide to continue research on Wiki(m|p)edia data (which
I hope you do :), consider signing up for our public research list at
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
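
For the narrower question in the quoted thread (given an article name, get its categories), a minimal Python sketch against the MediaWiki Action API's `prop=categories` query may be enough for small jobs before resorting to XML dumps. It is only a sketch under assumptions: the helper names `fetch_json` and `article_categories` and the User-Agent string are made up for illustration, a plain GET is used rather than POST, and the loop follows the API's `continue` object until no more results are reported.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """GET the API with the given query parameters and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent (an assumption); replace with your own contact details.
    request = urllib.request.Request(
        url, headers={"User-Agent": "category-lookup-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def article_categories(title, fetch=fetch_json):
    """Return the category titles of one article, following continuation."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",
        "format": "json",
    }
    categories = []
    while True:
        reply = fetch(params)
        # With format=json, query.pages is a dict keyed by page id.
        for page in reply.get("query", {}).get("pages", {}).values():
            for category in page.get("categories", []):
                categories.append(category["title"])
        if "continue" not in reply:
            return categories
        # Merge the continuation values into the next request.
        params = {**params, **reply["continue"]}


if __name__ == "__main__":
    print(article_categories("Cabbeling"))
```

The `fetch` argument exists only so the loop can be exercised without network access. For a dataset with many thousands of articles, one request per title is slow, which is where the dumps become the better option.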


Best,
Leila

[1] https://lists.wikimedia.org/mailman/listinfo/analytics
[2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageviews

--
Leila Zia
Senior Research Scientist, Lead
Wikimedia Foundation


On Wed, May 23, 2018 at 3:22 PM, Wikimedia Answers
<[email protected]> wrote:
> Forwarding for your evaluation :) Feel free to include the wider Research
> team.
>
> best,
> Joe
>
> ---------- Forwarded message ----------
> From: Jeffrey Levesque <[email protected]>
> Date: Tue, May 22, 2018 at 7:48 AM
> Subject: Re: Jeff Levesque: List of Articles By Categories (College Project)
> To: "[email protected]" <[email protected]>
> Cc: "[email protected]" <[email protected]>
>
>
> Hi,
> Is there a known API where I can supply the article name and get the
> corresponding "category" the article belongs to? I'm thinking I could write
> a Python script that iterates over the Kaggle dataset and sends a POST
> request to some existing API to determine each article's "category".
>
> Thank you,
>
> Jeff Levesque
> https://github.com/jeff1evesque
>
> On May 22, 2018, at 10:37 AM, Jeffrey Levesque <[email protected]> wrote:
>
> Hi,
> Do you guys have a more recent time series of Wikipedia article traffic? I'm
> noticing that the Kaggle dataset is missing a lot of articles that are on
> Wikipedia. Do you guys have a good idea of how I can categorize the dataset
> I have?
>
> Thank you,
>
> Jeff Levesque
> https://github.com/jeff1evesque
>
> On May 22, 2018, at 8:40 AM, Jeffrey Levesque <[email protected]> wrote:
>
> Hi,
>
> I am a master's student at Syracuse University. For my data science class, I
> am doing a project analyzing traffic patterns for Wikipedia. I’ve
> obtained the Kaggle dataset for 2015-2016 data:
>
>
>
> https://www.kaggle.com/headsortails/wiki-traffic-forecast-exploration-wtf-eda/data
>
>
>
> However, the dataset only provides the frequency of visits to particular
> pages on a given day. Could I request a list of articles grouped
> by “Categories”? I’ve tried to use the API (i.e.
> https://en.wikipedia.org/wiki/Special:Export). But, that doesn’t seem to
> generate a full output. Additionally, in the list it supplies subcategories.
> So, I tried using the URL API (i.e.
> https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics&format=json).
> But, that also seems to return an even shorter result set:
>
>
>
> {"batchcomplete":"","continue":{"cmcontinue":"page|2d2941313f2b292d3d0447454f31434f39293f011701dc16|55503653","continue":"-||"},"query":{"categorymembers":[
> {"pageid":22939,"ns":0,"title":"Physics"},
> {"pageid":24489,"ns":0,"title":"Outline of physics"},
> {"pageid":3445246,"ns":0,"title":"Glossary of classical physics"},
> {"pageid":1653925,"ns":100,"title":"Portal:Physics"},
> {"pageid":50926902,"ns":0,"title":"Action angle coordinates"},
> {"pageid":9079863,"ns":0,"title":"Aerometer"},
> {"pageid":52657328,"ns":0,"title":"Bayesian model of computational anatomy"},
> {"pageid":49342572,"ns":0,"title":"Group actions in computational anatomy"},
> {"pageid":50724262,"ns":0,"title":"Blasius\u2013Chaplygin formula"},
> {"pageid":33327002,"ns":0,"title":"Cabbeling"}
> ]}}
>
>
>
>
>
> Thank you,
>
> Jeff Levesque
>
> (603) 969-5363
>
>
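
A note on the short `categorymembers` reply quoted above: it contains 10 entries plus a `continue` object, which matches the API's default `cmlimit` of 10 rather than the full category. A hedged Python sketch of how the remaining members might be collected is below. The names `fetch_json` and `category_members` and the User-Agent string are illustrative assumptions. The function requests `cmlimit=max`, feeds each `continue` object back into the next request, and separates articles (namespace 0) from subcategories (namespace 14).

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """GET the API with the given query parameters and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent (an assumption); replace with your own contact details.
    request = urllib.request.Request(
        url, headers={"User-Agent": "category-members-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def category_members(category, fetch=fetch_json):
    """Return (article titles, subcategory titles) for one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "max",
        "format": "json",
    }
    pages, subcategories = [], []
    while True:
        reply = fetch(params)
        for member in reply.get("query", {}).get("categorymembers", []):
            if member["ns"] == 0:
                pages.append(member["title"])
            elif member["ns"] == 14:
                subcategories.append(member["title"])
            # Other namespaces (e.g. portals, ns 100) are skipped in this sketch.
        if "continue" not in reply:
            return pages, subcategories
        # Carry cmcontinue (and continue) over into the next request.
        params = {**params, **reply["continue"]}


if __name__ == "__main__":
    articles, subcats = category_members("Category:Physics")
    print(len(articles), "articles;", len(subcats), "subcategories")
```

This lists direct members only. Walking the returned subcategories recursively would be needed to cover a whole topic, and for large trees the dumps are likely the more practical route.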

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
