Hi Burton --

Thanks for this. I'm glad the Wikipedia data is useful, even if it's
difficult to access at this time.

As Nemo reported, we're currently working with Henrik to get him a better
server, and it should be on its way to him now. We're hopeful that modern
hardware and SSDs will really help scale the service.

We're also planning on working with Henrik to see if there are any
optimizations in the app/database that will help. (One of our DBAs has
signed up to help here.)

It's also exciting to see other projects come up that address this issue --
we have some major tasks ahead of us in updating the page view definitions
and making them available in a scalable way. While we haven't decided what
format we want to use, integrating with existing page view APIs is
something we want to be able to support.

You can take a look at the projects we are working on here
<https://www.mediawiki.org/wiki/Analytics/Prioritization_Planning>;
we will be doing some prioritization next week for the new quarter, and I'll
update this list with the results.

-Toby


On Mon, Mar 24, 2014 at 3:40 PM, Burton DeWilde <
[email protected]> wrote:

> Dear Toby,
>
> I recently saw your comment on a blog post
> <http://magnusmanske.de/wordpress/?p=173> by Magnus Manske regarding the
> lack of Wikipedia page view data besides the oft-overloaded
> http://stats.grok.se/. I was wondering if there's been any progress at WMF
> on building a more stable, central, and complete source for this data?
>
> I ask because I'm a data scientist at a small research non-profit called
> Harmony Institute <http://harmony-institute.org/>, where we study the social
> impact of media (primarily television and film). I'm currently building an
> interactive web app <http://harmony-institute.org/work/impactspace/> that
> visualizes social impact on a variety of issues by many documentary films.
> One indicator of interest is "information-seeking behavior," i.e. are
> audiences seeking out information about a film or issue. Besides Google
> search trends, an excellent proxy for this is Wikipedia page views for both
> film pages, e.g. Escape Fire
> <http://en.wikipedia.org/wiki/Escape_Fire:_The_Fight_to_Rescue_American_Healthcare>,
> and issue-related pages, e.g. Health care reform
> <http://en.wikipedia.org/wiki/Health_care_reform>.
>
> I'm currently trying to use stats.grok.se to grab raw data in JSON form;
> unfortunately, the site almost always responds with "Server overloaded,
> please throttle your requests," and no amount of throttling seems to
> suffice. I'm aware that there are many TBs of raw data for the downloading,
> but I don't have the resources to handle that much data, nor do I need more
> than the tiniest fraction of it.
>
> I would *love* to show Wikipedia page view statistics for film pages in
> our app. If you have any updates on progress or suggestions on how I might
> do this, I would be very appreciative.
>
> Thanks very much for your and all of WMF's hard work -- I'm a proud donor
> to the cause. :)
>
> Best,
> Burton DeWilde
>
> --
> Burton DeWilde
>
> Data Scientist
> Harmony Institute
> harmony-institute.org
> blog <http://harmony-institute.org/therippleeffect/> |
> twitter <https://twitter.com/hinstitute> |
> facebook <https://www.facebook.com/harmonyinstitute>
>
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
