+1 - I just crashed my spreadsheet trying to open one .tsv file. But great
news indeed Erik - this is an important first step!
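
For anyone else whose spreadsheet chokes on a full daily .tsv, streaming the file is one way around it. Below is a minimal sketch, not an official tool. It assumes a tab-separated file with an integer count somewhere in each row. The function name `top_rows` and the `count_col` index are hypothetical; check the real column order against the layout documented on wikitech before relying on it.

```python
import csv
import heapq
import io


def top_rows(lines, count_col, n=10, delimiter="\t"):
    """Return the n rows with the largest integer in column count_col.

    `lines` is any iterable of text lines (an open file, a StringIO, ...).
    Rows are read one at a time, so the whole file is never held in memory.
    `count_col` is an assumption supplied by the caller, not a known layout.
    """
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)

    def counted_rows():
        for row in reader:
            try:
                yield int(row[count_col]), row
            except (IndexError, ValueError):
                # Skip short rows and rows whose count field is not an integer
                # (for example a header line).
                continue

    best = heapq.nlargest(n, counted_rows(), key=lambda pair: pair[0])
    return [row for _, row in best]


# Tiny in-memory sample standing in for a real file.
sample = io.StringIO("a.jpg\t5\nb.png\t42\nc.ogg\t17\n")
print(top_rows(sample, count_col=1, n=2))
```

For a real file, pass a text-mode handle instead of the sample, e.g. `open(path, newline="", encoding="utf-8")`, or the standard library's `bz2.open(path, "rt")` or `gzip.open(path, "rt")` if the download is compressed.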

On Tue, Mar 24, 2015 at 8:42 PM, Hay (Husky) <[email protected]> wrote:

> Awesome! I'm especially glad that more statistics than 'just' the
> image views are included, like the aggregated views for thumbnails,
> and the media files as well. I just hope somebody will build a tool in
> the near future like stats.grok.se so we can view statistics for
> individual files and/or sets of files a la BaGLAMa 2.
>
> -- Hay
>
> On Tue, Mar 24, 2015 at 6:39 PM, Erik Zachte <[email protected]> wrote:
> > Today WMF Analytics announces a new product: a daily feed of media file
> > request counts for all Wikimedia projects [1].
> >
> > The counts are based on unsampled data, so any single request within the
> > defined scope [2] will contribute to the counts.
> >
> > It can be seen as complementary to our page view counts files [5].
> >
> > The file layout is documented on wikitech [3].
> >
> > Daily counts have been backfilled from January 1, 2015 onwards.
> >
> >
> >
> > Additionally there is a daily zip file which contains a small subset of
> > these raw counts: the top 1000 most requested media files, one csv file
> > for each column [7]. As these csv files have headers (not so easy to add
> > in Hive) you may want to start with this file for a first impression
> > (best opened in a spreadsheet program).
> >
> >
> >
> > The counts are collected from our Hadoop system, using a Hive query, with
> > data markup done in UDF scripts. This feed hopefully addresses a
> > long-standing request, expressed often and by many, which we regrettably
> > couldn't fulfil earlier, as our pre-Hadoop infrastructure and processing
> > capacity were not up to the task.
> >
> >
> >
> > An initial draft design (RFC) was presented last November at the
> > Amsterdam Hackathon 2014 (GLAM and Wikidata).
> >
> > Online consultation followed, leading to the current design [4].
> >
> >
> >
> > This is a data feed with production status, but not the final release, as
> > there is one major issue that hasn't been addressed yet (but progress is
> > being made):
> >
> > When using Media Viewer [8] to view images, some images are prefetched for
> > a better user experience, but these may never be shown to the user.
> > Currently, those prefetched images are getting counted, as there is no way
> > to detect whether an image was actually shown to the user or not.
> >
> > Gilles Dubuc and other colleagues worked on a solution that would not
> > hamper performance (a tough challenge) and would help us discern viewed
> > from non-viewed files. A few days ago a patch was published! Adaptation of
> > the Hive query will follow later. [6] Also, and related, context tagging
> > isn't supported yet. [9]
> >
> >
> >
> > Huge thanks to all people who contributed to the process so far, and
> > still do.
> >
> > Special thanks to Christian Aistleitner, with whom I co-authored the
> > design, and who also wrote the Hive implementation.
> >
> >
> >
> > Erik Zachte
> >
> >
> >
> > [1] http://dumps.wikimedia.org/other/mediacounts/
> >
> > [2] https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts#Filtering
> >
> > [3] https://wikitech.wikimedia.org/wiki/Analytics/Data/Mediacounts
> >
> > [4] https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts
> >
> > [5] https://wikitech.wikimedia.org/wiki/Analytics/Data/Pagecounts-all-sites
> >
> >       (a new version of this data feed is in the works)
> >
> > [6] https://phabricator.wikimedia.org/T89088
> >
> > [7] Before you ask: no plans yet for further aggregation into monthly or
> > yearly top ranking files. The current csv files are quick wins, using
> > standard Linux tools.
> >
> > [8] https://www.mediawiki.org/wiki/Multimedia/Media_Viewer
> >
> > [9] https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts#by_context
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Analytics mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
>
