+1 - I just crashed my spreadsheet trying to open one .tsv file. But great news indeed Erik - this is an important first step!
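For files too large for a spreadsheet, streaming the rows through a small script is one alternative. Below is a minimal sketch, assuming a tab-separated file with a file name in one column and an integer count in another. The helper name `top_rows` and the column positions `name_col` and `count_col` are hypothetical, not the documented layout; the wikitech page on the file layout has the real columns.

```python
import csv
import heapq
import io


def top_rows(lines, name_col=0, count_col=1, n=10):
    """Return the n (count, name) pairs with the largest counts.

    `lines` is any iterable of text lines, e.g. an open file.
    Column positions are assumptions; adjust them to the real layout.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)

    def pairs():
        for row in reader:
            # Skip header lines and rows that are too short or non-numeric.
            if len(row) > max(name_col, count_col) and row[count_col].isdigit():
                yield int(row[count_col]), row[name_col]

    # nlargest keeps only n items in memory, so the file is never fully loaded.
    return heapq.nlargest(n, pairs())


# Tiny in-memory stand-in for a real .tsv file (made-up data).
sample = io.StringIO("a.jpg\t5\nb.png\t12\nc.ogg\t7\n")
for count, name in top_rows(sample, n=2):
    print(count, name)
```

With a real file, `top_rows(open(path, encoding="utf-8"))` would be used the same way, where `path` is wherever the decompressed file was saved.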
On Tue, Mar 24, 2015 at 8:42 PM, Hay (Husky) <[email protected]> wrote:
> Awesome! I'm especially glad that more statistics than 'just' the
> image views are included, like the aggregated views for thumbnails,
> and the media files as well. I just hope somebody will build a tool in
> the near future like stats.grok.se so we can view statistics for
> individual files and/or sets of files a la Bagalama2.
>
> -- Hay
>
> On Tue, Mar 24, 2015 at 6:39 PM, Erik Zachte <[email protected]> wrote:
> > Today WMF Analytics announces a new product: a daily feed of media file
> > request counts for all Wikimedia projects [1].
> >
> > The counts are based on unsampled data, so any single request within the
> > defined scope [2] will contribute to the counts.
> >
> > It can be seen as complementary to our page view counts files [5].
> >
> > The file layout is documented on wikitech [3].
> >
> > Daily counts have been backfilled from January 1, 2015 onwards.
> >
> > Additionally there is a daily zip file which contains a small subset of
> > these raw counts: top 1000 most requested media files, one csv file for
> > each column [7]. As these csv files have headers (not so easy to add in
> > Hive) you may want to start with this file for a first impression (best
> > open in spreadsheet program).
> >
> > The counts are collected from our Hadoop system, using a Hive query, with
> > data markup done in UDF scripts. This feed hopefully addresses a long
> > standing request, expressed often and by many, which we regrettably
> > couldn't fulfil earlier, as our pre-Hadoop infrastructure and processing
> > capacity were not up to the task.
> >
> > An initial draft design (RFC) was presented last November at the
> > Amsterdam Hackathon 2014 (GLAM and Wikidata).
> >
> > Online consultation followed, leading to the current design [4].
> >
> > This is a data feed with production status, but not the final release, as
> > there is one major issue that hasn't been addressed yet (but progress is
> > being made):
> >
> > When using Media viewer to view images, some images are prefetched for
> > better user experience, but these may never be shown to the user.
> > Currently, those prefetched images are getting counted, as there is no
> > way to detect whether an image was actually shown to the user or not.
> >
> > Gilles Dubuc and other colleagues worked on a solution that would not
> > hamper performance (a tough challenge) and would help us discern viewed
> > from non-viewed files. A few days ago a patch was published! Adaptation
> > of the Hive query will follow later. [6] Also, and related, context
> > tagging isn't supported yet. [9]
> >
> > Huge thanks to all people who contributed to the process so far, and
> > still do.
> >
> > Special thanks to Christian Aistleitner with whom I co-authored the
> > design, and who also wrote the Hive implementation.
> >
> > Erik Zachte
> >
> > [1] http://dumps.wikimedia.org/other/mediacounts/
> >
> > [2] https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts#Filtering
> >
> > [3] https://wikitech.wikimedia.org/wiki/Analytics/Data/Mediacounts
> >
> > [4] https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts
> >
> > [5] https://wikitech.wikimedia.org/wiki/Analytics/Data/Pagecounts-all-sites
> > (a new version of this data feed is in the works)
> >
> > [6] https://phabricator.wikimedia.org/T89088
> >
> > [7] Before you ask: no plans yet for further aggregation into monthly or
> > yearly top ranking files. The current csv files are quick wins, using
> > standard Linux tools.
> >
> > [8] https://www.mediawiki.org/wiki/Multimedia/Media_Viewer
> >
> > [9] https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts#by_context
> >
> > _______________________________________________
> > Analytics mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/analytics
