Hi all,

Specifically what I am looking for is page view data for these pages, preferably for all months on http://dumps.wikimedia.org/other/pagecounts-raw/ (appeared as named 4 Dec):

Abacarus hystrix
Acarus siro
Aceria tosichella
Acyrthosiphon pisum
Ahasverus advena
Anthrenus flavipes
Aphis craccivora
Arhopalus
Balaustium medicagoense
Bemisia tabaci
Brevicoryne brassicae
Bruchus
Ceratitis capitata
Cicadulina
Cryptolestes
Daktulosphaira vitifoliae
Delia
Ephestia elutella
Ephestia kuehniella
Etiella behrii
Frankliniella occidentalis
Frankliniella
Henosepilachna vigintioctopunctata
Heteronychus arator
Lachesilla quercus
Lasioderma serricorne
Liposcelis bostrychophila
Macrosiphum euphorbiae
Marchalina hellenica
Myzus persicae
Naupactus
Nezara viridula
Oligonychus ununguis
Oryzaephilus surinamensis
Panonychus ulmi
Penthaleus
Pieris rapae
Piezodorus
Plodia interpunctella
Plutella xylostella
Rhopalosiphon rhopalosiphum maidis
Rhopalosiphum padi
Rhyzopertha dominica
Sirex noctilio
Sitophilus granarius
Sitophilus oryzae
Sitotroga cerealella
Sminthurus viridis
Spodoptera exempta
Stegobium paniceum
Tetranychus
Thrips palmi
Thrips
Tribolium castaneum
Tribolium confusum
Trogoderma granarium
Trogoderma
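A minimal sketch of picking titles like these out of the pagecounts-raw dumps by name rather than by hand. It assumes each line of an hourly file has the form "project page_title count bytes", that English Wikipedia rows use the project code "en", and that titles appear with underscores instead of spaces (titles with non-ASCII characters may be percent-encoded and would need extra handling). The function names, the three-title subset and the sample rows are hypothetical, for illustration only.

```python
import gzip
from collections import Counter

# Hypothetical subset of the titles listed above; the dumps are assumed to
# use underscores in place of spaces.
TITLES = {"Myzus_persicae", "Bemisia_tabaci", "Plutella_xylostella"}


def count_views(lines, titles, project="en"):
    """Return (views per requested title, total views for `project`).

    Assumes each line looks like: <project> <page_title> <count> <bytes>
    """
    per_title = Counter()
    total = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != project or not parts[2].isdigit():
            continue  # rows from other projects, or malformed rows
        title, views = parts[1], int(parts[2])
        total += views  # every row of the project counts toward the total
        if title in titles:
            per_title[title] += views
    return per_title, total


def read_dump(path):
    """Yield decoded lines from one gzipped hourly pagecounts-raw file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Made-up rows, not real dump data.
    sample = [
        "en Myzus_persicae 3 45210",
        "en Main_Page 100 9999",
        "de Myzus_persicae 5 1200",
        "en Bemisia_tabaci 2 3300",
    ]
    views, total = count_views(sample, TITLES)
    print(dict(views), total)  # {'Myzus_persicae': 3, 'Bemisia_tabaci': 2} 105
```

Running count_views(read_dump(path), TITLES) over every hourly file in a month and summing the results would give monthly figures; dividing a title's views by the project total is one possible way to standardise for overall traffic growth.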
I then also want a total number of page views to standardise the individual page views. I have looked at stats.grok.se and wikitrends, and I have two issues:

1. The data is only available month by month, and I want as many years of data as possible.
2. Some pages have too few page views to be covered by wikitrends.

Thanks for your help!

-----Original Message-----
From: Analytics [mailto:[email protected]] On Behalf Of [email protected]
Sent: Tuesday, 15 December 2015 4:11 AM
To: [email protected]
Subject: Analytics Digest, Vol 46, Issue 23

Send Analytics mailing list submissions to
    [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
    https://lists.wikimedia.org/mailman/listinfo/analytics
or, via email, send a message with subject or body 'help' to
    [email protected]

You can reach the person managing the list at
    [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Analytics digest..."

Today's Topics:

   1. Re: Readership metrics for the fortnight until December 6, 2015 (Federico Leva (Nemo))
   2. Re: Data collection (Erik Zachte)
   3. Re: Data collection (Federico Leva (Nemo))
   4. Re: Python client for the new pageview API (Dan Andreescu)
   5. Re: mobile and zero legacy tsvs on stat1002 (Oliver Keyes)

----------------------------------------------------------------------

Message: 1
Date: Mon, 14 Dec 2015 13:08:11 +0100
From: "Federico Leva (Nemo)" <[email protected]>
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and "analytics." <[email protected]>
Subject: Re: [Analytics] Readership metrics for the fortnight until December 6, 2015
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8; format=flowed

Interesting country breakdown!

Tilman Bayer, 14/12/2015 12:32:
>
> For the top three, I looked at how pageviews developed on a daily
> basis during the last three months, including the week after this large
> change (until Dec 6):
>
>
> In Greece, the +21.6% rise was the result of an isolated spike from
> November 23-25. This can be traced to a single page on the Greek
> Wiktionary which on most days before and after only saw a single-digit
> number of pageviews, but on these three days received more than 2.8
> million: τάλε κουάλε
> <https://el.wiktionary.org/wiki/%CF%84%CE%AC%CE%BB%CE%B5_%CE%BA%CE%BF%CF%85%CE%AC%CE%BB%CE%B5>.
> It’s about an expression that apparently comes from Latin via Italian
> (“tale quale”) <https://en.wiktionary.org/wiki/tale_e_quale> and means
> something like “exactly the same” or “spitting image”. From the form
> of the spike, it was likely not the result of actual human interest,
> rather an undetected bot trying to learn exactly the same about exactly the
> same.
>
>
> In Ireland, the -20.6% drop marked the end of a plateau whose start
> had actually shown up in the report for the week until November 1
> <https://lists.wikimedia.org/pipermail/mobile-l/2015-November/009919.html>
> already, where the country was the top changer with a 40.2% rise.
>
>
> For South Africa, the -20.6% drop does not form part of a clear pattern.
>

------------------------------

Message: 2
Date: Mon, 14 Dec 2015 14:14:17 +0100
From: "Erik Zachte" <[email protected]>
To: "'A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.'" <[email protected]>
Subject: Re: [Analytics] Data collection
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"

Hi Caitlin,

Here is a breakdown of categories within Phytopathology on English Wikipedia: http://ow.ly/VQNVL
and the articles within those categories ranked by page view for Oct 2015: http://ow.ly/VQNCv

I can run similar reports for earlier months.

Cheers,
Erik

From: Analytics [mailto:[email protected]] On Behalf Of Alex Druk
Sent: Monday, December 14, 2015 10:44
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Data collection

Hi Caitlin,

If you have a list of relevant articles and know what time period you would like to research, contact me off the list and I can probably help you.

Also, my advice: have a look at wikipediatrends.com or stats.grok.se and try some of your queries to get a better understanding of possible results.

Best wishes,

On Mon, Dec 14, 2015 at 12:04 AM, <[email protected]> wrote:

Hi All,

I am a summer research intern with the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia. I am studying a statistics degree, so I don’t really have skills in the type of data collection required to access the Wiki data for my research. I was wondering if someone might be able to give me a hand (by pointing me in the right direction)?

I have a list of pest species for which I wish to find the total number of page views via stats.grok.se or https://dumps.wikimedia.org/other/pagecounts-raw/. There must be a good method to go through and pick out page views by name rather than by hand (which obviously isn’t feasible)? I’d also need to be able to find the total number of page views for each period in order to standardize the response to account for the increase in traffic over the years.

We are in the process of gathering similar data through a Plant Pest database, but due to privacy concerns the organisation is arranging to reconcile the data on our behalf, so I do not have a part in that.

Any help would be really appreciated!

Kind regards,
Caitlin Gardner

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics

--
Thank you.
Alex Druk, PhD
wikipediatrends.com
[email protected]
(775) 237-8550 Google voice

------------------------------

Message: 3
Date: Mon, 14 Dec 2015 15:25:03 +0100
From: "Federico Leva (Nemo)" <[email protected]>
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and "analytics." <[email protected]>
Subject: Re: [Analytics] Data collection
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8; format=flowed

Erik Zachte, 14/12/2015 14:14:
> I can run similar reports for earlier months.

Thanks for publishing that code too!
https://github.com/wikimedia/analytics-wikistats/tree/master/dammit.lt/bash

Nemo

------------------------------

Message: 4
Date: Mon, 14 Dec 2015 09:32:24 -0500
From: Dan Andreescu <[email protected]>
To: Analytics List <[email protected]>, Research into Wikimedia content and communities <[email protected]>
Subject: Re: [Analytics] Python client for the new pageview API
Message-ID: <ca+aepcs4n-z4qd-wzw7v_j5aipb01ncwzrluhtbiwwy_ofc...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I wasn't aware of some conventions that came before me, so I moved the project from milimetric/wmf to mediawiki-utilities/python-mwviews. I promise it'll stay there; sorry for the inconvenience. Updated links:

PyPI: https://pypi.python.org/pypi/mwviews/0.0.2
code: https://github.com/mediawiki-utilities/python-mwviews (PRs still welcome, thanks for the 2 you already helped with!)
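The pageview API that these clients wrap can also be called directly. Below is a minimal sketch using only the Python standard library instead of mwviews (whose interface may well have changed since 0.0.2). It assumes the REST layout .../metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end} plus the matching aggregate endpoint for project-wide totals; the helper names and the User-Agent string are hypothetical placeholders. The API only has data from mid-2015 onward, so earlier years would still need the pagecounts-raw dumps.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def per_article_url(title, start, end, project="en.wikipedia",
                    access="all-access", agent="all-agents", granularity="daily"):
    """URL for one article's views between start and end (YYYYMMDD)."""
    article = quote(title.replace(" ", "_"), safe="")  # encode '/', '?', etc.
    return (f"{API}/per-article/{project}/{access}/{agent}/"
            f"{article}/{granularity}/{start}/{end}")


def aggregate_url(start, end, project="en.wikipedia",
                  access="all-access", agent="all-agents", granularity="daily"):
    """URL for project-wide totals, e.g. to standardise per-article counts."""
    return f"{API}/aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end}"


def parse_views(body):
    """Map each timestamp in a JSON response body to its view count."""
    items = json.loads(body).get("items", [])
    return {item["timestamp"]: item["views"] for item in items}


def fetch(url, user_agent="pest-pageviews-sketch (you@example.org)"):
    """GET a URL; the User-Agent here is a placeholder to replace."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    url = per_article_url("Myzus persicae", "20151001", "20151031")
    print(url)
    # Network call; uncomment to run:
    # print(parse_views(fetch(url)))
```

Dividing each day's per-article views by the aggregate views for the same day is one way to get a standardised series.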

On Fri, Dec 11, 2015 at 10:36 PM, Dan Andreescu <[email protected]> wrote:

> Along the same lines as Oliver's great R client [1], I just started
> work on a python version:
>
> PyPI: https://pypi.python.org/pypi/wmf/0.1
> code: https://github.com/milimetric/wmf (PRs welcome)
>
> And if you're trying to skip past all the setup repository cruft, the
> meat:
> https://github.com/milimetric/wmf/blob/master/wmf/analytics/api/pageviews.py
>
>
> [1] https://github.com/Ironholds/pageviews
>

------------------------------

Message: 5
Date: Mon, 14 Dec 2015 12:10:50 -0500
From: Oliver Keyes <[email protected]>
To: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics." <[email protected]>
Subject: Re: [Analytics] mobile and zero legacy tsvs on stat1002
Message-ID: <CAAUQgdADvcJdt_6+PgELg0P6nxM3C=6uydv3pboxjcpffc3...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Gotcha! Long as it's set for every request, perfect :)

On 14 December 2015 at 04:50, Joseph Allemandou <[email protected]> wrote:
> @Oliver: I think the closest we'll have is the access-method field,
> that can take values desktop, mobile-web, mobile-app.
>
> On Sun, Dec 13, 2015 at 8:37 PM, Oliver Keyes <[email protected]> wrote:
>>
>> Not an answer to the question, but a question of my own; will the
>> nature of the content being served still be present as /some/ field?
>> FWIW I've found it very helpful to be able to use webrequest_source
>> to trivially distinguish mobile and desktop requests.
>>
>> On 11 December 2015 at 12:40, Andrew Otto <[email protected]> wrote:
>> > Hi all,
>> >
>> > Soon, we will be merging the mobile web cache requests with the
>> > text cache requests. text caches will now serve requests for
>> > mobile web[1].
>> >
>> > This means that the webrequest_source=‘mobile’ partition in the
>> > webrequest table in Hive will soon be empty, and all data that was
>> > previously in it will be found in the webrequest_source=‘text’
>> > partition.
>> >
>> > There are only 3 datasets that currently only use the
>> > webrequest_source=‘mobile’ partition:
>> >
>> > - /a/log/webrequest/archive/mobile
>> > - /a/log/webrequest/archive/5xx-mobile
>> > - /a/log/webrequest/archive/zero
>> >
>> > (These are paths on stat1002, but they also exist in HDFS.)
>> >
>> > These datasets originally came from udp2log, but since early last
>> > year they have been generated from Hadoop. With the upcoming cache
>> > merge, these jobs will have to parse through all text requests,
>> > which will make Hadoop busier.
>> >
>> > Do we know if these are being used? Would anyone be upset if we no
>> > longer generated these datasets?
>> >
>> > Thanks!
>> > -Andrew
>> >
>> > [1] https://phabricator.wikimedia.org/T109286
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation
>
> --
> Joseph Allemandou
> Data Engineer @ Wikimedia Foundation
> IRC: joal

--
Oliver Keyes
Count Logula
Wikimedia Foundation

------------------------------

End of Analytics Digest, Vol 46, Issue 23
*****************************************
