If you want it going back that far, I'm afraid the stats.grok.se style data is all there is :(. The new API only covers the last few months thus far.
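For picking page views out of the raw dumps by name rather than by hand, a rough sketch follows. It assumes each line of an hourly pagecounts-raw file has the form "project title views bytes" separated by single spaces, that titles use underscores for spaces and may be percent-encoded, and that the project code "en" stands for English Wikipedia. The function names count_views and count_views_in_file are made up for illustration and are not part of any existing tool.

```python
import gzip
from collections import Counter
from urllib.parse import unquote


def count_views(lines, titles, project="en"):
    """Return (views per wanted title, total views for the project).

    Assumes lines look like "en Acarus_siro 3 12345" (hypothetical sample).
    """
    # Normalise the wanted titles to the underscore form used in the dumps.
    wanted = {t.replace(" ", "_") for t in titles}
    per_page = Counter()
    total = 0
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        # Skip malformed lines and lines belonging to other projects.
        if len(parts) != 4 or parts[0] != project:
            continue
        try:
            views = int(parts[2])
        except ValueError:
            continue
        # The project-wide total is what the per-page counts get divided by.
        total += views
        title = unquote(parts[1]).replace(" ", "_")
        if title in wanted:
            per_page[title] += views
    return per_page, total


def count_views_in_file(path, titles, project="en"):
    """Run count_views over one gzipped hourly file on local disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return count_views(fh, titles, project)
```

Running count_views_in_file over every hourly file in a month and adding up the results would give monthly counts per page plus a monthly total to standardise against. Title matching here is case-sensitive, so an entry such as "rhopalosiphum maidis" would need its first letter capitalised to match.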
On 14 December 2015 at 17:31, <[email protected]> wrote:
> Hi all,
>
> Specifically what I am looking for is page view data for these pages,
> preferably for all months on http://dumps.wikimedia.org/other/pagecounts-raw/
> (appeared as named 4 Dec):
> Abacarus hystrix
> Acarus siro
> Aceria tosichella
> Acyrthosiphon pisum
> Ahasverus advena
> Anthrenus flavipes
> Aphis craccivora
> Arhopalus
> Balaustium medicagoense
> Bemisia tabaci
> Brevicoryne brassicae
> Bruchus
> Ceratitis capitata
> Cicadulina
> Cryptolestes
> Daktulosphaira vitifoliae
> Delia
> Ephestia elutella
> Ephestia kuehniella
> Etiella behrii
> Frankliniella occidentalis
> Frankliniella
> Henosepilachna vigintioctopunctata
> Heteronychus arator
> Lachesilla quercus
> Lasioderma serricorne
> Liposcelis bostrychophila
> Macrosiphum euphorbiae
> Marchalina hellenica
> Myzus persicae
> Naupactus
> Nezara viridula
> Oligonychus ununguis
> Oryzaephilus surinamensis
> Panonychus ulmi
> Penthaleus
> Pieris rapae
> Piezodorus
> Plodia interpunctella
> Plutella xylostella
> Rhopalosiphon
> Rhopalosiphum maidis
> Rhopalosiphum padi
> Rhyzopertha dominica
> Sirex noctilio
> Sitophilus granarius
> Sitophilus oryzae
> Sitotroga cerealella
> Sminthurus viridis
> Spodoptera exempta
> Stegobium paniceum
> Tetranychus
> Thrips palmi
> Thrips
> Tribolium castaneum
> Tribolium confusum
> Trogoderma granarium
> Trogoderma
>
> I then also want a total number of page views to standardise the individual
> page views.
>
> I have looked at stats.grok.se and wikitrends and I have two issues:
> 1. The data is only month by month and I want as many years of data as
> possible.
> 2. Some pages have too few page views for wikitrends.
>
> Thanks for your help!
>
> -----Original Message-----
> From: Analytics [mailto:[email protected]] On Behalf Of
> [email protected]
> Sent: Tuesday, 15 December 2015 4:11 AM
> To: [email protected]
> Subject: Analytics Digest, Vol 46, Issue 23
>
> Send Analytics mailing list submissions to
> [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/analytics
> or, via email, send a message with subject or body 'help' to
> [email protected]
>
> You can reach the person managing the list at
> [email protected]
>
> When replying, please edit your Subject line so it is more specific than "Re:
> Contents of Analytics digest..."
>
> Today's Topics:
>
> 1. Re: Readership metrics for the fortnight until December 6,
> 2015 (Federico Leva (Nemo))
> 2. Re: Data collection (Erik Zachte)
> 3. Re: Data collection (Federico Leva (Nemo))
> 4. Re: Python client for the new pageview API (Dan Andreescu)
> 5. Re: mobile and zero legacy tsvs on stat1002 (Oliver Keyes)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 14 Dec 2015 13:08:11 +0100
> From: "Federico Leva (Nemo)" <[email protected]>
> To: A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and "analytics."
> <[email protected]>
> Subject: Re: [Analytics] Readership metrics for the fortnight until
> December 6, 2015
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Interesting country breakdown!
>
> Tilman Bayer, 14/12/2015 12:32:
>>
>> For the top three, I looked at how pageviews developed on a daily
>> basis during the last three months including the week after this large
>> change (until Dec 6):
>>
>> In Greece, the +21.6% rise was the result of an isolated spike from
>> November 23-25. This can be traced to a single page on the Greek
>> Wiktionary which on most days before and after only saw a single-digit
>> number of pageviews, but on these three days received more than 2.8
>> million: τάλε κουάλε
>> <https://el.wiktionary.org/wiki/%CF%84%CE%AC%CE%BB%CE%B5_%CE%BA%CE%BF%CF%85%CE%AC%CE%BB%CE%B5>.
>> It’s about an expression that apparently comes from Latin via Italian
>> (“tale quale”) <https://en.wiktionary.org/wiki/tale_e_quale> and means
>> something like “exactly the same” or “spitting image”. From the form
>> of the spike, it was likely not the result of actual human interest,
>> rather an undetected bot trying to learn exactly the same about exactly the
>> same.
>>
>> In Ireland, the -20.6% drop marked the end of a plateau whose start
>> had actually shown up in the report for the week until November 1
>> <https://lists.wikimedia.org/pipermail/mobile-l/2015-November/009919.html>
>> already, where the country was the top changer with a 40.2% rise.
>>
>> For South Africa, the -20.6% drop does not form part of a clear pattern.
>>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 14 Dec 2015 14:14:17 +0100
> From: "Erik Zachte" <[email protected]>
> To: "'A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics.'"
> <[email protected]>
> Subject: Re: [Analytics] Data collection
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Caitlin,
>
> Here is a breakdown of categories within Phytopathology on English Wikipedia:
> http://ow.ly/VQNVL
>
> and the articles within those categories ranked by page view for Oct 2015:
> http://ow.ly/VQNCv
>
> I can run similar reports for earlier months.
>
> Cheers,
>
> Erik
>
> From: Analytics [mailto:[email protected]] On Behalf Of
> Alex Druk
> Sent: Monday, December 14, 2015 10:44
> To: A mailing list for the Analytics Team at WMF and everybody who has an
> interest in Wikipedia and analytics.
> Subject: Re: [Analytics] Data collection
>
> Hi Caitlin,
>
> If you have a list of relevant articles and an understanding of what time
> period you would like to research, contact me off the list and I can probably
> help you.
>
> Also my advice: have a look at wikipediatrends.com or stats.grok.se and try
> some of your queries to get a better understanding of possible results.
>
> Best wishes,
>
> On Mon, Dec 14, 2015 at 12:04 AM, <[email protected]> wrote:
>
> Hi All,
>
> I am a summer research intern with the Commonwealth Scientific and Industrial
> Research Organisation (CSIRO) in Australia. I am studying a statistics degree
> and so I don’t really have skills in the type of data collection required to
> access the Wiki data for my research. I was wondering if someone might be
> able to give me a hand (by pointing me in the right direction)?
>
> I have a list of pest species for which I wish to find the total number of
> page views via stats.grok.se or
> https://dumps.wikimedia.org/other/pagecounts-raw/ . There must be a good
> method to go through and pick out page views by name rather than by hand
> (which obviously isn’t feasible)? I’d also need to be able to find the total
> number of page views for each period in order to standardize the response to
> account for the increase in traffic over the years.
>
> We are in the process of gathering similar data through a Plant Pest database
> but due to privacy concerns, the organisation is arranging to reconcile the
> data on our behalf and so I do not have a part in that.
>
> Any help would be really appreciated!
>
> Kind regards,
>
> Caitlin Gardner
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
> --
>
> Thank you.
>
> Alex Druk, PhD
>
> wikipediatrends.com
> [email protected]
> (775) 237-8550 Google voice
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> <https://lists.wikimedia.org/pipermail/analytics/attachments/20151214/9ec9b28b/attachment-0001.html>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 14 Dec 2015 15:25:03 +0100
> From: "Federico Leva (Nemo)" <[email protected]>
> To: A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and "analytics."
> <[email protected]>
> Subject: Re: [Analytics] Data collection
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Erik Zachte, 14/12/2015 14:14:
>> I can run similar reports for earlier months.
>
> Thanks for publishing that code too!
> https://github.com/wikimedia/analytics-wikistats/tree/master/dammit.lt/bash
>
> Nemo
>
> ------------------------------
>
> Message: 4
> Date: Mon, 14 Dec 2015 09:32:24 -0500
> From: Dan Andreescu <[email protected]>
> To: Analytics List <[email protected]>, Research into
> Wikimedia content and communities
> <[email protected]>
> Subject: Re: [Analytics] Python client for the new pageview API
> Message-ID:
> <ca+aepcs4n-z4qd-wzw7v_j5aipb01ncwzrluhtbiwwy_ofc...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I wasn't aware of some conventions that came before me, so I moved the
> project from milimetric/wmf to mediawiki-utilities/python-mwviews. I promise
> it'll stay there, sorry for the inconvenience. Updated links:
>
> PyPI: https://pypi.python.org/pypi/mwviews/0.0.2
> code: https://github.com/mediawiki-utilities/python-mwviews (PRs still
> welcome, thanks for the 2 you already helped with!)
>
> On Fri, Dec 11, 2015 at 10:36 PM, Dan Andreescu <[email protected]>
> wrote:
>
>> Along the same lines as Oliver's great R client [1], I just started
>> work on a python version:
>>
>> PyPI: https://pypi.python.org/pypi/wmf/0.1
>> code: https://github.com/milimetric/wmf (PRs welcome)
>>
>> And if you're trying to skip past all the setup repository cruft, the
>> meat:
>> https://github.com/milimetric/wmf/blob/master/wmf/analytics/api/pageviews.py
>>
>> [1] https://github.com/Ironholds/pageviews
>>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> <https://lists.wikimedia.org/pipermail/analytics/attachments/20151214/c0a9adf5/attachment-0001.html>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 14 Dec 2015 12:10:50 -0500
> From: Oliver Keyes <[email protected]>
> To: "A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics."
> <[email protected]>
> Subject: Re: [Analytics] mobile and zero legacy tsvs on stat1002
> Message-ID:
> <CAAUQgdADvcJdt_6+PgELg0P6nxM3C=6uydv3pboxjcpffc3...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Gotcha! Long as it's set for every request, perfect :)
>
> On 14 December 2015 at 04:50, Joseph Allemandou <[email protected]>
> wrote:
>> @Oliver: I think the closest we'll have is the access-method field,
>> that can take values desktop, mobile-web, mobile-app.
>>
>> On Sun, Dec 13, 2015 at 8:37 PM, Oliver Keyes <[email protected]> wrote:
>>>
>>> Not an answer to the question, but a question of my own; will the
>>> nature of the content being served still be present as /some/ field?
>>> FWIW I've found it very helpful to be able to use webrequest_source
>>> to trivially distinguish mobile and desktop requests.
>>>
>>> On 11 December 2015 at 12:40, Andrew Otto <[email protected]> wrote:
>>> > Hi all,
>>> >
>>> > Soon, we will be merging the mobile web cache requests with the
>>> > text cache requests. text caches will now serve requests for
>>> > mobile web[1].
>>> >
>>> > This means that the webrequest_source=‘mobile’ partition in the
>>> > webrequest table in Hive will soon be empty, and all data that was
>>> > previously in it will be found in the webrequest_source=‘text’
>>> > partition.
>>> >
>>> > There are only 3 datasets that currently only use the
>>> > webrequest_source=‘mobile’ partition:
>>> >
>>> > - /a/log/webrequest/archive/mobile
>>> > - /a/log/webrequest/archive/5xx-mobile
>>> > - /a/log/webrequest/archive/zero
>>> >
>>> > (These are paths on stat1002, but they also exist in HDFS.)
>>> >
>>> > These datasets originally came from udp2log, but since early last
>>> > year they have been generated from Hadoop. With the upcoming cache
>>> > merge, these jobs will have to parse through all text requests,
>>> > which will make Hadoop busier.
>>> >
>>> > Do we know if these are being used? Would anyone be upset if we no
>>> > longer generated these datasets?
>>> >
>>> > Thanks!
>>> > -Andrew
>>> >
>>> > [1] https://phabricator.wikimedia.org/T109286
>>> >
>>> > _______________________________________________
>>> > Analytics mailing list
>>> > [email protected]
>>> > https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >
>>>
>>> --
>>> Oliver Keyes
>>> Count Logula
>>> Wikimedia Foundation
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>> --
>> Joseph Allemandou
>> Data Engineer @ Wikimedia Foundation
>> IRC: joal
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
> ------------------------------
>
> End of Analytics Digest, Vol 46, Issue 23
> *****************************************
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics

--
Oliver Keyes
Count Logula
Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
