See also https://phabricator.wikimedia.org/T117945 and
https://phabricator.wikimedia.org/T108867 for possibly related
oddities in the top viewed pages. (And
https://phabricator.wikimedia.org/T104755 : "Wikimedia's URL-routing
logic straddles five layers ...")
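
Until the cause of the "-" entry is understood, one option is to drop
such rows after the fact. A minimal sketch in Python, assuming the result
rows have the "URL count" shape of the output quoted below. The function
names and the choice of what to exclude ("-" and the Special: namespace)
are assumptions for illustration, not an established filter:

```python
# Hypothetical post-processing of "URL count" rows from the Hive output.
# What counts as a non-article is an assumption made for this sketch.
NON_ARTICLE_TITLES = {"-"}
NON_ARTICLE_PREFIXES = ("Special:",)


def parse_row(line):
    """Split a 'URL count' row; the count is the last whitespace-separated field."""
    url, views = line.rsplit(None, 1)
    return url, int(views)


def is_article(url):
    """Return False for placeholder titles and special pages."""
    # Everything after the first "/wiki/" is treated as the page title.
    title = url.partition("/wiki/")[2]
    return title not in NON_ARTICLE_TITLES and not title.startswith(NON_ARTICLE_PREFIXES)


def filter_rows(lines):
    """Parse non-empty rows and keep only those that look like articles."""
    rows = (parse_row(line) for line in lines if line.strip())
    return [(url, views) for url, views in rows if is_article(url)]


sample = [
    "https://en.wikipedia.org/wiki/Main_Page 43515253",
    "https://en.wikipedia.org/wiki/Special:Search 4818687",
    "https://en.wikipedia.org/wiki/- 2650346",
    "https://en.wikipedia.org/wiki/Bajirao_I 1414810",
]

for url, views in filter_rows(sample):
    print(url, views)
```

This keeps Main_Page in the list. Main pages and localized special-page
prefixes on other projects (e.g. hi.wikipedia.org) are not caught by
these two rules and would need their own entries.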

(switching CC to the intended Dan)

On Fri, Jan 22, 2016 at 3:17 PM, Ryan Kaldari <[email protected]> wrote:
> Any idea why the most popular article in India is "-"? CCing Dan Garry of
> the Discovery team.
>
> On Fri, Jan 22, 2016 at 5:13 PM, Tilman Bayer <[email protected]> wrote:
>>
>> Below is an example Hive query yielding the 50 most viewed pages in
>> India during December 2015. It took less than 10 minutes of wall clock
>> time to complete.
>>
>> SELECT CONCAT('https://',project,'.org/wiki/',page_title),
>> SUM(view_count) AS views
>> FROM wmf.pageview_hourly
>> WHERE
>>    year = 2015
>>    AND month = 12
>>    AND country = "India"
>>    AND agent_type = "user"
>> GROUP BY project, page_title
>> ORDER BY views DESC LIMIT 50;
>>
>> ...
>> Total MapReduce CPU Time Spent: 0 days 19 hours 13 minutes 2 seconds 930 msec
>> OK
>> _c0 views
>> https://en.wikipedia.org/wiki/Main_Page 43515253
>> https://en.wikipedia.org/wiki/Special:Search 4818687
>> https://en.wikipedia.org/wiki/- 2650346
>> https://en.wikipedia.org/wiki/Bajirao_I 1414810
>> https://en.wikipedia.org/wiki/Dilwale_(2015_film) 1410015
>> https://en.wikipedia.org/wiki/Mastani 1232964
>> https://en.wikipedia.org/wiki/Bajirao_Mastani_(film) 1133261
>> https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2015 632890
>> https://en.wikipedia.org/wiki/Hate_Story_3 582816
>> https://en.wikipedia.org/wiki/Special:MobileMenu 499379
>> https://en.wikipedia.org/wiki/Star_Wars:_The_Force_Awakens 438113
>> https://en.wikipedia.org/wiki/Tamasha_(film) 390519
>> https://en.wikipedia.org/wiki/Prem_Ratan_Dhan_Payo 378133
>> https://en.wikipedia.org/wiki/India 368946
>> https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2016 335547
>> https://en.wikipedia.org/wiki/Star_Wars 334326
>> https://en.wikipedia.org/wiki/Sunny_Leone 333848
>> https://en.wikipedia.org/wiki/Sundar_Pichai 329264
>> https://en.wikipedia.org/wiki/Special:Book 324255
>> https://en.wikipedia.org/wiki/List_of_highest-grossing_Bollywood_films 321418
>> https://en.wikipedia.org/wiki/Salman_Khan 309113
>> https://en.wikipedia.org/wiki/'Tis_the_Season 308221
>> https://en.wikipedia.org/wiki/Mandana_Karimi 289662
>> https://en.wikipedia.org/wiki/Kyaa_Kool_Hain_Hum_3 281801
>> https://en.wikipedia.org/wiki/Kashibai 272673
>> https://en.wikipedia.org/wiki/Bigg_Boss_9 272203
>> https://en.wikipedia.org/wiki/Kriti_Sanon 266773
>> https://en.wikipedia.org/wiki/2012_Delhi_gang_rape 265296
>> https://en.wikipedia.org/wiki/Shah_Rukh_Khan 263729
>> https://en.wikipedia.org/wiki/Neerja_Bhanot 259410
>> https://en.wikipedia.org/wiki/Nora_Fatehi 252085
>> https://en.wikipedia.org/wiki/Ashoka 250255
>> https://en.wikipedia.org/wiki/B._K._S._Iyengar 248422
>> https://en.wikipedia.org/wiki/2015_South_Indian_floods 246377
>> https://en.wikipedia.org/wiki/Baahubali:_The_Beginning 244281
>> https://en.wikipedia.org/wiki/Shamsher_Bahadur_I_(Krishna_Rao) 232122
>> https://en.wikipedia.org/wiki/Christmas 228278
>> https://en.wikipedia.org/wiki/Thanga_Magan_(2015_film) 222373
>> https://en.wikipedia.org/wiki/Ranveer_Singh 221010
>> https://en.wikipedia.org/wiki/A._P._J._Abdul_Kalam 220612
>> https://en.wikipedia.org/wiki/Shivaji 218245
>> https://en.wikipedia.org/wiki/Deepika_Padukone 218242
>> https://en.wikipedia.org/wiki/TLC:_Tables,_Ladders_and_Chairs_(2015) 211920
>> https://en.wikipedia.org/wiki/Gizele_Thakral 206585
>> https://en.wikipedia.org/wiki/Urvashi_Rautela 204305
>> https://en.wikipedia.org/wiki/Peshwa 194957
>> https://en.wikipedia.org/wiki/Kajol 192044
>> https://hi.wikipedia.org/wiki/मुखपृष्ठ 184274
>> https://en.wikipedia.org/wiki/Quantico_(TV_series) 183112
>> https://en.wikipedia.org/wiki/Mahatma_Gandhi 182336
>> Time taken: 562.621 seconds, Fetched: 50 row(s)
>>
>>
>> See also the discussion at https://phabricator.wikimedia.org/T120113
>> (As mentioned there, a while ago I retrieved the global top 200 pages
>> for a timespan of almost six months, with some wait time but no major
>> issues. It's not quite clear to me why the "brute force" approach
>> mentioned in the ticket failed, but I guess it had to do with the
>> difficulty of repeating such a query for all projects - or countries -
>> to generate top lists for every one of them.)
>>
>> On Wed, Jan 20, 2016 at 12:42 PM, Kevin Leduc <[email protected]> wrote:
>> > +Analytics list so they can comment.
>> >
>> > I don't have such a script.  It's a pretty intensive job to compile top
>> > articles especially over a month.  The pageview API was supposed to have
>> > top articles per month per wiki but the job is so massive that it failed
>> > to run in Hive.  Analytics knows there are better algorithms out there to
>> > solve this problem.  So the pageview API just has top per day per wiki.
>> >
>> > I imagine that you are looking at some very specific wikis and
>> > countries... not all of them.  Maybe someone on the list can make an
>> > example hive script (given a wiki and country) that gives the top for a
>> > day.
>> >
>> >
>> > On Wed, Jan 20, 2016 at 12:23 PM, Dan Foy <[email protected]> wrote:
>> >>
>> >> Hi Kevin,
>> >>
>> >> In your collection of scripts for Hive, do you have one that can act as
>> >> a starting point for me to get the top N articles / URLs for Wikipedia
>> >> in a country?
>> >>
>> >> Thanks,
>> >> Dan
>> >>
>> >>
>> >
>> >
>> > _______________________________________________
>> > Analytics mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/analytics
>> >
>>
>>
>>
>> --
>> Tilman Bayer
>> Senior Analyst
>> Wikimedia Foundation
>> IRC (Freenode): HaeB
>>
>
>
>
>



-- 
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB

