Thanks. Another question: For some countries, the result is "-", for
example Germany:

Germany    -    en.wikipedia    1275634

Any idea why?

(I modified the query a bit and added the "project" column. And yes, the
fact that en.wikipedia is at the top in Germany is also quite odd.)


--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬

2018-07-09 15:17 GMT+03:00 Francisco Dans <[email protected]>:

> I think that as long as you add a filter requiring a minimum of, say,
> 1000 pageviews, you should be fine privacy-wise. I can't say much about
> your second question.
>
> On Mon, Jul 9, 2018 at 1:59 PM, Amir E. Aharoni <
> [email protected]> wrote:
>
>> Thank you so much! In many countries it's
>>
>> A couple of questions:
>> 1. Are any of the results of this query private? Or can I talk about them
>> to people?
>> 2. Is anything like this already published anywhere? If it isn't, it may
>> be nice to publish such a thing, similarly to Google Zeitgeist.
>>
>>
>> --
>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>> http://aharoni.wordpress.com
>> ‪“We're living in pieces,
>> I want to live in peace.” – T. Moore‬
>>
>> 2018-07-09 13:19 GMT+03:00 Francisco Dans <[email protected]>:
>>
>>> Hi Amir,
>>>
>>> As Tilman has suggested, your best bet is to query the pageview_hourly
>>> table. I was going to be lazy and give you a query to just find out the
>>> most viewed article for a given country, but then I made a few experiments
>>> and this is the query I came up with to generate a list of countries and
>>> their respective most viewed articles and view counts. It takes a few
>>> minutes to run for a single day, so I'm sure someone here could suggest a
>>> better approach.
>>>
>>> WITH articles_countries AS (
>>>     SELECT country, page_title, sum(view_count) AS views
>>>     FROM pageview_hourly
>>>     WHERE year=2018 AND month=3 AND day=15
>>>     GROUP BY country, page_title
>>> )
>>> SELECT s.country AS country, s.page_title AS page_title, s.views AS views
>>> FROM (
>>>     SELECT max(named_struct('views', views, 'country', country,
>>>                             'page_title', page_title)) AS s
>>>     FROM articles_countries
>>>     GROUP BY country
>>> ) t;
>>>
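The query above appears to rely on structs being compared field by field, so that taking the max of a struct whose first field is the view count picks the most viewed article within each country. A minimal Python sketch of that logic follows, using tuples in place of structs. The sample rows, the function name top_article_per_country and the min_views parameter are hypothetical and only for illustration; they are not part of pageview_hourly or of the original query.

```python
from collections import defaultdict

# Hypothetical sample rows in the shape (country, page_title, view_count).
# These numbers are made up for illustration only.
rows = [
    ("Germany", "Main_Page", 120),
    ("Germany", "Berlin", 300),
    ("Germany", "Berlin", 50),
    ("Spain", "Madrid", 80),
    ("Spain", "Main_Page", 200),
]


def top_article_per_country(rows, min_views=0):
    """Return {country: (page_title, views)} for the most viewed title."""
    # Step 1: sum the views per (country, page_title), like the CTE does.
    totals = defaultdict(int)
    for country, title, views in rows:
        totals[(country, title)] += views

    # Step 2: per country, keep the largest (views, title) tuple. Tuples
    # compare element by element, so the view count decides first and the
    # title only breaks ties.
    best = {}
    for (country, title), views in totals.items():
        if views < min_views:
            continue  # hypothetical threshold for dropping low counts
        candidate = (views, title)
        if country not in best or candidate > best[country]:
            best[country] = candidate

    return {country: (title, views) for country, (views, title) in best.items()}


result = top_article_per_country(rows)
```

Putting the view count first in the tuple is what makes the comparison work; with the title first, the max would simply be the alphabetically last title.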
>>>
>>> Cheers / see you in ZA,
>>> Fran
>>>
>>>
>>> On Mon, Jul 9, 2018 at 10:18 AM, Amir E. Aharoni <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there a way to find the most popular articles per country?
>>>>
>>>> Finding the most popular articles per language is easy with the
>>>> Pageviews tool, but languages and countries are of course not the same.
>>>>
>>>> One thing I tried is going to Turnilo, webrequest_sampled_128, and
>>>> filtering by country. But here it gets troublesome:
>>>> * Splitting can be done by Uri host, which is *more or less* the
>>>> project, or by Uri path, which is *more or less* the article (but see
>>>> below), and I couldn't find a convenient way to combine them.
>>>> * Mobile (.m.) and desktop hosts are separate. It may actually
>>>> sometimes be useful to see differences (or lack thereof) between desktop
>>>> and mobile, but combining them is often useful, too. This can probably be
>>>> done with regular expressions, but this brings us to the biggest problem:
>>>> * Filtering by Uri path would be useful if it didn't have so many paths
>>>> for images, beacons, etc. Filtering using the regular expression
>>>> "\/wiki\/.+" may be the right thing functionally, but in practice it's very
>>>> slow or doesn't work at all.
>>>> * I don't know what exactly is logged in webrequest_sampled_128, but
>>>> the name hints that it doesn't include everything. A sample may be OK for
>>>> countries with a lot of traffic like the U.S. or Spain, but for countries
>>>> with less traffic this may start being a problem.
>>>>
>>>> Any better ideas?
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>>>> http://aharoni.wordpress.com
>>>> ‪“We're living in pieces,
>>>> I want to live in peace.” – T. Moore‬
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>
>>>
>>>
>>> --
>>> *Francisco Dans*
>>> Software Engineer, Analytics Team
>>> Wikimedia Foundation
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> *Francisco Dans*
> Software Engineer, Analytics Team
> Wikimedia Foundation
>
>
>
