Dan:

> I would love it if people on this list or elsewhere would start identifying 
> the highest value reports from wikistats.  We can also use traffic data to 
> figure out the most popular pages, but this doesn't always mean highest value.

 

The traffic data Dan refers to (I assume) is this:

http://stats.wikimedia.org/wikistats-traffic-2015-04.html

Indeed, pageviews for each report can be misleading (see e.g. the red links to 
totally outdated reports).

 

So how to go about this? I made a list of squid-based traffic reports (some 
more to add). Will this work?

 

Concept pages:

https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future 
<https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future#Future:_general_ideas>
 

https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2

 

 

Jane:

> I think we should keep them until we have new ones, because if you axe them 
> now, no one will remember how or why they were built (and you won't be able 
> to point users in the right direction). 

 

Sure, I'm not going to delete the existing reports. I'm merely suggesting that 
we stop updating some of them and put a clear warning on top, saying that they 
are no longer accurate enough to base any conclusions on.

 

Gergo:

> Is there a specific reason for disabling country, mime type etc. reports? 

 

You're right, some of the traffic reports under discussion are less 
maintenance-sensitive; mime type and target wiki are good examples. I might as 
well keep those for now.

 

There is a major issue with the breakdown-by-geography reports, and I may have 
to invalidate the versions for 2015. For example, the share of Russian traffic 
dropped from 5% to 1% in recent reports.

This may have to do with https traffic being misattributed to the country where 
the WMF data center resides. I will follow up.
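
A drop like the one described above could be caught automatically before a 
report is published. The following is only a minimal sketch, not how wikistats 
works: it assumes the sampled log lines have already been reduced to 
(month, country code) pairs, and the function names and the 50% threshold are 
hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def monthly_country_shares(rows):
    """Return {month: {country: share}} from (month, country) pairs.

    `rows` is assumed to be an iterable of ("YYYY-MM", country_code) tuples,
    one per sampled request. This input shape is an assumption.
    """
    counts = defaultdict(Counter)
    for month, country in rows:
        counts[month][country] += 1

    shares = {}
    for month, per_country in counts.items():
        total = sum(per_country.values())
        shares[month] = {c: n / total for c, n in per_country.items()}
    return shares


def flag_share_drops(shares, min_ratio=0.5):
    """List countries whose share fell below min_ratio times the previous month's.

    Returns tuples (country, prev_month, month, prev_share, share).
    The default threshold of 0.5 is arbitrary.
    """
    months = sorted(shares)  # "YYYY-MM" strings sort chronologically
    flagged = []
    for prev, cur in zip(months, months[1:]):
        for country, prev_share in shares[prev].items():
            cur_share = shares[cur].get(country, 0.0)
            if cur_share < min_ratio * prev_share:
                flagged.append((country, prev, cur, prev_share, cur_share))
    return flagged


if __name__ == "__main__":
    # Made-up sample that mimics a 5% -> 1% drop; not real traffic data.
    rows = (
        [("2015-01", "RU")] * 5 + [("2015-01", "US")] * 95
        + [("2015-02", "RU")] * 1 + [("2015-02", "US")] * 99
    )
    for item in flag_share_drops(monthly_country_shares(rows)):
        print(item)
```

A check like this only signals that something changed; it cannot tell a real 
shift in readership from a geolocation artifact such as the suspected https 
misattribution.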

 

 

Erik

 

From: [email protected] 
[mailto:[email protected]] On Behalf Of Gergo Tisza
Sent: Saturday, July 25, 2015 21:02
To: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics.
Subject: Re: [Analytics] proposal to axe current traffic reports

 

On Fri, Jul 24, 2015 at 1:25 PM, Erik Zachte <[email protected]> wrote:

Wikistats broadly comes in two parts
- A Content and activity reports per wiki (html tables and charts based on the 
xml dumps)
- B Traffic reports

  Traffic reports are built from two sources

  -- B1 Domas' hourly aggregations per wiki, aggregated further into monthly 
totals per wiki (mobile/non-mobile,  normalized/non-normalized), grouped by 
project
     e.g. http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm

  -- B2 Sampled log lines (these days generated via hadoop)

      These sampled log lines are used for two types of reports (with some 
hybrids)

      --- B2a Breakdowns of traffic by geographic criteria (country, continent, 
N/S)
            
http://stats.wikimedia.org/wikimedia/squids/SquidReportsCountriesLanguagesVisitsEdits.htm

      --- B2b Breakdowns of traffic by non geographic criteria (os, browser, 
mime type, target wiki, referer, etc)
            
http://stats.wikimedia.org/cgi-bin/search_portal.pl?search=breakdown+of+traffic

My current proposal is to disable B2b and hybrid reports like
http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm

 

Is there a specific reason for disabling country, mime type etc. reports? User 
agent sniffing rules require constant updates as new browsers appear, so 
browser reports become misleading when unmaintained, but I would expect e.g. 
the target wiki logic to be fairly stable; and country logic (I assume) is 
maintained externally by MaxMind; are there also known problems with those?

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
