Dan:
> I would love it if people on this list or elsewhere would start identifying
> the highest value reports from wikistats. We can also use traffic data to
> figure out the most popular pages, but this doesn't always mean highest value.
The traffic data Dan refers to (I assume) is this:
http://stats.wikimedia.org/wikistats-traffic-2015-04.html

Indeed, pageviews for each report can be misleading (see e.g. red links to totally outdated reports).

So how to go about this? I made a list of squid-based traffic reports (some more to add). Will this work?

Concept pages:
https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future <https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future#Future:_general_ideas>
https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2

Jane:
> I think we should keep them until we have new ones, because if you axe them
> now, no one will remember how or why they were built (and you won't be able
> to point users in the right direction).

Sure, I'm not going to delete the existing reports. I'm merely suggesting that we stop updating some of them and put a clear warning on top saying that they are no longer accurate enough to base any conclusions on.

Gergo:
> Is there a specific reason for disabling country, mime type etc. reports?

You're right, some of the traffic reports under discussion are less maintenance-sensitive; mime type and target wiki are good examples. I might as well keep those for now.

There is a major issue with the breakdown-by-geography reports, and I may have to invalidate the versions for 2015. For example, the share of Russian traffic dropped from 5% to 1% in recent reports. This may have to do with https traffic being misattributed to the country where the WMF data center resides. I will follow up.

Erik

From: [email protected] [mailto:[email protected]] On Behalf Of Gergo Tisza
Sent: Saturday, July 25, 2015 21:02
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] proposal to axe current traffic reports

On Fri, Jul 24, 2015 at 1:25 PM, Erik Zachte <[email protected]> wrote:

> Wikistats broadly comes in two parts
> - A Content and activity reports per wiki (html tables and charts based on the xml dumps)
> - B Traffic reports
>
> Traffic reports are built from two sources
> -- B1 Domas' hourly aggregations per wiki, aggregated further into monthly totals per wiki
>    (mobile/non-mobile, normalized/non-normalized), grouped by project
>    e.g. http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
> -- B2 Sampled log lines (these days generated via hadoop)
>
> These sampled log lines are used for two types of reports (with some hybrids)
> --- B2a Breakdowns of traffic by geographic criteria (country, continent, N/S)
>     http://stats.wikimedia.org/wikimedia/squids/SquidReportsCountriesLanguagesVisitsEdits.htm
> --- B2b Breakdowns of traffic by non geographic criteria (os, browser, mime type, target wiki, referer, etc)
>     http://stats.wikimedia.org/cgi-bin/search_portal.pl?search=breakdown+of+traffic
>
> My current proposal is on disabling B2b and hybrid reports like
> http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm

Is there a specific reason for disabling country, mime type etc. reports? User agent sniffing rules require constant updates as new browsers appear, so browser reports become misleading when unmaintained, but I would expect e.g. the target wiki logic to be fairly stable; and country logic (I assume) is maintained externally by MaxMind; are there also known problems with those?
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
