Hi,

Here are a few thoughts about what may influence the data you're gathering.

The decision of whether a browser has sufficient support for our Grade A 
runtime happens client-side based on a combination of feature tests and 
(unfortunately) user-agent sniffing.

For this reason, our bootstrap script is written using only the most basic 
syntax and prototype methods (as any other methods would cause a run-time 
exception). For those familiar, this is somewhat similar to PHP version 
detection in MediaWiki. The file has to parse and run to a certain point in 
very old environments.

The following requests are not part of our primary javascript payload and 
should be excluded when interpreting bits.wikimedia.org requests for purposes 
of javascript "support":

* stylesheets (e.g. ".css" requests as well as load.php?...&only=styles 
requests)
* images (e.g. ".png", ".svg" etc. as well as load.php?...&image=.. requests)
* favicons and apple-touch icons (e.g. bits.wikimedia.org/favicon/.., 
bits.wikimedia.org/apple-touch/..)
* fonts (e.g. bits.wikimedia.org/static-../../fonts/..)
* events (e.g. bits.wikimedia.org/event.gif, bits.wikimedia.org/statsv)
* startup module (bits.wikimedia.org/../load.php?..modules=startup)
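
As a rough illustration, the exclusion list above could be turned into a 
filter along these lines. This is only a sketch: the function name, the 
extra image extensions beyond ".png" and ".svg", and the assumption that log 
lines carry a full URL with a scheme are mine, not something our logging 
pipeline provides.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: only ".png" and ".svg" are named above; the rest are guesses.
IMAGE_EXTENSIONS = (".png", ".svg", ".gif", ".jpg", ".jpeg")


def is_non_payload_request(url):
    """Return True if the request should be excluded from the JS payload count."""
    parts = urlsplit(url)
    path = parts.path
    query = parse_qs(parts.query, keep_blank_values=True)
    is_load_php = path.endswith("/load.php")

    # Stylesheets: ".css" files and load.php?...&only=styles
    if path.endswith(".css") or (is_load_php and query.get("only") == ["styles"]):
        return True
    # Images: static image files and load.php?...&image=..
    if path.endswith(IMAGE_EXTENSIONS) or (is_load_php and "image" in query):
        return True
    # Favicons and apple-touch icons
    if path.startswith(("/favicon/", "/apple-touch/")):
        return True
    # Fonts served from the static directories
    if "/fonts/" in path:
        return True
    # Event logging beacons
    if path == "/event.gif" or path.startswith("/statsv"):
        return True
    # The startup module itself, which is served to every browser
    if is_load_php and query.get("modules") == ["startup"]:
        return True
    return False
```

Anything this returns False for is merely a candidate; it says nothing yet 
about whether the request actually came from MediaWiki.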

There are also non-MediaWiki environments (ab)using bits.wikimedia.org and 
bypassing the startup module. As such these are loading javascript modules 
directly, regardless of browser. There are at least two of these that I know of:

1) Tool labs tools. Developers there may use bits.wikimedia.org to serve 
modules like jQuery UI. They may circumvent the startup module and 
unconditionally load those (which will cause errors in older browsers, but they 
don't care or are unaware of how this works).

2) Portals such as www.wikipedia.org and others.

For the data to be as reliable as possible, one would want to filter 
out these "forged" requests not produced by MediaWiki. The best way to filter 
out requests that bypassed the startup module is to drop requests with no 
version= query parameter, as well as requests with an outdated version 
parameter (since they can copy an old url and hardcode it in their app).
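
A minimal sketch of that version check, assuming you can supply the set of 
version strings that were current during the data-gathering window (the 
function name and the current_versions argument are hypothetical):

```python
from urllib.parse import urlsplit, parse_qs


def passed_through_startup(url, current_versions):
    """Return True if the url carries a version= value we consider current.

    current_versions is a set of version strings assumed to be valid for
    the measurement window; it has to be collected separately.
    """
    query = parse_qs(urlsplit(url).query)
    versions = query.get("version", [])
    # A missing (or empty) version suggests the startup module was bypassed;
    # a version outside the set suggests a copied, hardcoded old url.
    return len(versions) == 1 and versions[0] in current_versions
```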

Actually, there are probably about a dozen more exceptions I can think of. I 
don't believe it is feasible to filter everything out. Perhaps focus 
your next data-gathering window on a specific payload url, instead of trying 
to catch all javascript payloads with exclusions for wrong ones.

For example, right now in MediaWiki 1.25wmf18 the jquery/mediawiki base payload 
has version 20150225T221331Z and is requested by the startup module from url 
(grabbed from the Network tab in Chrome Dev Tools):

https://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=20150225T221331Z

Using only a specific url like that to gather user agents that support 
javascript will have considerably fewer false positives.
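
To make the idea concrete, here is a sketch that tallies user agents only 
for requests matching that base payload. The (url, user_agent) record shape 
and both function names are assumptions about how the request data might be 
laid out, not a description of any existing job.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def is_base_payload(url, version):
    """Return True for the jquery/mediawiki base payload at the given version."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return (
        parts.netloc == "bits.wikimedia.org"
        and parts.path.endswith("/load.php")
        and query.get("modules") == ["jquery,mediawiki"]
        and query.get("only") == ["scripts"]
        and query.get("version") == [version]
    )


def count_user_agents(records, version):
    """Count user agents over (url, user_agent) pairs that hit the base payload."""
    return Counter(ua for url, ua in records if is_base_payload(url, version))
```

Comparing parsed query parameters rather than the raw string keeps the match 
from depending on parameter order, which may or may not matter in practice.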

If you want to incorporate multiple wikis, it'll be a little more work to get 
all the right urls, but handpicking a dozen wikis will probably be good enough.

This also has the advantage of not being biased by a device's cache size 
because, unlike all other modules, the base module is not cached in 
LocalStorage. It will still benefit from HTTP 304 caching, however. It would 
help to have your window start simultaneously with the deployment of a new 
wmf branch to en.wikipedia.org (and other wikis you include in the 
experiment) so there's a fresh start with caching.

</braindump>

— Timo

On 18 Feb 2015, at 18:07, Nuria Ruiz <[email protected]> wrote:

> > Do you think it's worth getting the UA distribution for CSS requests & 
> > correlate it with the distribution for page / JS loading?
> Yes, we can do that. I would need to gather a new dataset for it so I've made 
> a new task for it (https://phabricator.wikimedia.org/T89847), marking this 
> one as complete: https://phabricator.wikimedia.org/T88560
> 
> 
> I also like to do some research regarding IE6 /IE7 as we should see those 
> (according to our code: 
> https://github.com/wikimedia/mediawiki/blob/master/resources/src/startup.js) 
> in the no JS list but we only see some UA agents there. There are definitely 
> IE6/IE7 browsers to which we are serving javascript, just have to look in 
> detail what is what we are serving there. Will report on this. Looks like 
> this startup.js file is being served to all browsers regardless, so I might 
> need to do some more fine grained queries.
> 
> Just consider the 3% as your approximate upper bound for overall traffic, big 
> bots removed. If you just count mobile traffic, numbers in percentage are, of 
> course, a lot higher.
> 
> Thanks, 
> 
> Nuria


On 17 Feb 2015, at 03:38, Nuria Ruiz <[email protected]> wrote:

> Gabriel:
> 
> I have run through the data and have a rough estimate of how many of our 
> pageviews are requested from browsers w/o strong javascript support. It is a 
> preliminary rough estimate but I think is pretty useful.
> 
> TL;DR
> According to our new pageview definition 
> (https://meta.wikimedia.org/wiki/Research:Page_view) about 10% of pageviews 
> come from clients w/o much javascript support. But - BIG CAVEAT- this 
> includes bot requests. If you remove the easy-to-spot big bots, the 
> percentage is <3%. 
> 
> Details here (still some homework to do regarding IE6 and IE7) 
> https://www.mediawiki.org/wiki/Analytics/Reports/ClientsWithoutJavascript
> 
> 
> Thanks, 
> 
> Nuria

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
