> Note that a couple of days' worth of traffic might be more than 1 billion requests for javascript on bits.

Sorry, correction: a couple of days' worth of "javascript bits" requests comes to 100 million requests, not 1,000 million.

On Sun, Mar 1, 2015 at 4:35 PM, Nuria Ruiz <[email protected]> wrote:

> Thanks Timo for taking the time to write this.
>
> > The following requests are not part of our primary javascript payload
> > and should be excluded when interpreting bits.wikimedia.org requests for
> > purposes of javascript "support":
>
> Correct. I think I excluded all those.
> Note that I listed on methodology "bits javascript traffic", not overall
> "bits traffic":
> https://www.mediawiki.org/wiki/Analytics/Reports/ClientsWithoutJavascript#Metodology
>
> I will double check the startup module just to be safe.
>
> > There are also non-MediaWiki environments (ab)using bits.wikimedia.org and
> > bypassing the startup module. As such these are loading javascript modules
> > directly, regardless of browser. There are at least two of these that I
> > know of:
>
> I think our raw hive data probably does not include the traffic from
> tools or wikipedia.org (need to confirm). But even if it did, the traffic
> of tools on bits is not significant compared to the one from wikipedia,
> thus it does not affect the overall results as we are throwing away the
> long tail. Note that a couple of days' worth of traffic might be more than
> 1 billion requests for javascript on bits.
>
> > Actually, there are probably about a dozen more exceptions I can think
> > of. I don't believe it is feasibly possible to filter everything out.
>
> Statistically I do not think you need to. Given the volume of traffic in
> wikipedia versus the other sources, you just cannot report results with a
> precision of, say, 0.001%. Even very small wikis - whose traffic is
> insignificant compared to english wikipedia - are also being thrown away.
> That is to say that if in the Basque wikipedia everyone started using
> "browser X" w/o Javascript support it will not be counted, as it represents
> too small of a percentage of overall traffic.
> Results provided are an aggregation over all wikipedias' raw bits
> javascript traffic versus wikipedias' overall pageviews. Because we are
> throwing away the long tail, results come from the most trafficked wikis
> (our disparity in pageviews among wikis is huge). If you want to get
> per-wiki results you need to analyze the data in a completely different
> fashion.
>
> On Sat, Feb 28, 2015 at 4:48 PM, Timo Tijhof <[email protected]>
> wrote:
>
>> Hi,
>>
>> Here's a few thoughts about what may influence the data you're gathering.
>>
>> The decision of whether a browser has sufficient support for our Grade A
>> runtime happens client-side based on a combination of feature tests and
>> (unfortunately) user-agent sniffing.
>>
>> For this reason, our bootstrap script is written using only the most
>> basic syntax and prototype methods (as any other methods would cause a
>> run-time exception). For those familiar, this is somewhat similar to PHP
>> version detection in MediaWiki. The file has to parse and run to a certain
>> point in very old environments.
>>
>> The following requests are not part of our primary javascript payload and
>> should be excluded when interpreting bits.wikimedia.org requests for
>> purposes of javascript "support":
>>
>> * stylesheets (e.g. ".css" requests as well as load.php?...&only=styles
>>   requests)
>> * images (e.g. ".png", ".svg" etc. as well as load.php?...&image=..
>>   requests)
>> * favicons and apple-touch icons (e.g. bits.wikimedia.org/favicon/..,
>>   bits.wikimedia.org/apple-touch/..)
>> * fonts (e.g. bits.wikimedia.org/static-../../fonts/..)
>> * events (e.g. bits.wikimedia.org/event.gif, bits.wikimedia.org/statsv)
>> * startup module (bits.wikimedia.org/../load.php?..modules=startup)
>>
>> There are also non-MediaWiki environments (ab)using bits.wikimedia.org
>> and bypassing the startup module. As such these are loading javascript
>> modules directly, regardless of browser.
>> There are at least two of these that I know of:
>>
>> 1) Tool labs tools. Developers there may use bits.wikimedia.org to serve
>> modules like jQuery UI. They may circumvent the startup module and
>> unconditionally load those (which will cause errors in older browsers, but
>> they don't care or are unaware of how this works).
>>
>> 2) Portals such as www.wikipedia.org and others.
>>
>> For the data to be as reliable as feasibly possible, one would want to
>> filter out these "forged" requests not produced by MediaWiki. The best way
>> to filter out requests that bypassed the startup module is to filter out
>> requests with no version= query parameter, as well as requests with an
>> outdated version parameter (since they can copy an old url and hardcode it
>> in their app).
>>
>> Actually, there are probably about a dozen more exceptions I can think
>> of. I don't believe it is feasibly possible to filter everything out.
>> Perhaps focus your next data-gathering window on a specific payload url,
>> instead of trying to catch all javascript payloads with exclusions for
>> wrong ones.
>>
>> For example, right now in MediaWiki 1.25wmf18 the jquery/mediawiki base
>> payload has version 20150225T221331Z and is requested by the startup module
>> from url (grabbed from the Network tab in Chrome Dev Tools):
>>
>> https://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=20150225T221331Z
>>
>> Using only a specific url like that to gather user agents that support
>> javascript will have considerably fewer false positives.
>>
>> If you want to incorporate multiple wikis, it'll be a little more work to
>> get all the right urls, but handpicking a dozen wikis will probably be good
>> enough.
>>
>> This also has the advantage of not being biased by device cache size,
>> because, unlike all other modules, the base module is not cached in
>> LocalStorage.
>> It will still benefit from HTTP 304 caching, however. It would help
>> to have your window start simultaneously with the deployment of a new wmf
>> branch to en.wikipedia.org (and other wikis you include in the
>> experiment) so there's a fresh start with caching.
>>
>> </braindump>
>>
>> -- Timo
>>
>> On 18 Feb 2015, at 18:07, Nuria Ruiz <[email protected]> wrote:
>>
>> > Do you think it's worth getting the UA distribution for CSS requests &
>> > correlate it with the distribution for page / JS loading?
>>
>> Yes, we can do that. I would need to gather a new dataset for it so I've
>> made a new task for it (https://phabricator.wikimedia.org/T89847),
>> marking this one as complete: https://phabricator.wikimedia.org/T88560
>>
>> I would also like to do some research regarding IE6/IE7, as we should see
>> those (according to our code:
>> https://github.com/wikimedia/mediawiki/blob/master/resources/src/startup.js)
>> in the no JS list but we only see some user agents there. There are
>> definitely IE6/IE7 browsers to which we are serving javascript; I just have
>> to look in detail at what we are serving there. Will report on this.
>> Looks like this startup.js file is being served to all browsers regardless,
>> so I might need to do some more fine-grained queries.
>>
>> Just consider the 3% as your approximate upper bound for overall traffic,
>> big bots removed. If you just count mobile traffic, numbers in percentage
>> are, of course, a lot higher.
>>
>> Thanks,
>>
>> Nuria
>>
>> On 17 Feb 2015, at 03:38, Nuria Ruiz <[email protected]> wrote:
>>
>> Gabriel:
>>
>> I have run through the data and have a rough estimate of how many of our
>> pageviews are requested from browsers w/o strong javascript support. It is
>> a preliminary rough estimate but I think it is pretty useful.
>>
>> TL;DR
>> According to our new pageview definition (
>> https://meta.wikimedia.org/wiki/Research:Page_view) about 10% of
>> pageviews come from clients w/o much javascript support.
>> But - BIG CAVEAT - this includes bot requests. If you remove the
>> easy-to-spot big bots the percentage is <3%.
>>
>> Details here (still some homework to do regarding IE6 and IE7):
>> https://www.mediawiki.org/wiki/Analytics/Reports/ClientsWithoutJavascript
>>
>> Thanks,
>>
>> Nuria
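
For reference, the exclusion list and the version= filter that Timo describes in the quoted thread could be sketched roughly as below. This is only an illustration in Python, not the query that was run against the hive data. The function name, the suffix list, and the idea of passing in a single "current" version string are assumptions made for the example; the version value is the one Timo quotes for 1.25wmf18.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: one known-good version string for the sampling window
# (value taken from the example url in the thread).
CURRENT_VERSION = "20150225T221331Z"

# Assumption: illustrative suffixes for static assets; extend as needed.
STATIC_SUFFIXES = (".css", ".png", ".svg", ".gif", ".ico")
EXCLUDED_PATH_PARTS = ("/favicon/", "/apple-touch/", "/fonts/")
EXCLUDED_PATHS = ("/event.gif", "/statsv")


def is_primary_js_request(url, current_version=CURRENT_VERSION):
    """Return True if url looks like a javascript payload requested via
    the startup module, following the exclusions listed in the thread."""
    parts = urlsplit(url)
    path = parts.path

    # Stylesheets, images, favicons, fonts and event beacons.
    if path.endswith(STATIC_SUFFIXES) or path in EXCLUDED_PATHS:
        return False
    if any(part in path for part in EXCLUDED_PATH_PARTS):
        return False

    # Only load.php requests can carry the javascript payload.
    if not path.endswith("/load.php"):
        return False

    query = parse_qs(parts.query)

    # load.php is also used for styles and images.
    if query.get("only") == ["styles"] or "image" in query:
        return False

    # The startup module itself is served to every browser.
    if query.get("modules") == ["startup"]:
        return False

    # Requests that bypassed the startup module have no version=
    # parameter, or carry an outdated one copied from an old url.
    return query.get("version") == [current_version]
```

Restricting the match to one module list (for example modules=jquery,mediawiki) would turn this into the single-payload approach Timo suggests; the version check is what drops the tools and portal requests that hardcode a url.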
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
