>Note that a couple of days' worth of traffic might be more than 1 billion
requests for javascript on bits.
Sorry, correction: a couple of days' worth of "javascript bits" requests comes
to 100 million requests, not 1,000 million.

On Sun, Mar 1, 2015 at 4:35 PM, Nuria Ruiz <[email protected]> wrote:

> Thanks Timo for taking the time to write this.
>
> >The following requests are not part of our primary javascript payload
> and should be excluded when >interpreting bits.wikimedia.org requests for
> purposes of javascript "support":
> Correct. I think I excluded all those.
> Note that the methodology lists "bits javascript traffic", not overall
> "bits traffic":
> https://www.mediawiki.org/wiki/Analytics/Reports/ClientsWithoutJavascript#Metodology
>
> I will double check the startup module just to be safe.
>
>
>
> >There are also non-MediaWiki environments (ab)using bits.wikimedia.org and
> bypassing the startup module. As such these are loading javascript modules
> directly, regardless of browser. There are at least two of these that I
> know of:
> I think our raw hive data probably does not include the traffic from
> tools or wikipedia.org (need to confirm). But even if it did, the traffic
> from tools on bits is not significant compared to that from wikipedia,
> so it does not affect the overall results, as we are throwing away the
> long tail. Note that a couple of days' worth of traffic might be more than
> 1 billion requests for javascript on bits.
>
>
>
> >Actually, there are probably about a dozen more exceptions I can think
> of. I don't believe it is feasibly possible to filter everything out.
> Statistically I do not think you need to: given the volume of traffic on
> wikipedia versus the other sources, you just cannot report results with a
> precision of, say, 0.001%. Even very small wikis, whose traffic is
> insignificant compared to English Wikipedia, are also being thrown away.
> That is to say, if everyone on the Basque Wikipedia started using
> "browser X" w/o Javascript support, it would not be counted, as it
> represents too small a percentage of overall traffic. The results provided
> are an aggregation over all wikipedias' raw bits javascript traffic versus
> wikipedias' overall pageviews. Because we are throwing away the long tail,
> results come from the most trafficked wikis (the disparity in pageviews
> among our wikis is huge). If you want per-wiki results you need to
> analyze the data in a completely different fashion.
>
>
>
>
>
>
>
> On Sat, Feb 28, 2015 at 4:48 PM, Timo Tijhof <[email protected]>
> wrote:
>
>> Hi,
>>
>> Here's a few thoughts about what may influence the data you're gathering.
>>
>> The decision of whether a browser has sufficient support for our Grade A
>> runtime happens client-side based on a combination of feature tests and
>> (unfortunately) user-agent sniffing.
>>
>> For this reason, our bootstrap script is written using only the most
>> basic syntax and prototype methods (as any other methods would cause a
>> run-time exception). For those familiar, this is somewhat similar to PHP
>> version detection in MediaWiki. The file has to parse and run to a certain
>> point in very old environments.
>>
>> The following requests are not part of our primary javascript payload and
>> should be excluded when interpreting bits.wikimedia.org requests for
>> purposes of javascript "support":
>>
>> * stylesheets (e.g. ".css" requests as well as load.php?...&only=styles
>> requests)
>> * images (e.g. ".png", ".svg" etc. as well as load.php?...&image=..
>> requests)
>> * favicons and apple-touch icons (e.g. bits.wikimedia.org/favicon/..,
>> bits.wikimedia.org/apple-touch/..)
>> * fonts (e.g. bits.wikimedia.org/static-../../fonts/..)
>> * events (e.g. bits.wikimedia.org/event.gif, bits.wikimedia.org/statsv)
>> * startup module (bits.wikimedia.org/../load.php?..modules=startup)
>>
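The exclusion list above could be sketched as a small URL classifier. This is only an illustration: the function name, the category labels, and the exact suffix list are assumptions made for the example, not an existing Wikimedia tool, and real log lines would likely need more cases.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed suffix list; the thread only names ".png" and ".svg" explicitly.
IMAGE_SUFFIXES = (".png", ".svg", ".gif", ".jpg", ".jpeg")


def exclusion_reason(url):
    """Return why a bits request should be excluded, or None to keep it.

    Hypothetical helper: the rules mirror the bullet list in this thread.
    """
    parts = urlsplit(url)
    path = parts.path
    query = parse_qs(parts.query)

    # Favicons and apple-touch icons.
    if path.startswith(("/favicon/", "/apple-touch/")):
        return "icon"
    # Event logging beacons (checked before the generic ".gif" suffix).
    if path in ("/event.gif", "/statsv"):
        return "event"
    if "/fonts/" in path:
        return "font"
    if path.endswith("/load.php"):
        if query.get("only") == ["styles"]:
            return "stylesheet"
        if "image" in query:
            return "image"
        if query.get("modules") == ["startup"]:
            return "startup"
        return None  # a candidate javascript payload request
    if path.endswith(".css"):
        return "stylesheet"
    if path.endswith(IMAGE_SUFFIXES):
        return "image"
    return None
```

A request is kept for the javascript "support" analysis only when `exclusion_reason(url)` returns `None`.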
>> There are also non-MediaWiki environments (ab)using bits.wikimedia.org
>> and bypassing the startup module. As such these are loading javascript
>> modules directly, regardless of browser. There are at least two of these
>> that I know of:
>>
>> 1) Tool labs tools. Developers there may use bits.wikimedia.org to serve
>> modules like jQuery UI. They may circumvent the startup module and
>> unconditionally load those (which will cause errors in older browsers, but
>> they don't care or are unaware of how this works).
>>
>> 2) Portals such as www.wikipedia.org and others.
>>
>> For the data to be as reliable as feasibly possible, one would want to
>> filter out these "forged" requests not produced by MediaWiki. The best way
>> to filter out requests that bypassed the startup module is to filter out
>> requests with no version= query parameter, as well as requests with an
>> outdated version parameter (since they can copy an old url and hardcode it
>> in their app).
>>
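A minimal sketch of the version= filter described above, assuming the set of currently deployed version strings has been collected separately. Both `current_versions` and the function name are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def went_through_startup(url, current_versions):
    """True if the request has exactly one version= value that is current.

    `current_versions` is an assumed, externally supplied set of the
    version strings the startup module is handing out right now.
    """
    query = parse_qs(urlsplit(url).query)
    versions = query.get("version", [])
    # No version= at all: the request bypassed the startup module.
    # An unknown version: most likely a hardcoded, outdated url.
    return len(versions) == 1 and versions[0] in current_versions
```

Requests for which this returns `False` would be dropped as not produced by MediaWiki's startup module.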
>> Actually, there are probably about a dozen more exceptions I can think
>> of. I don't believe it is feasibly possible to filter everything out.
>> Perhaps focus your next data-gathering window on a specific payload url -
>> instead of trying to catch all javascript payloads with exclusions for
>> wrong ones.
>>
>> For example, right now in MediaWiki 1.25wmf18 the jquery/mediawiki base
>> payload has version 20150225T221331Z and is requested by the startup module
>> from url (grabbed from the Network tab in Chrome Dev Tools):
>>
>>
>> https://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=20150225T221331Z
>>
>> Using only a specific url like that to gather user agents that support
>> javascript will have considerably fewer false positives.
>>
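Counting user agents against one specific payload url could look roughly like this. The `(url, user_agent)` record shape and both function names are assumptions for the example; the version string is the one from the example url above.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

BASE_VERSION = "20150225T221331Z"  # taken from the example url in this thread


def is_base_payload(url, version=BASE_VERSION):
    """True if the url is the jquery/mediawiki base payload at `version`."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # parse_qs decodes "jquery%2Cmediawiki" to "jquery,mediawiki".
    modules = set(query.get("modules", [""])[0].split(","))
    return (
        parts.netloc == "bits.wikimedia.org"
        and parts.path.endswith("/load.php")
        and modules == {"jquery", "mediawiki"}
        and query.get("only") == ["scripts"]
        and query.get("version") == [version]
    )


def count_user_agents(records):
    """Count user agents over (url, user_agent) pairs (assumed log shape)."""
    return Counter(ua for url, ua in records if is_base_payload(url))
```

Matching on the module set rather than the raw query string keeps the check independent of parameter order.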
>> If you want to incorporate multiple wikis, it'll be a little more work to
>> get all the right urls, but handpicking a dozen wikis will probably be good
>> enough.
>>
>> This also has the advantage of not being biased by device cache size,
>> because, unlike all other modules, the base module is not cached in
>> LocalStorage. It will still benefit from HTTP 304 caching, however. It
>> would help to have your window start simultaneously with the deployment of
>> a new wmf branch to en.wikipedia.org (and other wikis you include in the
>> experiment) so there's a fresh start with caching.
>>
>> </braindump>
>>
>> — Timo
>>
>> On 18 Feb 2015, at 18:07, Nuria Ruiz <[email protected]> wrote:
>>
>> > Do you think it's worth getting the UA distribution for CSS requests &
>> > correlate it with the distribution for page / JS loading?
>> Yes, we can do that. I would need to gather a new dataset for it so I've
>> made a new task for it (https://phabricator.wikimedia.org/T89847),
>> marking this one as complete: https://phabricator.wikimedia.org/T88560
>>
>>
>> I would also like to do some research regarding IE6/IE7, as we should see
>> those (according to our code:
>> https://github.com/wikimedia/mediawiki/blob/master/resources/src/startup.js)
>> in the no-JS list but we only see some user agents there. There are
>> definitely IE6/IE7 browsers to which we are serving javascript; I just have
>> to look in detail at what we are serving there. Will report on this.
>> Looks like this startup.js file is being served to all browsers regardless,
>> so I might need to do some more fine-grained queries.
>>
>> Just consider the 3% as your approximate upper bound for overall traffic,
>> big bots removed. If you just count mobile traffic, the percentages
>> are, of course, a lot higher.
>>
>> Thanks,
>>
>> Nuria
>>
>>
>>
>> On 17 Feb 2015, at 03:38, Nuria Ruiz <[email protected]> wrote:
>>
>> Gabriel:
>>
>> I have run through the data and have a rough estimate of how many of our
>> pageviews are requested from browsers w/o strong javascript support. It is
>> a preliminary rough estimate but I think it is pretty useful.
>>
>> TL;DR
>> According to our new pageview definition (
>> https://meta.wikimedia.org/wiki/Research:Page_view) about 10% of
>> pageviews come from clients w/o much javascript support. But (BIG CAVEAT)
>> this includes bot requests. If you remove the easy-to-spot big bots the
>> percentage is <3%.
>>
>> Details here (still some homework to do regarding IE6 and IE7)
>> https://www.mediawiki.org/wiki/Analytics/Reports/ClientsWithoutJavascript
>>
>>
>> Thanks,
>>
>> Nuria
>>
>>
>>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
