On 2 Mar 2015, at 00:35, Nuria Ruiz <[email protected]> wrote:

> Thanks Timo for taking the time to write this.
>

You're welcome. Thanks for this research. I'm excited about the results.

> >There are also non-MediaWiki environments (ab)using bits.wikimedia.org and
> >bypassing the startup module. As such these are loading javascript modules
> >directly, regardless of browser. There are at least two of these that I know
> >of:
> I think our raw hive data probably does not include the traffic from tools
> or wikipedia.org (need to confirm). But even if it did, the traffic of tools
> on bits is not significant compared to the one from wikipedia thus does not
> affect the overall results as we are throwing away the longtail. Note that
> a couple of days' worth of traffic might be more than 1 billion requests for
> javascript on bits.

Unless the bits.wikimedia.org traffic statistics filter things out via the Referer header, I don't see how they could not include traffic triggered by Tool Labs and www-portals like www.wikipedia.org. They make script requests to bits.wikimedia.org. But yeah, Tool Labs traffic will be tiny in comparison. I honestly have no clue how popular our www-portals are. I'd be interested in seeing some stats on that.

> >Actually, there are probably about a dozen more exceptions I can think of. I
> >don't believe it is feasibly possible to filter everything out.
> Statistically I do not think you need to, given the volume of traffic in
> wikipedia versus the other sources, you just cannot report results with a
> precision of, say, 0.001%. Even very small wikis - whose traffic is
> insignificant compared to english wikipedia- are also being thrown away.

Point taken. Thanks :)

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
