On 2 Mar 2015, at 00:35, Nuria Ruiz <[email protected]> wrote:

> Thanks Timo for taking the time to write this.

You're welcome. Thanks for this research. I'm excited about the results.


> 
> >There are also non-MediaWiki environments (ab)using bits.wikimedia.org and 
> >bypassing the startup module. As such these are loading javascript modules 
> >directly, regardless of browser. There are at least two of these that I know 
> >of:
> I think our raw Hive data probably does not include the traffic from tools
> or wikipedia.org (need to confirm). But even if it did, the traffic from tools
> on bits is not significant compared to that from Wikipedia, so it does not
> affect the overall results, since we are throwing away the long tail. Note that
> a couple of days' worth of traffic might be more than 1 billion requests for
> JavaScript on bits.


Unless the bits.wikimedia.org traffic statistics filter things out via the
Referer header, I don't see how they could exclude traffic triggered by Tool
Labs and www-portals like www.wikipedia.org. Both make script requests to
bits.wikimedia.org.
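
To make the Referer idea concrete, here is a rough sketch of how a request
could be bucketed by its Referer host. The host lists and the
classify_referer name are my own assumptions for illustration; this is not
how the existing statistics pipeline works, as far as I know.

```python
from urllib.parse import urlparse

# Assumed host lists, for illustration only; the real set of portals
# and tool hosts would need to be confirmed.
PORTAL_HOSTS = {"www.wikipedia.org"}
TOOL_LABS_HOST = "tools.wmflabs.org"


def classify_referer(referer):
    """Bucket a request by the host in its Referer header value."""
    if not referer:
        return "unknown"
    host = (urlparse(referer).hostname or "").lower()
    if host in PORTAL_HOSTS:
        return "portal"
    if host == TOOL_LABS_HOST:
        return "tool_labs"
    if host.endswith(".wikipedia.org"):
        return "wiki"
    return "unknown"
```

Requests with a missing or unparseable Referer end up in "unknown", so a
filter like this could only ever be approximate.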

But yeah, Tool Labs traffic will be tiny in comparison. I honestly have no clue 
how popular our www-portals are. I'd be interested in seeing some stats on that.


> 
> >Actually, there are probably about a dozen more exceptions I can think of. I
> >don't believe it is feasible to filter everything out.
> Statistically I do not think you need to: given the volume of Wikipedia
> traffic compared with the other sources, you just cannot report results with a
> precision of, say, 0.001%. Even very small wikis, whose traffic is
> insignificant compared to English Wikipedia, are also being thrown away.

Point taken. Thanks :)

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
