On 26/03/07, Jeremy Stone <[EMAIL PROTECTED]> wrote:
0.4% of users at the time used a Linux operating system ;)
That's not entirely true, is it? Please do not try to mislead people. What is more likely is: 0.4% of users WERE DETECTED AS using a Linux operating system AT THE TIME THEY VISITED THE BBC SITE.

This number can be wrong for a multitude of reasons:

1) The BBC stats are biased: the site is targeted at Windows users and on certain pages blocks users of other OSes (bbc.co.uk uses ActiveX, for instance).
2) The detection software may not have been tuned to recognize a Linux OS. After all, many distros don't call themselves 'Linux', so the word may not be in the user agent string (simply looking for the word Linux is not good enough).
3) A Linux user may have been misreporting the operating system (commonly done to cater for sites that do user agent sniffing badly, and also to blend in with the crowd for anonymity).
4) Someone may dual boot (or triple boot, or more) and may only be using Windows to view bbc.co.uk, possibly because they are locked out by the previously mentioned technological practices of the BBC.
5) Some 'users' may not be real people; they may be robots spoofing their user agent. 90% of email is spam. How have you accounted for web robots browsing your site looking for email addresses or trying to post spam comments? (They would not hit robots.txt or say "robot" in the user agent; that would give them away.) I am thinking most spam bots would impersonate IE on Windows, as it probably has the highest market share (by how much we are unsure) and so is much harder to filter.

Additionally, you could argue you would get the less knowledgeable users in this sampling. I rarely hit the BBC home page; why bother? I know where I want to go, and I get the news feeds in a handy RSS, so I probably don't hit news.bbc.co.uk's homepage either. I have the pages I need in bookmarks (Favourites for you IE users).

This is the great thing about statistics: people like you claim they show something and try to cover up the failings of how the sampling was done. It shows only as much as it records: the number of recognized User Agent strings for hits on the BBC website. (Quick question: is this per IP or per page hit? Per page hit would be bad, as it would allow robots to skew the results badly, since they would hit far more pages.)

I really do dislike statistics, especially when people try to claim that they prove something without accounting for the method of gathering. And now a quote:
"There are three kinds of commonly recognised untruths: Lies, damn lies and statistics." - Mark Twain

This quote from Mark Twain is accurate; statistics are often used to lie to the public because most people do not understand how statistics work.
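
P.S. To make points 2) and 5) and the per-IP versus per-page-hit question concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the log entries, IP addresses and user agent strings are invented, and looks_like_linux, share_per_hit and share_per_ip are hypothetical helpers, not anything the BBC is known to use.

```python
from collections import defaultdict

# Invented user agent strings, for illustration only.
LINUX_UA = "Mozilla/5.0 (X11; U; Linux i686; en-US) Gecko/20070309 Firefox/2.0.0.3"
DEBIAN_UA = "Mozilla/5.0 (compatible; Konqueror/3.5) KHTML/3.5.5 (like Gecko) (Debian)"
IE_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

# Hypothetical access log as (ip, user_agent) pairs.
HITS = [
    ("10.0.0.1", LINUX_UA),   # Linux user whose UA says so
    ("10.0.0.2", DEBIAN_UA),  # Linux user whose UA never says "Linux"
    ("10.0.0.3", IE_UA),      # ordinary Windows visitor
] + [("10.0.0.4", IE_UA)] * 7  # one robot posing as IE, hitting 7 pages


def looks_like_linux(user_agent):
    """Naive sniffing: only checks for the word 'linux'."""
    return "linux" in user_agent.lower()


def share_per_hit(hits):
    """Fraction of page hits whose UA is detected as Linux."""
    detected = sum(1 for _, ua in hits if looks_like_linux(ua))
    return detected / len(hits)


def share_per_ip(hits):
    """Fraction of distinct IPs with at least one hit detected as Linux."""
    by_ip = defaultdict(bool)
    for ip, ua in hits:
        by_ip[ip] = by_ip[ip] or looks_like_linux(ua)
    return sum(by_ip.values()) / len(by_ip)


if __name__ == "__main__":
    print("per hit:", share_per_hit(HITS))
    print("per IP: ", share_per_ip(HITS))
```

With this made-up data the naive check misses the Debian visitor entirely, and the single robot making seven hits pulls the per-hit share down to 10% while the per-IP share is 25%. Neither figure matches the two Linux users out of three real people in the sample.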