If you read Martin Belam (hello Martin!) on the methods he used to derive these 
figures, you'll see that he's extremely thorough in his data analysis: 
http://www.currybet.net/articles/user_agents/index.php 
I think you should read a little levity into Jem's use of a grin after the 
Linux comment!

Below are the stats, taken from our Sage Analyst system 
(http://www.sagemetrics.com/content/sageanalyst/overview.html - about the 
system, currently very slow!), for the 24th of March - the most recent 24h 
period available. We tend to run a bit late as, IIRC, the daily server logs 
run to around 5 gigabytes of data, which needs to be warehoused and processed.

These figures are for all visits, to all pages of the whole of bbc.co.uk, not 
just the homepage.

Automated requests (from bots, spiders etc) are stripped from our data; as far 
as I know we comply with the JICWEBS and IFABC standards that require this. 
This is done using browser string filtering, against an industry-standard set 
of strings supplied by IFABC.
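
For anyone wondering what browser string filtering amounts to in practice, here 
is a minimal Python sketch. The substrings and the record layout are 
hypothetical, made up for illustration; the real IFABC list is not reproduced 
here and our actual pipeline may work differently.

```python
# Hypothetical exclusion substrings; NOT the real IFABC list.
ROBOT_SUBSTRINGS = ("googlebot", "slurp", "crawler", "spider")


def is_robot(user_agent: str) -> bool:
    """Return True if the user agent contains any listed substring (case-insensitive)."""
    ua = user_agent.lower()
    return any(s in ua for s in ROBOT_SUBSTRINGS)


def strip_robots(records):
    """Keep only the log records whose user agent is not on the exclusion list."""
    return [r for r in records if not is_robot(r["user_agent"])]


# Made-up log records for illustration.
sample = [
    {"path": "/news/", "user_agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"},
    {"path": "/news/", "user_agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
    {"path": "/radio/", "user_agent": "Mozilla/5.0 (X11; U; Linux i686) Firefox/2.0"},
]
kept = strip_robots(sample)
print(len(kept))
```

Note that a filter of this kind only catches robots that identify themselves; 
a robot sending a browser-like string would pass through untouched.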

I provide these OS breakdowns both as % of Total Page Views and as % of users. 
Unique users are deduplicated based on cookie data - so you should caveat that 
with the usual cookie churn stuff*. However, as we're looking at percentage 
shares in a very large (6.5 million+) user sample, I think it should be 
considered a good indicative slice.
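
To make the difference between the two measures concrete, here is a rough 
Python sketch that derives both shares from the same set of records. The 
field names ("os", "bbcuid") and the sample data are invented for 
illustration, and it assumes each cookie is only ever seen with one OS; it is 
not how Sage Analyst itself computes the figures.

```python
from collections import Counter, defaultdict


def os_shares(records):
    """Return ({os: % of page views}, {os: % of unique users})."""
    views = Counter()
    users_by_os = defaultdict(set)
    for r in records:
        views[r["os"]] += 1                     # every request is a page view
        users_by_os[r["os"]].add(r["bbcuid"])   # each cookie counted once per OS
    total_views = sum(views.values())
    total_users = sum(len(u) for u in users_by_os.values())
    view_share = {name: 100 * n / total_views for name, n in views.items()}
    user_share = {name: 100 * len(u) / total_users for name, u in users_by_os.items()}
    return view_share, user_share


# Made-up records: one heavy Liberate user, two lighter Windows users, one Mac user.
records = (
    [{"os": "Windows", "bbcuid": "u1"}] * 2
    + [{"os": "Windows", "bbcuid": "u2"}]
    + [{"os": "Macintosh", "bbcuid": "u3"}]
    + [{"os": "Liberate", "bbcuid": "u4"}] * 4
)
view_share, user_share = os_shares(records)
print(view_share)
print(user_share)
```

In this toy data a single heavy user gives Liberate half of the page views but 
only a quarter of the users, which is the kind of gap you can see between the 
two tables.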


By Page Impression
Operating Systems for Mar 24, 2007 for Entire Site from Entire World
OS Type         % of Total Page Views
Windows         88.37
Macintosh       4.51
Liberate        3.32
Nokia           1.09
SonyEricsson    0.67
BlackBerry      0.43
Motorola        0.36
Samsung         0.23
LG              0.17
NEC             0.08
Orange          0.04
Sagem           0.03
O2              0.02
TMobile         0.01
Sharp           0.01
Linux           0.01
DOS             0
Panasonic       0
BenQ            0
Sprint          0
ZTE             0
Philips         0
Unix            0
VK              0
Siemens         0
Toshiba         0
Sun             0
Sanyo           0
IRIX            0
OSF1            0
Unidentified    0.65

By User
Operating Systems for Mar 24, 2007 for Entire Site from Entire World
OS Type         % of Total Users
Windows         85.39
Macintosh       6.51
Nokia           2.26
Liberate        1.66
SonyEricsson    1.5
Motorola        0.84
BlackBerry      0.76
Samsung         0.55
LG              0.18
Sagem           0.08
Orange          0.06
Sharp           0.04
O2              0.03
TMobile         0.03
Linux           0.02
Panasonic       0.02
NEC             0.02
BenQ            0.01
DOS             0.01
Philips         0.01
ZTE             0
Sprint          0
Toshiba         0
VK              0
Unix            0
Siemens         0
Sanyo           0
Sun             0
IRIX            0
OSF1            0

- - - 

Breakdown of WINDOWS operating systems
Operating Systems for Mar 24, 2007 for Entire Site from Entire World
OS Type         % of Total Page Views
Windows XP      53.71
Windows XP SP2  31.96
Windows 2000    6.94
Windows NT      2.65
Windows Vista   2.25
Windows 98      1.23
Windows ME      0.72
Windows CE      0.35
Windows 32      0.13
Windows 95      0.06
Windows 64      0.01
Windows 31      0

Breakdown of MAC operating systems
Operating Systems for Mar 24, 2007 for Entire Site from Entire World
OS Type             % of Total Page Views
Macintosh X         97.21
Macintosh PowerPC   2.53
Macintosh           0.26
Macintosh OS8       0
        
Breakdown of LINUX operating systems
Operating Systems for Mar 24, 2007 for Entire Site from Entire World
OS Type         % of Total Page Views
Linux 24        43.17
Linux 22        36.4
Linux 20        20.43

*From our guidance notes, internally: 
Figures for unique users are based on the BBCUID.
This is a unique identifier - known as a cookie - which is sent to a user's 
computer the first time they request a page from a BBC web site. Provided the 
cookie is accepted by the requesting computer then it will be saved to that 
computer's memory and will be returned to the web server with all subsequent 
requests.
The returned cookies are included in the log records for each request and 
because each cookie is unique it is then possible to track the activity of each 
user across time.
The total number of unique users is really a count of the number of unique 
BBCUID values seen in the logs.
Note that although each cookie may appear many times in the log it must only be 
counted once. It is this "de-duplication" that makes unique user figures 
difficult to calculate.
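
As an illustration only, the counting described above can be sketched in a few 
lines of Python. The "bbcuid" field name and the record layout are assumptions 
made for the example, and records with no returned cookie are simply skipped 
here; the real log format is not shown in these notes.

```python
def count_unique_users(records):
    """Count distinct BBCUID values returned by browsers in the log records."""
    seen = set()
    for r in records:
        uid = r.get("bbcuid")  # None when the request carried no cookie
        if uid:
            seen.add(uid)      # a set keeps each value once, however often it appears
    return len(seen)


# Made-up log records for illustration.
log = [
    {"path": "/", "bbcuid": "a1"},
    {"path": "/news/", "bbcuid": "a1"},
    {"path": "/sport/", "bbcuid": "b2"},
    {"path": "/", "bbcuid": None},
]
unique_users = count_unique_users(log)
print(unique_users)
```

On a toy list this is trivial; with several gigabytes of logs a day, holding 
and merging the set of seen values is where the difficulty comes from.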

Some important points to note about unique users:

    * Users are not "people". Cookies attach to browsers, to user logins or 
possibly to a combination of these. If 2 people share the same machine and the 
same user login they would share the same BBCUID and appear as the same person. 
Equally if the same person were to use two different machines then they would 
be counted as two users.
    * Some browsers do not accept cookies. When this happens a new cookie will 
be sent out for every request that browser makes. If we counted these cookies 
as users it would push the number of users up. So we don't count cookies we 
send out, only those that we get back.
    * There may be a number of situations where cookies, including the BBCUID, 
will get deleted from a computer. Some companies wipe cookies from machines at 
regular intervals. In some environments, e.g. internet cafés or schools, 
computers will destroy cookies when a person logs off from a session. Many 
browsers offer options to easily delete cookies. In any case where the BBCUID 
cookie is deleted, the next time a request is made from that machine or 
user a new cookie will be issued and will appear as a new user.
    * Unique user figures should never be added (or subtracted), because the 
same BBCUIDs may be included in more than one of the numbers in the 
calculation. E.g. you could not add the users of EastEnders to the users of 
Radio 1 because the total would double count any users that had used both sites.
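
The last point is easy to see with sets. A small Python sketch with invented 
BBCUID values (not real data):

```python
# Invented cookie values, for illustration only.
eastenders_users = {"a1", "b2", "c3"}
radio1_users = {"b2", "c3", "d4", "e5"}

naive_total = len(eastenders_users) + len(radio1_users)  # adds the two published figures
combined = len(eastenders_users | radio1_users)          # de-duplicated union of both sites
overlap = len(eastenders_users & radio1_users)           # users seen on both sites

print(naive_total, combined, overlap)
```

The naive sum overstates the combined audience by exactly the overlap, which 
cannot be known from the two totals alone; you need the underlying BBCUIDs.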

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Andy
> Sent: 27 March 2007 17:19
> To: backstage@lists.bbc.co.uk
> Subject: Re: [backstage] Browser Stats
> 
> On 26/03/07, Jeremy Stone <[EMAIL PROTECTED]> wrote:
> > 0.4% of users at the time used a Linux operating system  ;)
> 
> That's not entirely true is it?
> Please do not try to mislead people.
> 
> What is more likely is:
> 0.4% of users WERE DETECTED AS using a Linux operating 
> system AT THE TIME THEY VISITED THE BBC SITE.
> 
> This number can be wrong for a multitude of reasons.
> 
> 1) the BBC stats are biased: the site is targeted at Windows 
> users and on certain pages blocks users of other OSes 
> (bbc.co.uk uses ActiveX, for instance)
> 
> 2) Detection software may not have been tuned to recognize 
> a Linux OS; after all, many distros don't call themselves 
> 'Linux', so it may not be in the user agent string (simply 
> looking for the word Linux is not good enough).
> 
> 3) A Linux user may have been misreporting the Operating 
> System (commonly used to cater for sites that do user agent 
> sniffing badly, also used to blend in with the crowd for anonymity).
> 
> 4) Someone may have a dual boot (or triple or more), and may 
> only be using Windows to view bbc.co.uk, possibly due to 
> being locked out by previously mentioned technological 
> practices of the BBC.
> 
> 5) Some 'users' may not be real people; they may be robots 
> spoofing their user agent. 90% of email is spam. How have you 
> accounted for web robots browsing your site looking for email 
> addresses or trying to post spam comments (they would not hit 
> robots.txt or say robot in the user agent, that would give 
> them away)? I am thinking most spam bots would impersonate IE 
> on Windows, as it probably has the highest market share (by how 
> much we are unsure) and so is much harder to filter.
> 
> Additionally you could argue you would get the less 
> knowledgeable users in this sampling. I rarely hit the BBC 
> home page; why bother? I know where I want to go and I get 
> the news feeds in a handy RSS so I probably don't hit 
> news.bbc.co.uk's homepage either.
> I have the pages I need in bookmarks (Favourites for you IE users).
> 
> This is the great thing about statistics: people like you 
> claim they show something and try to cover up the failings of 
> how the sampling was done.
> 
> It shows only as much as it records. The number of recognized 
> User Agent strings for hits on the BBC website.
> 
> (Quick question, is this per IP or per page hit? page hit 
> would be bad as it would allow robots to skew the results 
> badly as they would hit far more pages).
> 
> I really do dislike statistics, especially when people try to 
> claim that they prove something without accounting for the 
> method of gathering.
> 
> And now a quote:
> > There are three kinds of commonly recognised untruths:
> >
> >      Lies, damn lies and statistics.
> >      - Mark Twain
> >
> > This quote from Mark Twain is accurate; statistics are often used to 
> > lie to the public because most people do not understand how 
> > statistics work.
> 
> And this quote is from where, you ask? Why, it is from the BBC 
> of course! (Well, I had to use the BBC quote, didn't I? 
> Especially as it is the first result on Google for: lies damn 
> lies statistics)
> 
> Maybe you should improve your stats?
> 1. Group each unique header together and have a Skilled Human 
> with knowledge of all operating systems classify them according to OS.
> 2. Make each visitor pass a Turing Test prior to using their 
> User Agent.
> 3. Verify details of OS using other methods, e.g. Javascript 
> could check, or use OS fingerprinting (hopefully it wouldn't 
> hit NAT routers, otherwise you'd probably get the OS of a 
> router, which although interesting is not what we are 
> looking for, is it?).
> 
> On the subject of whether to support IE 5, is it supported by 
> Microsoft or has it been end of lifed? If it's been end of 
> lifed then maybe you don't need to support it.
> 
> Why do you need to 'support' specific browsers anyway? This 
> is what standards are for. I don't need to check the 
> compatibility with every piece of software on every switch 
> between here and my destination node; they are using a 
> standard, and I just make sure I follow that standard. Why should 
> the HTML content be any different?
> 
> The underlying TCP/IP and HTTP systems seem to work much more 
> compatibly than all these websites, many of which display 
> poorly if you stray even slightly from the most common browser 
> and settings. Does this not show that standards work better?
> 
> Andy
> 
> --
> First they ignore you
> then they laugh at you
> then they fight you
> then you win.
> - Mohandas Gandhi
> -
> Sent via the backstage.bbc.co.uk discussion group.  To 
> unsubscribe, please visit 
> http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.
>   Unofficial list archive: 
> http://www.mail-archive.com/backstage@lists.bbc.co.uk/
> 
