Steve,

The real issue is that the example I have given is over-simplistic. It depends on sterile lab conditions, on the user population being the same in the lab as in the real world, and on there being only one issue, which affects 10% of the user population. One of the great beauties of the world is the complexity and diversity of people. In the sterile lab, people are tested on the same machine (we have found that machine configuration, such as screen size, has a bearing on behaviour), and they don't have the distractions that normally affect the user in the real world.
> Actually, that's not true. You'd be fairly likely to discover it with only
> 5-10 users - in the 65%+ range of 'likely'.

For 5 users that is only 41% (1-(1-0.1)^5), and for 10 it is 65% (1-(1-0.1)^10). This is far off from Nielsen's number that 5 users will find 84% of the issues (1-(1-0.31)^5).

If my quality control had only a 41% chance of catching a fault that affects 10% of my cars as they leave the production line, there is a high chance that consumers would stop buying my product, the company would go bust, and I would be out of a job. From my experience of production lines, a sample size of 10 for a production run of one million units would be considered extremely low.

We have moved a long way since 1993, when Nielsen and Landauer's paper was published. The web was not around, and the profile of users was very different. The web has changed that. We will need to test with more people as website traffic increases and we get better at website design. For example, if we assume that the designers of a website have been using good design principles, and an issue therefore only affects 2.5% of users, then 10 users in a test will only discover that issue 22% of the time. But using our 1 million visitors a year example, that issue will mean 25,000 people experience problems.

But we do agree that each population needs its own test. And I totally agree that testing iteratively is a good idea.

@William -- Woolrych and Cockton's 2001 argument applies to simple task-based tests. See http://osiris.sunderland.ac.uk/~cs0awo/hci%202001%20short.pdf

All the best

James
blog.feralabs.com

PS (*Disclaimer*) Due to my belief that usability testing needs not just to be more statistically sound, but also to be able to test a wide range of users from different cultures, I co-founded www.webnographer.com, a remote usability testing tool. So I am an advocate for testing with more geographically diverse users than normal lab tests.
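All the percentages traded in this thread come from the same cumulative-discovery formula, P = 1-(1-p)^n, where p is the fraction of users an issue affects and n is the number of test participants. A minimal sketch to reproduce them (the function name is mine; the formula assumes independent participants and that the issue is equally visible to everyone who hits it, which is exactly the assumption Woolrych and Cockton criticise):

```python
def detection_probability(p, n):
    """Chance that at least one of n participants encounters an issue
    affecting a fraction p of users, assuming independent participants
    and uniform visibility of the issue."""
    return 1 - (1 - p) ** n

# Issue affecting 10% of users:
print(round(detection_probability(0.10, 5), 2))    # 0.41
print(round(detection_probability(0.10, 10), 2))   # 0.65

# Nielsen & Landauer's average per-user discovery rate of 31%:
print(round(detection_probability(0.31, 5), 2))    # 0.84

# A rarer issue affecting 2.5% of users, tested with 10 participants:
print(round(detection_probability(0.025, 10), 2))  # 0.22
```

The disagreement in the thread is not about the arithmetic but about p: Nielsen's 84% figure assumes an average discovery rate of 31% per user, while well-designed modern sites may only have issues in the 2.5-10% range, where 5-10 users catch far less.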
2009/10/2 Steve Baty <[email protected]>

> "If your client website has 1 million visitors a year, a usability issue
> that affects 10% of the users would be unlikely to be discovered on a test
> of only 5 to 10 users, but would give 100,000 people a bad experience when
> they visit the site."
>
> Actually, that's not true. You'd be fairly likely to discover it with only
> 5-10 users - in the 65%+ range of 'likely'. Manufacturing quality control
> systems and product quality testing have been using such statistical
> methods since the 20's, and they went through heavy refinement and
> sophistication in the 60's, 70's and 80's.
>
> It's also worth repeating the message both Jakob & Jared Spool are
> constantly talking about: test iteratively with a group of 5-10
> participants. You'll find that 65%+ figure above rises to 99%+ in that
> case.
>
> Again, that doesn't change your basic points about cultural diversity and
> behaviour affecting the test parameters, but your above point is not
> entirely accurate.
>
> Cheers
> Steve
>
> 2009/10/2 James Page <[email protected]>
>
>> It is dependent on how many issues there are, the cultural variance of
>> your user base, and the margin of error you are happy with. Five users,
>> or even 10, is not enough on a modern, well-designed web site.
>>
>> The easy way to think of a usability test is as a treasure hunt. If the
>> treasure is very obvious then you will need fewer people; if it is less
>> obvious, you will need more. If you increase the area of the hunt, you
>> will also need more people. Most advocates of testing only 5 to 10 users
>> draw their experience from a single country, yet behaviour changes
>> significantly country by country, even within Western Europe.
>> See my blog post here:
>> http://blog.feralabs.com/2009/01/does-culture-effect-online-behaviour/
>>
>> If your client website has 1 million visitors a year, a usability issue
>> that affects 10% of the users would be unlikely to be discovered on a test
>> of only 5 to 10 users, but would give 100,000 people a bad experience when
>> they visit the site.
>>
>> Can you find treasure with only five or ten users? Of course you can. But
>> how sure can you be that you have found even the significant issues?
>>
>> A very good argument for why 10 is not enough is Woolrych and Cockton
>> (2001). They point out an issue with Nielsen's formula: it does not take
>> into account the visibility of an issue. They show that using only 5
>> users can significantly undercount even significant usability issues.
>>
>> The following PowerPoint from an eye-tracking study demonstrates the
>> issue with using only a few users:
>> http://docs.realeyes.it/why50.ppt
>>
>> You may also want to look at the margin of error for the test that you
>> are doing.
>>
>> All the best
>>
>> James
>> blog.feralabs.com
>>
>> 2009/10/1 Will Hacker <[email protected]>
>>
>> > Chris,
>> >
>> > There is not any statistical formula or method that will tell you the
>> > correct number of people to test. In my experience it depends on the
>> > functions you are testing, how many test scenarios you want to run,
>> > how many of those can be done by one participant in one session,
>> > and how many different levels of expertise you need (e.g. novice,
>> > intermediate, and/or expert) to really exercise your application.
>> >
>> > I have gotten valuable insight from testing 6-10 people for ecommerce
>> > sites with fairly common functionality that people are generally
>> > familiar with, but have used more for more complex applications where
>> > there are different levels of features that some users rely on
>> > heavily and others never use.
>> >
>> > I do believe that any testing is better than none, and realize you
>> > are likely limited by time and budget. I think you can usually get
>> > fairly effective results with 10 or fewer people.
>> >
>> > Will
>> >
>> > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>> > Posted from the new ixda.org
>> > http://www.ixda.org/discuss?post=46278
>> >
>> > ________________________________________________________________
>> > Welcome to the Interaction Design Association (IxDA)!
>> > To post to this list ....... [email protected]
>> > Unsubscribe ................ http://www.ixda.org/unsubscribe
>> > List Guidelines ............ http://www.ixda.org/guidelines
>> > List Help .................. http://www.ixda.org/help
>
> --
> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E:
> [email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn:
> www.linkedin.com/in/stevebaty
