James,

Excellent points.
Nielsen argues that 5 users will discover 84% of the issues, not that the
likelihood of finding a particular issue is 84% - thus the discrepancy in
our figures (41% and 65% respectively). (And I can't believe I'm defending
Nielsen's figures, but this is one of his better studies.) The results
from '93 were re-evaluated more recently for Web-based systems, with
similar results. There's also some good theory on this from sociology and
cultural anthropology - but I think we're moving far afield from the
original question.
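To make the two calculations concrete, here is a minimal sketch (Python,
purely illustrative; the 0.31 per-user detection rate is the average from
Nielsen and Landauer's formula as quoted below, and the function names are
just for this example):

    # Chance that at least one of n test users encounters a *specific*
    # issue that affects a fraction p of the user population.
    def p_find_specific_issue(p, n):
        return 1 - (1 - p) ** n

    # Nielsen/Landauer curve: expected *proportion of all issues* found
    # by n users, given an average per-user detection rate lam.
    def proportion_of_issues_found(lam, n):
        return 1 - (1 - lam) ** n

    print(p_find_specific_issue(0.10, 5))       # ~0.41, the 41% figure
    print(p_find_specific_issue(0.10, 10))      # ~0.65, the 65% figure
    print(proportion_of_issues_found(0.31, 5))  # ~0.84, Nielsen's 84%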
Regarding the manufacturing reference - which I introduced, granted -
units tend to be tested in batches for the reason you mention. The
presence of defects in a batch signals a problem, and further testing is
carried out.

I also like the approach Amazon (and others) take in response to your
last point, which is to release new features to small (for them) numbers
of users - 1,000, then 5,000, and so on - so that these low-incidence
problems can surface. When the potential impact is high, this is a really
solid approach to take.

Regards
Steve

2009/10/2 James Page <[email protected]>

> Steve,
>
> The real issue is that the example I have given is over-simplistic. It
> depends on sterile lab conditions, on the user population being the same
> in the lab and in the real world, and on there being only one issue,
> affecting 10% of the user population. One of the great beauties of the
> world is the complexity and diversity of people. In the sterile lab,
> people are tested on the same machine (we have found that machine
> configuration, such as screen size, has a bearing on behaviour), and
> they don't have the distractions that normally affect the user in the
> real world.
>
>> Actually, that's not true. You'd be fairly likely to discover it with
>> only 5-10 users - in the 65%+ range of 'likely'.
>
> For 5 users that is only 41% (1-(1-0.1)^5), and for 10 it is 65%. This
> is far off from Nielsen's number that 5 users will find 84% of the
> issues (1-(1-0.31)^5).
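>
> (As an illustrative aside, the same formula can be inverted - solving
> 1-(1-p)^n >= c for n - to show how fast the required sample grows as the
> issue frequency p falls. The 85% confidence target below is just an
> example figure, not from any study:)
>
>     import math
>
>     # Smallest n such that an issue affecting a fraction p of users
>     # is seen at least once with probability >= c.
>     def users_needed(p, c):
>         return math.ceil(math.log(1 - c) / math.log(1 - p))
>
>     print(users_needed(0.10, 0.85))   # 19 users for a 10% issue
>     print(users_needed(0.025, 0.85))  # 75 users for a 2.5% issue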
> If I were a car manufacturer and my quality testing had only a 45%
> chance of catching a fault affecting 10% of the cars leaving my
> production line, there is a high chance that consumers would stop buying
> my product, the company would go bust, and I would be out of a job. From
> my experience of production lines, a sample size of 10 for a run of one
> million units would be considered extremely low.
>
> We have moved a long way since 1993, when Nielsen and Landauer's paper
> was published. The web was not around then, and the profile of users was
> very different. We will need to test with more people as website traffic
> increases and as we get better at website design. For example, if we
> assume that the designers of a site have been using good design
> principles, so that an issue affects only 2.5% of users, then 10 users
> in a test will discover that issue only 22% of the time. But in our
> one-million-visitors-a-year example, that issue will mean that 25,000
> people experience problems.
>
> But we do agree that each population needs its own test. And I totally
> agree that testing iteratively is a good idea.
>
> @William -- Woolrych and Cockton's 2001 argument applies to simple
> task-based tests. See
> http://osiris.sunderland.ac.uk/~cs0awo/hci%202001%20short.pdf
>
> All the best
>
> James
> blog.feralabs.com
>
> PS (*Disclaimer*) Because I believe usability testing needs not just to
> be more statistically sound, but also to be able to test a wide range of
> users from different cultures, I co-founded www.webnographer.com, a
> remote usability testing tool. So I am an advocate for testing with more
> geographically diverse users than normal lab tests.
>
> 2009/10/2 Steve Baty <[email protected]>
>
>> "If your client website has 1 million visitors a year, a usability
>> issue that affects 10% of the users would be unlikely to be discovered
>> in a test of only 5 to 10 users, but would give 100,000 people a bad
>> experience when they visit the site."
>>
>> Actually, that's not true. You'd be fairly likely to discover it with
>> only 5-10 users - in the 65%+ range of 'likely'. Manufacturing quality
>> control systems and product quality testing have been using such
>> statistical methods since the '20s, and they went through heavy
>> refinement and sophistication in the '60s, '70s and '80s.
>>
>> It's also worth repeating the message both Jakob Nielsen and Jared
>> Spool are constantly talking about: test iteratively with a group of
>> 5-10 participants. You'll find that the 65%+ figure above rises to 99%+
>> in that case.
>>
>> Again, this doesn't change your basic points about cultural diversity
>> and behaviour affecting the test parameters, but your point above is
>> not entirely accurate.
>>
>> Cheers
>> Steve
>>
>> 2009/10/2 James Page <[email protected]>
>>
>>> It depends on how many issues there are, the cultural variance of your
>>> user base, and the margin of error you are happy with. Five users or
>>> even 10 is not enough on a modern, well-designed web site.
>>>
>>> The easy way to think of a usability test is as a treasure hunt. If
>>> the treasure is very obvious then you will need fewer people; if it is
>>> less obvious then you will need more. If you increase the area of the
>>> hunt, you will need more people. Most of the advocates of testing only
>>> 5 to 10 users draw their experience from one country, and behaviour
>>> changes significantly country by country, even within Western Europe.
>>> See my blog post here:
>>> http://blog.feralabs.com/2009/01/does-culture-effect-online-behaviour/
>>>
>>> If your client website has 1 million visitors a year, a usability
>>> issue that affects 10% of the users would be unlikely to be discovered
>>> in a test of only 5 to 10 users, but would give 100,000 people a bad
>>> experience when they visit the site.
>>>
>>> Can you find treasure with only five or ten users? Of course you can.
>>> But how sure can you be that you have found even the significant
>>> issues?
>>>
>>> A very good argument for why 10 is not enough is Woolrych and Cockton
>>> 2001. They point out an issue with Nielsen's formula: it does not take
>>> into account the visibility of an issue. They show that using only 5
>>> users can significantly undercount even significant usability issues.
>>>
>>> The following PowerPoint from an eye-tracking study demonstrates the
>>> problem with using only a few users:
>>> http://docs.realeyes.it/why50.ppt
>>>
>>> You may also want to look at the margin of error for the test that you
>>> are doing.
>>>
>>> All the best
>>>
>>> James
>>> blog.feralabs.com
>>>
>>> 2009/10/1 Will Hacker <[email protected]>
>>>
>>>> Chris,
>>>>
>>>> There is no statistical formula or method that will tell you the
>>>> correct number of people to test. In my experience it depends on the
>>>> functions you are testing, how many test scenarios you want to run,
>>>> how many of those can be done by one participant in one session, and
>>>> how many different levels of expertise you need (e.g. novice,
>>>> intermediate, and/or expert) to really exercise your application.
>>>>
>>>> I have gotten valuable insight from testing 6-10 people on ecommerce
>>>> sites with fairly common functionality that people are generally
>>>> familiar with, but have used more participants for more complex
>>>> applications where there are different levels of features that some
>>>> users rely on heavily and others never use.
>>>>
>>>> I do believe that any testing is better than none, and realize you
>>>> are likely limited by time and budget. I think you can usually get
>>>> fairly effective results with 10 or fewer people.
>>>>
>>>> Will
>>>>
>>>> Posted from the new ixda.org
>>>> http://www.ixda.org/discuss?post=46278

--
Steve 'Doc' Baty | Principal | Meld Consulting
P: +61 417 061 292 | E: [email protected]
Twitter: docbaty | Skype: steve_baty
LinkedIn: www.linkedin.com/in/stevebaty

________________________________________________________________
Welcome to the Interaction Design Association (IxDA)!
To post to this list ....... [email protected]
Unsubscribe ................ http://www.ixda.org/unsubscribe
List Guidelines ............ http://www.ixda.org/guidelines
List Help .................. http://www.ixda.org/help
