Steve,

Woolrych and Cockton argue that the discrepancy lies in Nielsen's constant of 0.31: Nielsen assumes all issues have the same visibility. And we have not even added the extra dimension of the evaluator effect :-)
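For anyone who wants to sanity-check the arithmetic being traded in this thread, here is a quick back-of-the-envelope sketch in plain Python. It is my own rough working, not taken from any of the papers; the function name p_detect and the margin-of-error check at the end (a normal approximation at the worst case p = 0.5) are mine.

    import math

    # Cumulative-discovery model from Nielsen & Landauer (1993):
    # P(issue seen at least once) = 1 - (1 - p)^n, where p is the share
    # of users an issue affects and n is the number of test participants.
    def p_detect(p, n):
        return 1 - (1 - p) ** n

    print(p_detect(0.10, 5))    # 0.41 -> the 41% figure for 5 users
    print(p_detect(0.10, 10))   # 0.65 -> the 65% figure for 10 users
    print(p_detect(0.31, 5))    # 0.84 -> Nielsen's 84%, using his 0.31 constant
    print(p_detect(0.025, 10))  # 0.22 -> an issue affecting 2.5% of users, 10 testers

    # Iterative testing is just a larger cumulative n: for a 10% issue,
    # three rounds of 10 users gives ~96%, five rounds ~99%.
    print(p_detect(0.10, 30))   # ~0.96
    print(p_detect(0.10, 50))   # ~0.99

    # Rough 95% margin of error on a proportion measured with n = 10:
    print(1.96 * math.sqrt(0.5 * 0.5 / 10))  # ~0.31, i.e. roughly +/-31-35%

Nothing fancy: just the cumulative-discovery formula everyone is quoting, plus a rough check on the margin-of-error figure I mention below.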
Do you have a reference for the more recent paper? I would be interested in reading it. On the manufacturing side most of the metrics use a margin of error. With just 10 users your margin of error will be about +/-35% (a very rough calculation). That is far better than no test, but it would still be considered extremely low in a manufacturing process.

In anthropology, most of the papers I have read use far greater sample sizes than a population of just 10. Yes, it depends on the subject matter. But anthropologists use techniques such as informants, which increase the effective number of participants. And the anthropologist studies the population over months, if not years, so there are far more observations.

@Thomas: testing the wireframe will only reveal what is already visible. But if a feature has an issue, and it is implemented in the wireframe, then a test will surface it. Discovering an issue early is surely better than discovering it later. I think your statement reiterates the idea that testing frequently is a good idea.

All the best

James
blog.feralabs.com

2009/10/2 Steve Baty <[email protected]>

> James,
>
> Excellent points.
>
> Nielsen argues that 5 users will discover 84% of the issues; not that the likelihood of finding a particular issue is 84% - thus the discrepancy in our figures (41% & 65% respectively).
>
> (And I can't believe I'm defending Nielsen's figures, but this is one of his better studies.) The results from '93 were re-evaluated more recently for Web-based systems, with similar results. There's also some good theory on this from sociology and cultural anthropology - but I think we're moving far afield from the original question.
>
> Regarding the manufacturing reference - which I introduced, granted - units tend to be tested in batches for the reason you mention. The presence of defects in a batch signals a problem and further testing is carried out.
>
> I also like the approach Amazon (and others) take in response to your last point, which is to release new features to small (for them) numbers of users - 1,000, then 5,000, etc. - so that these low-incidence problems can surface. When the potential impact is high, this is a really solid approach to take.
>
> Regards
>
> Steve
>
> 2009/10/2 James Page <[email protected]>
>
>> Steve,
>>
>> The real issue is that the example I have given is oversimplified. It depends on sterile lab conditions, and on the user population being the same in the lab and in the real world. And on there being only one issue, affecting 10% of the user population. One of the great beauties of the world is the complexity and diversity of people. In the sterile lab people are tested on the same machine (we have found that machine configuration, such as screen size, has a bearing on behaviour), and they don't have the distractions that normally affect the user in the real world.
>>
>>> Actually, that's not true. You'd be fairly likely to discover it with only 5-10 users - in the 65%+ range of 'likely'.
>>>
>> For 5 users that is only 41% (1-(1-0.1)^5), and for 10 it is 65%. This is far off Nielsen's figure that 5 users will find 84% of the issues (1-(1-0.31)^5).
>>
>> If I were manufacturing cars and there was a 45% chance that a fault affecting 10% of them would leave the production line undetected, there is a high chance that consumers would stop buying my product, the company would go bust, and I would be out of a job.
>> From my experience of production lines, a sample size of 10 for a production run of one million units would be considered extremely low.
>>
>> We have come a long way since 1993, when Nielsen and Landauer's paper was published. The web was not around, and the profile of users was very different. The web has changed that. We will need to test with more people as website traffic increases and we get better at website design. For example, if we assume that the designers of a website have been using good design principles, an issue might only affect 2.5% of users. Ten users in a test will then only discover that issue 22% of the time. But using our one-million-visitors-a-year example, that issue will mean that 25,000 people experience problems.
>>
>> But we do agree that each population needs its own test. And I totally agree that testing iteratively is a good idea.
>>
>> @William -- the Woolrych and Cockton 2001 argument applies to simple task-based tests. See http://osiris.sunderland.ac.uk/~cs0awo/hci%202001%20short.pdf
>>
>> All the best
>>
>> James
>> blog.feralabs.com
>>
>> PS (*Disclaimer*) Because I believe that usability testing needs not just to be more statistically sound, but also to be able to test a wide range of users from different cultures, I co-founded www.webnographer.com, a remote usability testing tool. So I am an advocate of testing with more geographically diverse users than in normal lab tests.
>>
>> 2009/10/2 Steve Baty <[email protected]>
>>
>>> "If your client website has 1 million visitors a year, a usability issue that affects 10% of the users would be unlikely to be discovered in a test of only 5 to 10 users, but would give 100,000 people a bad experience when they visit the site."
>>>
>>> Actually, that's not true. You'd be fairly likely to discover it with only 5-10 users - in the 65%+ range of 'likely'. Manufacturing quality control systems and product quality testing have been using such statistical methods since the '20s, and they went through heavy refinement and sophistication in the '60s, '70s and '80s.
>>>
>>> It's also worth repeating the message both Jakob & Jared Spool are constantly talking about: test iteratively with a group of 5-10 participants. You'll find that the 65%+ figure above rises to 99%+ in that case.
>>>
>>> Again, that doesn't change your basic points about cultural diversity and behaviour affecting the test parameters, but your above point is not entirely accurate.
>>>
>>> Cheers
>>>
>>> Steve
>>>
>>> 2009/10/2 James Page <[email protected]>
>>>
>>>> It depends on how many issues there are, the cultural variance of your user base, and the margin of error you are happy with. Five users, or even 10, is not enough on a modern, well-designed website.
>>>>
>>>> The easy way to think of a usability test is as a treasure hunt. If the treasure is very obvious then you will need fewer people; if it is less obvious then you will need more people. If you increase the area of the hunt then you will need more people. Most of the advocates of only testing 5 to 10 users have experience from a single country. Behaviour changes significantly country by country, even in Western Europe.
>>>> See my blog post here: http://blog.feralabs.com/2009/01/does-culture-effect-online-behaviour/
>>>>
>>>> If your client website has 1 million visitors a year, a usability issue that affects 10% of the users would be unlikely to be discovered in a test of only 5 to 10 users, but would give 100,000 people a bad experience when they visit the site.
>>>>
>>>> Can you find treasure with only five or ten users? Of course you can. But how sure can you be that you have found even the significant issues?
>>>>
>>>> A very good argument for why 10 is not enough is Woolrych and Cockton 2001. They point out an issue with Nielsen's formula: he does not take into account the visibility of an issue. They show that using only 5 users can significantly undercount even significant usability issues.
>>>>
>>>> The following PowerPoint from an eyetracking study demonstrates the issue with using only a few users: http://docs.realeyes.it/why50.ppt
>>>>
>>>> You may also want to look at the margin of error for the test that you are doing.
>>>>
>>>> All the best
>>>>
>>>> James
>>>> blog.feralabs.com
>>>>
>>>> 2009/10/1 Will Hacker <[email protected]>
>>>>
>>>> > Chris,
>>>> >
>>>> > There is no statistical formula or method that will tell you the correct number of people to test. In my experience it depends on the functions you are testing, how many test scenarios you want to run, how many of those can be done by one participant in one session, and how many different levels of expertise you need (e.g. novice, intermediate, and/or expert) to really exercise your application.
>>>> >
>>>> > I have gotten valuable insight from testing 6-10 people for ecommerce sites with fairly common functionality that people are generally familiar with, but have used more for complex applications where there are different levels of features that some users rely on heavily and others never use.
>>>> >
>>>> > I do believe that any testing is better than none, and realize you are likely limited by time and budget. I think you can usually get fairly effective results with 10 or fewer people.
>>>> >
>>>> > Will
>>>> >
>>>> > Posted from the new ixda.org
>>>> > http://www.ixda.org/discuss?post=46278
>>>
>>> --
>>> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E: [email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn: www.linkedin.com/in/stevebaty
>
> --
> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E: [email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn: www.linkedin.com/in/stevebaty

________________________________________________________________
Welcome to the Interaction Design Association (IxDA)!
To post to this list ....... [email protected]
Unsubscribe ................ http://www.ixda.org/unsubscribe
List Guidelines ............ http://www.ixda.org/guidelines
List Help .................. http://www.ixda.org/help
