Totally agree with your article....

> So you can get a much narrower range for your estimate, but 30+ users is a significant undertaking for a usability test.
> One of our own findings from a study was that people got bored with testing more than about 8 users.

James
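To put a rough number on how much the estimate narrows as the sample grows, here is a small Python sketch (not from the thread; the pass/fail counts and sample sizes are assumed purely for illustration) using the adjusted Wald interval that is commonly recommended for small-sample pass/fail usability data:

    import math

    def adjusted_wald(successes, n, z=1.96):
        """95% adjusted Wald (Agresti-Coull) interval for a task pass/fail rate."""
        n_adj = n + z ** 2
        p_adj = (successes + z ** 2 / 2) / n_adj
        half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
        return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

    # Hypothetical pass/fail results at different sample sizes (roughly a 70% success rate).
    for n, successes in ((5, 4), (10, 7), (30, 21), (50, 35)):
        lo, hi = adjusted_wald(successes, n)
        print(f"n={n:2d}: {successes}/{n} passed, 95% CI {lo:.0%}-{hi:.0%} (width {hi - lo:.0%})")

With 5-10 participants the interval is roughly 50-60 percentage points wide; at 30+ it is closer to 25-30 points, which is the narrowing being weighed against the extra effort above.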
2009/10/2 Steve Baty <[email protected]>

> James,
>
> More good points. I did some calculations a while back on the confidence intervals for pass/fail user tests - http://www.meld.com.au/2006/05/when-100-isnt-really-100-updated - the more interesting part being the link to a paper on estimators of expected values. Worth a read if you haven't seen it.
>
> I'll try to dig up the more recent paper - working from memory on that one.
>
> Regarding the anthropology & sociology references - I was referring more to the notion of uncovering societal norms rather than the specific 'supporting a sample size of x'.
>
> Coming back to your first point: yeah, the use of the .31 is a simplification for the sake of one of his free articles; it's a modal figure based on (his words) a large number of projects. So, looking at a range of figures, you would have some projects where more users were needed (to your earlier point), and in some cases - few - you could get away with fewer (although I admit that the use of fewer than 5 participants causes me some concern).
>
> Anyway, enjoying the discussion, and I still think we're violently in agreement on the basic point :)
>
> Cheers
> Steve
>
> 2009/10/2 James Page <[email protected]>
>
>> Steve,
>>
>> Woolrych and Cockton argue that the discrepancy comes from Nielsen's constant of .31. Nielsen assumes all issues have the same visibility. We have not even added the extra dimension of the evaluator effect :-)
>>
>> Do you have a reference for the more recent paper? I would be interested in reading it.
>>
>> On the manufacturing side, most of the metrics use a margin of error. With just 10 users your margin of error will be about +/-35% (very rough calculation). That is far better than no test, but it would still be considered extremely low in a manufacturing process.
>>
>> In anthropology, most of the papers I have read use far greater sample sizes than just a population of 10. Yes, it depends on the subject matter. The anthropologist will use techniques such as informants, which increases the number of participants. And the anthropologist is studying the population over months if not years, so there are far more observations.
>>
>> @thomas: testing the wireframe will only show up what is already visible. But if a feature has an issue and it is implemented in the wireframe, then a test will show it up. Discovering an issue early is surely better than later. I think your statement reiterates the idea that testing frequently is a good idea.
>>
>> All the best
>>
>> James
>> blog.feralabs.com
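To make the "same visibility" point concrete, here is a small Python sketch (the mix of visibilities is invented purely for illustration, not taken from any study) comparing the share of issues five participants can be expected to uncover when every issue really does have Nielsen's modal visibility of .31, versus a pool of issues with the same mode but mostly lower visibility:

    def p_found(p, n):
        """Chance that at least one of n participants hits an issue affecting a fraction p of users."""
        return 1 - (1 - p) ** n

    n = 5

    # Nielsen-style simplification: every issue has the modal visibility of 0.31.
    uniform = [0.31] * 10
    # Hypothetical mix: same modal value, but most issues are far less visible.
    mixed = [0.31, 0.31, 0.31, 0.15, 0.10, 0.10, 0.05, 0.05, 0.02, 0.01]

    for label, issues in (("all issues at p=0.31", uniform), ("mixed visibilities  ", mixed)):
        expected_share = sum(p_found(p, n) for p in issues) / len(issues)
        print(f"{label}: expect to uncover {expected_share:.0%} of the issues with {n} users")

Under the flat assumption the familiar 84% appears; with the invented mix, the expected share found by five users drops to roughly 45%, which is the kind of gap Woolrych and Cockton describe.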
>> 2009/10/2 Steve Baty <[email protected]>
>>
>>> James,
>>>
>>> Excellent points.
>>>
>>> Nielsen argues that 5 users will discover 84% of the issues, not that the likelihood of finding a particular issue is 84% - thus the discrepancy in our figures (41% & 65% respectively).
>>>
>>> (And I can't believe I'm defending Nielsen's figures, but this is one of his better studies.) The results from '93 were re-evaluated more recently for Web-based systems with similar results. There's also some good theory on this from sociology and cultural anthropology - but I think we're moving far afield from the original question.
>>>
>>> Regarding the manufacturing reference - which I introduced, granted - units tend to be tested in batches for the reason you mention. The presence of defects in a batch signals a problem and further testing is carried out.
>>>
>>> I also like the approach Amazon (and others) take in response to your last point, which is to release new features to small (for them) numbers of users - 1,000, then 5,000 etc. - so that these low-incidence problems can surface. When the potential impact is high, this is a really solid approach to take.
>>>
>>> Regards
>>>
>>> Steve
>>>
>>> 2009/10/2 James Page <[email protected]>
>>>
>>>> Steve,
>>>>
>>>> The real issue with the example I have given is that it is over-simplistic. It depends on sterile lab conditions, on the user population being the same in the lab and in the real world, and on there being only one issue, affecting 10% of the user population. One of the great beauties of the world is the complexity and diversity of people. In the sterile lab people are tested on the same machine (we have found that machine configuration, such as screen size, has a bearing on behaviour), and they don't have the distractions that normally affect the user in the real world.
>>>>
>>>>> Actually, that's not true. You'd be fairly likely to discover it with only 5-10 users - in the 65%+ range of 'likely'.
>>>>
>>>> For 5 users that is only 41% (1-(1-0.1)^5), and for 10 it is 65%. This is far off Nielsen's figure that 5 users will find 84% of the issues (1-(1-0.31)^5).
>>>>
>>>> If I were in manufacturing and there was a 45% chance that 10% of my cars left the production line with a fault, there is a high chance that consumers would stop buying my product, the company would go bust, and I would be out of a job. From my experience of production lines, a sample size of 10 for a production run of one million units would be considered extremely low.
>>>>
>>>> We have moved a long way since 1993, when Nielsen and Landauer's paper was published. The web was not around, and the profile of users was very different. The web has changed that. We will need to test with more people as website traffic increases and we get better at web site design. For example, if we assume that the designers of a web site have been using good design principles and therefore an issue only affects 2.5% of users, then 10 users in a test will only discover that issue 22% of the time. But using our 1 million visitors a year example, the issue will mean that 25,000 people experience problems.
>>>>
>>>> But we do agree that each population needs its own test. And I totally agree that testing iteratively is a good idea.
>>>>
>>>> @William -- Woolrych and Cockton's 2001 argument applies to simple task-based tests. See http://osiris.sunderland.ac.uk/~cs0awo/hci%202001%20short.pdf
>>>>
>>>> All the best
>>>>
>>>> James
>>>> blog.feralabs.com
>>>>
>>>> PS (*Disclaimer*) Due to my belief that usability testing needs not just to be more statistically sound, but also to be able to test a wide range of users from different cultures, I co-founded www.webnographer.com, a remote usability testing tool. So I am an advocate for testing with more geographically diverse users than normal lab tests.
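All of the percentages being traded above come from the same expression: the chance that a problem affecting a fraction p of users is seen by at least one of n participants is 1-(1-p)^n. A minimal Python sketch that re-derives the thread's figures (the 1-million-visitor scale is the example already used above):

    def p_found(p, n):
        """Chance that an issue affecting a fraction p of users shows up at least once among n participants."""
        return 1 - (1 - p) ** n

    cases = [
        (0.10, 5),    # James: ~41%
        (0.10, 10),   # ~65%
        (0.31, 5),    # Nielsen's modal visibility: ~84%
        (0.025, 10),  # subtler issue on a well-designed site: ~22%
    ]
    for p, n in cases:
        print(f"issue hits {p:>5.1%} of users, {n:2d} participants -> found {p_found(p, n):.0%} of the time")

    # Scale of a miss: at 1,000,000 visitors a year, a 2.5% issue still touches 25,000 people.
    print(f"visitors affected per year: {1_000_000 * 0.025:,.0f}")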
>>>> 2009/10/2 Steve Baty <[email protected]>
>>>>
>>>>> "If your client website has 1 million visitors a year, a usability issue that affects 10% of the users would be unlikely to be discovered on a test of only 5 to 10 users, but would give 100,000 people a bad experience when they visit the site."
>>>>>
>>>>> Actually, that's not true. You'd be fairly likely to discover it with only 5-10 users - in the 65%+ range of 'likely'. Manufacturing quality control systems and product quality testing have been using such statistical methods since the '20s, and they went through heavy refinement and sophistication in the '60s, '70s and '80s.
>>>>>
>>>>> It's also worth repeating the message both Jakob & Jared Spool are constantly talking about: test iteratively with a group of 5-10 participants. You'll find that 65%+ figure above rises to 99%+ in that case.
>>>>>
>>>>> Again, it doesn't change your basic points about cultural diversity and behaviour affecting the test parameters, but your above point is not entirely accurate.
>>>>>
>>>>> Cheers
>>>>> Steve
>>>>>
>>>>> 2009/10/2 James Page <[email protected]>
>>>>>
>>>>>> It depends on how many issues there are, the cultural variance of your user base, and the margin of error you are happy with. Five users, or even 10, is not enough on a modern, well-designed web site.
>>>>>>
>>>>>> The easy way to think of a usability test is as a treasure hunt. If the treasure is very obvious then you will need fewer people; if it is less obvious then you will need more people. If you increase the area of the hunt then you will need more people. Most of the advocates of testing only 5 to 10 users draw their experience from a single country. Behaviour changes significantly country by country, even in Western Europe. See my blog post here: http://blog.feralabs.com/2009/01/does-culture-effect-online-behaviour/
>>>>>>
>>>>>> If your client website has 1 million visitors a year, a usability issue that affects 10% of the users would be unlikely to be discovered on a test of only 5 to 10 users, but would give 100,000 people a bad experience when they visit the site.
>>>>>>
>>>>>> Can you find treasure with only five or ten users? Of course you can. But how sure can you be that you have found even the significant issues?
>>>>>>
>>>>>> A very good argument for why 10 is not enough is Woolrych and Cockton 2001. They point out a problem with Nielsen's formula: it does not take into account the visibility of an issue. They show that using only 5 users can significantly undercount even significant usability issues.
>>>>>>
>>>>>> The following PowerPoint from an eyetracking study demonstrates the issue with using only a few users: http://docs.realeyes.it/why50.ppt
>>>>>>
>>>>>> You may also want to look at the margin of error for the test that you are doing.
>>>>>>
>>>>>> All the best
>>>>>>
>>>>>> James
>>>>>> blog.feralabs.com
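Steve's "rises to 99%+" claim above can be sanity-checked with the same formula, under two simplifying assumptions that are mine rather than the thread's: the issue is not fixed between rounds, and each round draws fresh, independent participants.

    def p_found(p, n):
        """Chance that an issue affecting a fraction p of users is seen by at least one of n participants."""
        return 1 - (1 - p) ** n

    p = 0.10        # issue affects 10% of users (assumption carried over from the example above)
    per_round = 10  # participants per iterative round
    for rounds in range(1, 6):
        print(f"{rounds} round(s) x {per_round} users -> {p_found(p, rounds * per_round):.1%}")

Under those assumptions the 65% figure for a single round of 10 climbs past 95% by the third round and past 99% by the fifth; subtler issues (smaller p) climb far more slowly, which is James's counter-point.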
>>>>>> 2009/10/1 Will Hacker <[email protected]>
>>>>>>
>>>>>>> Chris,
>>>>>>>
>>>>>>> There is not any statistical formula or method that will tell you the correct number of people to test. In my experience it depends on the functions you are testing, how many test scenarios you want to run, how many of those can be done by one participant in one session, and how many different levels of expertise you need (e.g. novice, intermediate, and/or expert) to really exercise your application.
>>>>>>>
>>>>>>> I have gotten valuable insight from testing 6-10 people for ecommerce sites with fairly common functionality that people are generally familiar with, but have used more participants for more complex applications where there are different levels of features that some users rely on heavily and others never use.
>>>>>>>
>>>>>>> I do believe that any testing is better than none, and realize you are likely limited by time and budget. I think you can usually get fairly effective results with 10 or fewer people.
>>>>>>>
>>>>>>> Will
>>>>>>>
>>>>>>> Posted from the new ixda.org
>>>>>>> http://www.ixda.org/discuss?post=46278

>>>>> --
>>>>> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E: [email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn: www.linkedin.com/in/stevebaty

>>> --
>>> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E: [email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn: www.linkedin.com/in/stevebaty

> --
> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E: [email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn: www.linkedin.com/in/stevebaty

________________________________________________________________
Welcome to the Interaction Design Association (IxDA)!
To post to this list ....... [email protected]
Unsubscribe ................ http://www.ixda.org/unsubscribe
List Guidelines ............ http://www.ixda.org/guidelines
List Help .................. http://www.ixda.org/help
