James,

More good points. I did some calculations a while back on the confidence intervals for pass/fail user tests - http://www.meld.com.au/2006/05/when-100-isnt-really-100-updated - the more interesting part being the link to a paper on estimators of expected values. Worth a read if you haven't seen it.
I'll try to dig up the more recent paper - working from memory on that one.

Regarding the anthropology & sociology references - I was referring more to the notion of uncovering societal norms rather than the specific 'supporting a sample size of x'.

Coming back to your first point: yeah, the use of the .31 is a simplification for the sake of one of his free articles; it's a modal figure based on (his words) a large number of projects. So, looking at a range of figures, you would have some projects where more users were needed (to your earlier point), and in some cases - few - you could get away with fewer (although I admit that the use of fewer than 5 participants causes me some concern).

Anyway, enjoying the discussion, and I still think we're violently in agreement on the basic point :)

Cheers
Steve

2009/10/2 James Page <[email protected]>

> Steve,
>
> Woolrych and Cockton argue that the discrepancy is Nielsen's constant of
> .31. Nielsen assumes all issues have the same visibility. We have not even
> added the extra dimension of evaluator effect :-)
>
> Do you have a reference for the more recent paper? I would be interested
> in reading it.
>
> On the manufacturing side most of the metrics use a margin of error. With
> just 10 users your margin of error will be about +/-35% (very rough
> calculation). That is far better than no test, but would still be
> considered extremely low in a manufacturing process.
>
> In anthropology, most of the papers I have read use far larger sample
> sizes than a population of just 10. Yes, it depends on the subject matter.
> The anthropologist will use techniques such as informants, which increases
> the number of participants. And the anthropologist studies the population
> over months if not years, so there are far more observations.
>
> @thomas: testing the wireframe will only show up what is already visible.
> But if a feature has an issue, and it is implemented in the wireframe,
> then a test will show it up. Discovering an issue early is surely better
> than later. I think your statement reiterates the idea that testing
> frequently is a good idea.
>
> All the best
>
> James
> blog.feralabs.com
>
> 2009/10/2 Steve Baty <[email protected]>
>
>> James,
>>
>> Excellent points.
>>
>> Nielsen argues that 5 users will discover 84% of the issues; not that
>> the likelihood of finding a particular issue is 84% - thus the
>> discrepancy in our figures (41% & 65% respectively).
>>
>> (And I can't believe I'm defending Nielsen's figures, but this is one of
>> his better studies.) The results from '93 were re-evaluated more recently
>> for Web-based systems, with similar results. There's also some good
>> theory on this from sociology and cultural anthropology - but I think
>> we're moving far afield from the original question.
>>
>> Regarding the manufacturing reference - which I introduced, granted -
>> units tend to be tested in batches for the reason you mention. The
>> presence of defects in a batch signals a problem and further testing is
>> carried out.
>>
>> I also like the approach Amazon (and others) take in response to your
>> last point, which is to release new features to small (for them) numbers
>> of users - 1,000, then 5,000, etc. - so that these low-incidence problems
>> can surface. When the potential impact is high, this is a really solid
>> approach to take.
>>
>> Regards
>>
>> Steve
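To make the margin-of-error talk above concrete - a minimal sketch of the pass/fail confidence interval, assuming the adjusted Wald (Agresti-Coull) estimator commonly recommended for small-sample usability tests (an assumption on my part; the function name is illustrative, not from either linked paper):

```python
import math

def adjusted_wald(successes, n, z=1.96):
    # Adjusted Wald (Agresti-Coull) interval for a pass/fail completion
    # rate; behaves better than the classic Wald interval at the small
    # sample sizes typical of usability tests.
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

# 5 of 10 participants complete a task: the 95% interval is roughly
# 24%..76%, i.e. a margin of error in the region of +/-26% to +/-31%
# depending on the estimator - in line with James's "very rough" +/-35%.
print(adjusted_wald(5, 10))

# Even 10 of 10 passing does not establish a 100% completion rate:
# the interval is roughly 68%..100% - the point of the linked post.
print(adjusted_wald(10, 10))
```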
>> 2009/10/2 James Page <[email protected]>
>>
>>> Steve,
>>>
>>> The real issue is that the example I have given is over-simplistic. It
>>> depends on sterile lab conditions, on the user population being the same
>>> in the lab and in the real world, and on there being only one issue,
>>> affecting 10% of the user population. One of the great beauties of the
>>> world is the complexity and diversity of people. In the sterile lab,
>>> people are tested on the same machine (we have found that machine
>>> configuration, such as screen size, has a bearing on behaviour), and
>>> they don't have the distractions that normally affect the user in the
>>> real world.
>>>
>>>> Actually, that's not true. You'd be fairly likely to discover it with
>>>> only 5-10 users - in the 65%+ range of 'likely'.
>>>
>>> For 5 users that is only 41% (1-(1-0.1)^5), and for 10 it is 65%. That
>>> is far off Nielsen's figure that 5 users will find 84% of the issues
>>> (1-(1-0.31)^5).
>>>
>>> If I were manufacturing cars and there was a 45% chance that 10% of
>>> them left the production line with a fault, there would be a high chance
>>> that consumers would stop buying my product, the company would go bust,
>>> and I would be out of a job. From my experience of production lines, a
>>> sample size of 10 for a production run of one million units would be
>>> considered extremely low.
>>>
>>> We have moved a long way since 1993, when Nielsen and Landauer's paper
>>> was published. The web was not around, and the profile of users was very
>>> different. The web has changed that. We will need to test with more
>>> people as website traffic increases and we get better at website design.
>>> For example, if we assume that the designers of a website have been
>>> using good design principles, so that an issue affects only 2.5% of
>>> users, then 10 users in a test will discover that issue only 22% of the
>>> time. But in our one-million-visitors-a-year example, that issue will
>>> still mean 25,000 people experiencing problems.
>>>
>>> But we do agree that each population needs its own test. And I totally
>>> agree that testing iteratively is a good idea.
>>>
>>> @William - Woolrych and Cockton's 2001 argument applies to simple
>>> task-based tests. See
>>> http://osiris.sunderland.ac.uk/~cs0awo/hci%202001%20short.pdf
>>>
>>> All the best
>>>
>>> James
>>> blog.feralabs.com
>>>
>>> PS (*Disclaimer*) Because I believe that usability testing needs not
>>> just to be more statistically sound, but also to be able to test a wide
>>> range of users from different cultures, I co-founded www.webnographer.com,
>>> a remote usability testing tool. So I am an advocate for testing with
>>> more geographically diverse users than normal lab tests do.
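All of the figures being traded above come from the same single-issue discovery model - a short sketch, assuming the standard 1-(1-p)^n formula from Nielsen and Landauer that the thread is debating:

```python
# Probability that at least one of n participants encounters an issue
# whose "visibility" (the share of users it affects) is p.
def p_discovered(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

print(p_discovered(0.31, 5))    # ~0.84: Nielsen's average-visibility constant
print(p_discovered(0.10, 5))    # ~0.41: a 10%-visibility issue, 5 users
print(p_discovered(0.10, 10))   # ~0.65: the same issue, 10 users
print(p_discovered(0.025, 10))  # ~0.22: a 2.5%-visibility issue, 10 users
```

Woolrych and Cockton's objection drops straight out of this: the .31 is an average, and an issue whose visibility sits well below that average is missed far more often than the headline 84% suggests.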
>>> 2009/10/2 Steve Baty <[email protected]>
>>>
>>>> "If your client website has 1 million visitors a year, a usability
>>>> issue that affects 10% of the users would be unlikely to be discovered
>>>> on a test of only 5 to 10 users, but would give 100,000 people a bad
>>>> experience when they visit the site."
>>>>
>>>> Actually, that's not true. You'd be fairly likely to discover it with
>>>> only 5-10 users - in the 65%+ range of 'likely'. Manufacturing quality
>>>> control systems and product quality testing have been using such
>>>> statistical methods since the '20s, and they went through heavy
>>>> refinement and sophistication in the '60s, '70s and '80s.
>>>>
>>>> It's also worth repeating the message both Jakob & Jared Spool are
>>>> constantly talking about: test iteratively with a group of 5-10
>>>> participants. You'll find that 65%+ figure above rises to 99%+ in that
>>>> case.
>>>>
>>>> Again, this doesn't change your basic points about cultural diversity
>>>> and behaviour affecting the test parameters, but your point above is
>>>> not entirely accurate.
>>>>
>>>> Cheers
>>>> Steve
>>>>
>>>> 2009/10/2 James Page <[email protected]>
>>>>
>>>>> It depends on how many issues there are, the cultural variance of
>>>>> your user base, and the margin of error you are happy with. Five
>>>>> users, or even 10, is not enough on a modern, well-designed website.
>>>>>
>>>>> An easy way to think of a usability test is as a treasure hunt. If
>>>>> the treasure is very obvious then you will need fewer people; if it is
>>>>> less obvious then you will need more. If you increase the area of the
>>>>> hunt then you will need more people. Most advocates of testing only 5
>>>>> to 10 users draw their experience from a single country, yet behaviour
>>>>> changes significantly country by country, even in Western Europe. See
>>>>> my blog post here:
>>>>> http://blog.feralabs.com/2009/01/does-culture-effect-online-behaviour/
>>>>>
>>>>> If your client website has 1 million visitors a year, a usability
>>>>> issue that affects 10% of the users would be unlikely to be discovered
>>>>> on a test of only 5 to 10 users, but would give 100,000 people a bad
>>>>> experience when they visit the site.
>>>>>
>>>>> Can you find treasure with only five or ten users? Of course you can.
>>>>> But how sure can you be that you have found even the significant
>>>>> issues?
>>>>>
>>>>> A very good argument for why 10 is not enough is Woolrych and Cockton
>>>>> (2001). They point out a flaw in Nielsen's formula: it does not take
>>>>> into account the visibility of an issue. They show that using only 5
>>>>> users can significantly undercount even significant usability issues.
>>>>>
>>>>> The following PowerPoint from an eye-tracking study demonstrates the
>>>>> issue with using only a few users:
>>>>> http://docs.realeyes.it/why50.ppt
>>>>>
>>>>> You may also want to look at the margin of error for the test that
>>>>> you are doing.
>>>>>
>>>>> All the best
>>>>>
>>>>> James
>>>>> blog.feralabs.com
>>>>>
>>>>> 2009/10/1 Will Hacker <[email protected]>
>>>>>
>>>>> > Chris,
>>>>> >
>>>>> > There is no statistical formula or method that will tell you the
>>>>> > correct number of people to test. In my experience it depends on the
>>>>> > functions you are testing, how many test scenarios you want to run,
>>>>> > how many of those can be done by one participant in one session,
>>>>> > and how many different levels of expertise you need (e.g. novice,
>>>>> > intermediate, and/or expert) to really exercise your application.
>>>>> >
>>>>> > I have gotten valuable insight from testing 6-10 people for
>>>>> > ecommerce sites with fairly common functionality that people are
>>>>> > generally familiar with, but have used more participants for more
>>>>> > complex applications where there are different levels of features
>>>>> > that some users rely on heavily and others never use.
>>>>> >
>>>>> > I do believe that any testing is better than none, and realize you
>>>>> > are likely limited by time and budget. I think you can usually get
>>>>> > fairly effective results with 10 or fewer people.
>>>>> >
>>>>> > Will
>>>>> >
>>>>> > Posted from the new ixda.org
>>>>> > http://www.ixda.org/discuss?post=46278
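Steve's 65%+ to 99%+ jump for iterative testing can be reproduced from the same model, under the simplifying assumption (mine, not stated in the thread) that an unfixed issue keeps the same visibility from round to round, so m rounds of n users behave like one test of m x n users:

```python
def p_discovered(p: float, n: int) -> float:
    # Chance that at least one of n participants hits a p-visibility issue.
    return 1 - (1 - p) ** n

# A 10%-visibility issue across repeated rounds of 10 participants:
for rounds in (1, 2, 3, 5):
    print(rounds, round(p_discovered(0.10, 10 * rounds), 3))
# 1 round  -> 0.651
# 2 rounds -> 0.878
# 3 rounds -> 0.958
# 5 rounds -> 0.995  (the 65%+ -> 99%+ rise quoted earlier in the thread)

# James's counterpoint survives iteration: a 2.5%-visibility issue is
# still missed more than a quarter of the time even after 50 participants.
print(round(p_discovered(0.025, 50), 3))  # ~0.718
```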
--
Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E:
[email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn:
www.linkedin.com/in/stevebaty

________________________________________________________________
Welcome to the Interaction Design Association (IxDA)!
To post to this list ....... [email protected]
Unsubscribe ................ http://www.ixda.org/unsubscribe
List Guidelines ............ http://www.ixda.org/guidelines
List Help .................. http://www.ixda.org/help
