James,

Excellent points.
Nielsen argues that 5 users will discover 84% of the issues, not that the
likelihood of finding a particular issue is 84% - thus the discrepancy in
our figures (41% and 65% respectively). (And I can't believe I'm defending
Nielsen's figures, but this is one of his better studies.) The results
from '93 were re-evaluated more recently for Web-based systems, with
similar results. There's also some good theory on this from sociology and
cultural anthropology - but I think we're moving far afield from the
original question.
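To make the two calculations concrete, here is a minimal sketch (Python,
purely illustrative; the 0.31 per-user detection rate is the average from
Nielsen and Landauer's formula as quoted below, and the function names are
just for this example):

    # Chance that at least one of n test users encounters a *specific*
    # issue that affects a fraction p of the user population.
    def p_find_specific_issue(p, n):
        return 1 - (1 - p) ** n

    # Nielsen/Landauer curve: expected *proportion of all issues* found
    # by n users, given an average per-user detection rate lam.
    def proportion_of_issues_found(lam, n):
        return 1 - (1 - lam) ** n

    print(p_find_specific_issue(0.10, 5))       # ~0.41, the 41% figure
    print(p_find_specific_issue(0.10, 10))      # ~0.65, the 65% figure
    print(proportion_of_issues_found(0.31, 5))  # ~0.84, Nielsen's 84%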
Regarding the manufacturing reference - which I introduced, granted -
units tend to be tested in batches for the reason you mention. The
presence of defects in a batch signals a problem, and further testing is
carried out.

I also like the approach Amazon (and others) take in response to your
last point, which is to release new features to small (for them) numbers
of users - 1,000, then 5,000, and so on - so that these low-incidence
problems can surface. When the potential impact is high, this is a really
solid approach to take.

Regards
Steve

2009/10/2 James Page <[email protected]>

> Steve,
>
> The real issue is that the example I have given is over-simplistic. It
> depends on sterile lab conditions, on the user population being the same
> in the lab and in the real world, and on there being only one issue,
> affecting 10% of the user population. One of the great beauties of the
> world is the complexity and diversity of people. In the sterile lab,
> people are tested on the same machine (we have found that machine
> configuration, such as screen size, has a bearing on behaviour), and
> they don't have the distractions that normally affect the user in the
> real world.
>
>> Actually, that's not true. You'd be fairly likely to discover it with
>> only 5-10 users - in the 65%+ range of 'likely'.
>
> For 5 users that is only 41% (1-(1-0.1)^5), and for 10 it is 65%. This
> is far off from Nielsen's number that 5 users will find 84% of the
> issues (1-(1-0.31)^5).
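>
> (As an illustrative aside, the same formula can be inverted - solving
> 1-(1-p)^n >= c for n - to show how fast the required sample grows as the
> issue frequency p falls. The 85% confidence target below is just an
> example figure, not from any study:)
>
>     import math
>
>     # Smallest n such that an issue affecting a fraction p of users
>     # is seen at least once with probability >= c.
>     def users_needed(p, c):
>         return math.ceil(math.log(1 - c) / math.log(1 - p))
>
>     print(users_needed(0.10, 0.85))   # 19 users for a 10% issue
>     print(users_needed(0.025, 0.85))  # 75 users for a 2.5% issue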
> If I were a car manufacturer and my quality testing had only a 45%
> chance of catching a fault affecting 10% of the cars leaving my
> production line, there is a high chance that consumers would stop buying
> my product, the company would go bust, and I would be out of a job. From
> my experience of production lines, a sample size of 10 for a run of one
> million units would be considered extremely low.
>
> We have moved a long way since 1993, when Nielsen and Landauer's paper
> was published. The web was not around then, and the profile of users was
> very different. We will need to test with more people as website traffic
> increases and as we get better at website design. For example, if we
> assume that the designers of a site have been using good design
> principles, so that an issue affects only 2.5% of users, then 10 users
> in a test will discover that issue only 22% of the time. But in our
> one-million-visitors-a-year example, that issue will mean that 25,000
> people experience problems.
>
> But we do agree that each population needs its own test. And I totally
> agree that testing iteratively is a good idea.
>
> @William -- Woolrych and Cockton's 2001 argument applies to simple
> task-based tests. See
> http://osiris.sunderland.ac.uk/~cs0awo/hci%202001%20short.pdf
>
> All the best
>
> James
> blog.feralabs.com
>
> PS (*Disclaimer*) Because I believe usability testing needs not just to
> be more statistically sound, but also to be able to test a wide range of
> users from different cultures, I co-founded www.webnographer.com, a
> remote usability testing tool. So I am an advocate for testing with more
> geographically diverse users than normal lab tests.
>
> 2009/10/2 Steve Baty <[email protected]>
>
>> "If your client website has 1 million visitors a year, a usability
>> issue that affects 10% of the users would be unlikely to be discovered
>> in a test of only 5 to 10 users, but would give 100,000 people a bad
>> experience when they visit the site."
>>
>> Actually, that's not true. You'd be fairly likely to discover it with
>> only 5-10 users - in the 65%+ range of 'likely'. Manufacturing quality
>> control systems and product quality testing have been using such
>> statistical methods since the '20s, and they went through heavy
>> refinement and sophistication in the '60s, '70s and '80s.
>>
>> It's also worth repeating the message both Jakob Nielsen and Jared
>> Spool are constantly talking about: test iteratively with a group of
>> 5-10 participants. You'll find that the 65%+ figure above rises to 99%+
>> in that case.
>>
>> Again, this doesn't change your basic points about cultural diversity
>> and behaviour affecting the test parameters, but your point above is
>> not entirely accurate.
>>
>> Cheers
>> Steve
>>
>> 2009/10/2 James Page <[email protected]>
>>
>>> It depends on how many issues there are, the cultural variance of your
>>> user base, and the margin of error you are happy with. Five users or
>>> even 10 is not enough on a modern, well-designed web site.
>>>
>>> The easy way to think of a usability test is as a treasure hunt. If
>>> the treasure is very obvious then you will need fewer people; if it is
>>> less obvious then you will need more. If you increase the area of the
>>> hunt, you will need more people. Most of the advocates of testing only
>>> 5 to 10 users draw their experience from one country, and behaviour
>>> changes significantly country by country, even within Western Europe.
>>> See my blog post here:
>>> http://blog.feralabs.com/2009/01/does-culture-effect-online-behaviour/
>>>
>>> If your client website has 1 million visitors a year, a usability
>>> issue that affects 10% of the users would be unlikely to be discovered
>>> in a test of only 5 to 10 users, but would give 100,000 people a bad
>>> experience when they visit the site.
>>>
>>> Can you find treasure with only five or ten users? Of course you can.
>>> But how sure can you be that you have found even the significant
>>> issues?
>>>
>>> A very good argument for why 10 is not enough is Woolrych and Cockton
>>> 2001. They point out an issue with Nielsen's formula: it does not take
>>> into account the visibility of an issue. They show that using only 5
>>> users can significantly undercount even significant usability issues.
>>>
>>> The following PowerPoint from an eye-tracking study demonstrates the
>>> problem with using only a few users:
>>> http://docs.realeyes.it/why50.ppt
>>>
>>> You may also want to look at the margin of error for the test that you
>>> are doing.
>>>
>>> All the best
>>>
>>> James
>>> blog.feralabs.com
>>>
>>> 2009/10/1 Will Hacker <[email protected]>
>>>
>>>> Chris,
>>>>
>>>> There is no statistical formula or method that will tell you the
>>>> correct number of people to test. In my experience it depends on the
>>>> functions you are testing, how many test scenarios you want to run,
>>>> how many of those can be done by one participant in one session, and
>>>> how many different levels of expertise you need (e.g. novice,
>>>> intermediate, and/or expert) to really exercise your application.
>>>>
>>>> I have gotten valuable insight from testing 6-10 people on ecommerce
>>>> sites with fairly common functionality that people are generally
>>>> familiar with, but have used more participants for more complex
>>>> applications where there are different levels of features that some
>>>> users rely on heavily and others never use.
>>>>
>>>> I do believe that any testing is better than none, and realize you
>>>> are likely limited by time and budget. I think you can usually get
>>>> fairly effective results with 10 or fewer people.
>>>>
>>>> Will
>>>>
>>>> Posted from the new ixda.org
>>>> http://www.ixda.org/discuss?post=46278

--
Steve 'Doc' Baty | Principal | Meld Consulting
P: +61 417 061 292 | E: [email protected]
Twitter: docbaty | Skype: steve_baty
LinkedIn: www.linkedin.com/in/stevebaty

________________________________________________________________
Welcome to the Interaction Design Association (IxDA)!
To post to this list ....... [email protected]
Unsubscribe ................ http://www.ixda.org/unsubscribe
List Guidelines ............ http://www.ixda.org/guidelines
List Help .................. http://www.ixda.org/help
