Totally agree with your article....

> So you can get a much narrower range for your estimate, but 30+ users is a
> significant undertaking for a usability test.
>
One of our own findings from a study was that people got bored with testing
more than about 8 users.

James

2009/10/2 Steve Baty <[email protected]>

> James,
>
> More good points. I did some calculations a while back on the confidence
> intervals for pass/fail user tests -
> http://www.meld.com.au/2006/05/when-100-isnt-really-100-updated - the more
> interesting part being the link to a paper on estimators of expected values.
> Worth a read if you haven't seen it.
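>
> To make that concrete, here's a rough sketch of the kind of calculation
> involved (my own illustration in Python, not the method from that paper):
> an adjusted-Wald 95% interval for a pass/fail completion rate.
>
> import math
>
> def adjusted_wald(successes, n, z=1.96):
>     # Agresti-Coull style adjustment: add z^2/2 successes and z^2 trials,
>     # which behaves better than the plain Wald interval for small samples.
>     n_adj = n + z ** 2
>     p_adj = (successes + z ** 2 / 2) / n_adj
>     half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
>     return max(0.0, p_adj - half), min(1.0, p_adj + half)
>
> print(adjusted_wald(5, 5))  # 5 passes out of 5: roughly (0.51, 1.0), not "100%"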
>
> I'll try to dig up the more recent paper - working from memory on that one.
>
> Regarding the anthropology & sociology references - I was referring more to
> the notion of uncovering societal norms rather than the specific 'supporting
> a sample size of x'.
>
> Coming back to your first point: yeah, the use of the .31 is a
> simplification for the sake of one of his free articles; it's a modal figure
> based on (his words) a large number of projects. So, looking at a range of
> figures, you would have some projects where more users were needed (to your
> earlier point), and in some cases - a few - you could get away with fewer
> (although I admit that the use of fewer than 5 participants causes me some
> concern).
>
> Anyway, enjoying the discussion, and I still think we're violently in
> agreement on the basic point :)
>
>
> Cheers
> Steve
>
> 2009/10/2 James Page <[email protected]>
>
>> Steve,
>>
>> Woolrych and Cockton argue that the discrepancy lies in Nielsen's constant of
>> .31. Nielsen assumes all issues have the same visibility. We have not even
>> added the extra dimension of the evaluator effect :-)
>>
>> Do you have a reference for the more recent paper? I would be interested
>> in reading it.
>>
>> On the manufacturing side most of the metrics use a margin of error. With
>> just 10 users your margin of error will be about +/-35% (very rough
>> calculation). That is far better than no test, but would still be considered
>> extremely low in a manufacturing process.
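>>
>> As a back-of-the-envelope version of that calculation (my own rough sketch,
>> assuming the worst case p = 0.5 and a plain normal approximation):
>>
>> import math
>>
>> def margin_of_error(n, p=0.5, z=1.96):
>>     # Half-width of a 95% normal-approximation interval for a proportion;
>>     # p = 0.5 gives the widest (worst-case) interval.
>>     return z * math.sqrt(p * (1 - p) / n)
>>
>> print(margin_of_error(10))   # ~0.31, i.e. roughly +/-31% with 10 users
>> print(margin_of_error(100))  # ~0.10, i.e. roughly +/-10% with 100 users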
>>
>> In anthropology most of the papers I have read use far greater sample sizes
>> than just a population of 10. Yes, it depends on the subject matter. The
>> anthropologist will use techniques like using informants, which increases the
>> number of participants. And the anthropologist studies the population over
>> months if not years, so there are far more observations.
>>
>> @Thomas: testing the wireframe will only show up what is already visible.
>> But if a feature has an issue, and it is implemented in the wireframe, then
>> a test will reveal it. Discovering an issue early is surely better than
>> discovering it later. I think your statement reiterates the idea that testing
>> frequently is a good idea.
>>
>> All the best
>>
>> James
>> blog.feralabs.com
>>
>>
>> 2009/10/2 Steve Baty <[email protected]>
>>
>>> James,
>>>
>>> Excellent points.
>>>
>>> Nielsen argues that 5 users will discover 84% of the issues; not that the
>>> likelihood of finding a particular issue is 84% - thus the discrepancy in
>>> our figures (41% & 65% respectively).
>>>
>>> (And I can't believe I'm defending Nielsen's figures, but this is one of
>>> his better studies) The results from '93 were re-evaluated more recently for
>>> Web-based systems with similar results. There's also some good theory on
>>> this from sociology and cultural anthropology - but I think we're moving far
>>> afield from the original question.
>>>
>>> Regarding the manufacturing reference - which I introduced, granted -
>>> units tend to be tested in batches for the reason you mention. The presence
>>> of defects in a batch signals a problem and further testing is carried out.
>>>
>>> I also like the approach Amazon (and others) take in response to your
>>> last point, which is to release new features to small (for them) numbers of
>>> users - 1,000, then 5,000 etc - so that these low-incidence problems can
>>> surface. When the potential impact is high, this is a really solid approach
>>> to take.
>>>
>>> Regards
>>>
>>> Steve
>>>
>>> 2009/10/2 James Page <[email protected]>
>>>
>>>> Steve,
>>>>
>>>> The real issue is that the example I have given is over-simplistic. It is
>>>> dependent on sterile lab conditions, on the user population being the same
>>>> in the lab and in the real world, and on there only being one issue that
>>>> affects 10% of the user population. One of the great beauties of the world
>>>> is the complexity and diversity of people. In the sterile lab people are
>>>> tested on the same machine (we have found machine configuration such as
>>>> screen size has a bearing on behaviour), and they don't have the
>>>> distractions that normally affect the user in the real world.
>>>>
>>>>> Actually, that's not true. You'd be fairly likely to discover it with
>>>>> only 5-10 users - in the 65%+ range of 'likely'.
>>>>>
>>>> For 5 users that is only 41% (1-(1-0.1)^5), and for 10 it is 65%. This
>>>> is far off from Nielsen's figure that 5 users will find 84% of the issues
>>>> (1-(1-0.31)^5).
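>>>>
>>>> Those numbers come straight from the usual discovery formula; a quick
>>>> sketch of the arithmetic in Python (my own illustration, nothing more):
>>>>
>>>> def p_discovered(p_issue, n_users):
>>>>     # Chance that at least one of n_users hits an issue affecting a
>>>>     # proportion p_issue of users, assuming independent users.
>>>>     return 1 - (1 - p_issue) ** n_users
>>>>
>>>> print(p_discovered(0.10, 5))   # ~0.41
>>>> print(p_discovered(0.10, 10))  # ~0.65
>>>> print(p_discovered(0.31, 5))   # ~0.84, Nielsen's figure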
>>>>
>>>> If I were in manufacturing and there was a 45% chance that 10% of my cars
>>>> would leave the production line with a fault, there is a high chance that
>>>> consumers would stop buying my product, the company would go bust, and I
>>>> would be out of a job. From my experience of production lines, a sample
>>>> size of 10 for a production run of one million units would be considered
>>>> extremely low.
>>>>
>>>> We have moved a long way since 1993 when Nielsen and Landauer's paper
>>>> was published. The web was not around, and the profile of users was very
>>>> different. The web has changed that. We will need to test with more people
>>>> as website traffic increases and we get better at web site design. For
>>>> example, if we assume that the designers of a web site have been using good
>>>> design principles and therefore an issue only affects 2.5% of users, then 10
>>>> users in a test will only discover that issue 22% of the time. But using our
>>>> 1 million visitors a year example, that issue will mean that 25,000 people
>>>> will experience problems.
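>>>>
>>>> To put a rough number on that, here is a quick sketch (my own, using the
>>>> same formula and an arbitrary 85% target discovery rate) of how many
>>>> participants a low-incidence issue would need:
>>>>
>>>> import math
>>>>
>>>> def users_needed(p_issue, target=0.85):
>>>>     # Smallest n with 1 - (1 - p_issue)^n >= target, assuming independence.
>>>>     return math.ceil(math.log(1 - target) / math.log(1 - p_issue))
>>>>
>>>> print(users_needed(0.31))    # 6 users for Nielsen's "average" issue
>>>> print(users_needed(0.10))    # 19 users for an issue affecting 10% of users
>>>> print(users_needed(0.025))   # 75 users for an issue affecting 2.5% of users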
>>>>
>>>> But we do agree that each population needs its own test. And I totally
>>>> agree that testing iteratively is a good idea.
>>>>
>>>> @William -- Woolrych and Cockton's 2001 argument applies to simple
>>>> task-based tests. See
>>>> http://osiris.sunderland.ac.uk/~cs0awo/hci%202001%20short.pdf
>>>>
>>>> All the best
>>>>
>>>> James
>>>> blog.feralabs.com
>>>>
>>>> PS (*Disclaimer*) Due to my belief that usability testing needs not just
>>>> to be more statistically sound, but also to be able to test a wide range of
>>>> users from different cultures, I co-founded www.webnographer.com, a remote
>>>> usability testing tool. So I am an advocate for testing with more
>>>> geographically diverse users than normal lab tests allow.
>>>>
>>>> 2009/10/2 Steve Baty <[email protected]>
>>>>
>>>> "If your client website has 1 million visitors a year, a usability issue
>>>>> that
>>>>> effects 10% of the users would be unlikely to be discovered on a test
>>>>> of
>>>>> only 5 to 10 users, but would give 100,000 people a bad experience when
>>>>> they
>>>>> visit the site."
>>>>>
>>>>> Actually, that's not true. You'd be fairly likely to discover it with
>>>>> only 5-10 users - in the 65%+ range of 'likely'. Manufacturing quality
>>>>> control systems and product quality testing have been using such
>>>>> statistical methods since the 20's and they went through heavy refinement
>>>>> and sophistication in the 60's, 70's and 80's.
>>>>>
>>>>> It's also worth repeating the message both Jakob & Jared Spool are
>>>>> constantly talking about: test iteratively with a group of 5-10
>>>>> participants. You'll find that 65%+ figure above rises to 99%+ in that
>>>>> case.
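>>>>>
>>>>> A rough way to see where that figure comes from (my own back-of-the-envelope,
>>>>> treating each round as an independent sample of fresh participants):
>>>>>
>>>>> def p_discovered_over_rounds(p_issue, users_per_round, rounds):
>>>>>     # Cumulative chance of hitting the issue at least once across all rounds.
>>>>>     return 1 - (1 - p_issue) ** (users_per_round * rounds)
>>>>>
>>>>> print(p_discovered_over_rounds(0.31, 5, 3))   # ~0.996 for Nielsen's average issue
>>>>> print(p_discovered_over_rounds(0.10, 10, 3))  # ~0.96 for an issue affecting 10% of users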
>>>>>
>>>>> Again, doesn't change your basic points about cultural diversity and
>>>>> behaviour affecting the test parameters, but your above point is not
>>>>> entirely accurate.
>>>>>
>>>>> Cheers
>>>>> Steve
>>>>>
>>>>> 2009/10/2 James Page <[email protected]>
>>>>>
>>>>>> It is dependent on how many issues there are, the cultural variance of
>>>>>> your user base, and the margin of error you are happy with. Five users,
>>>>>> or even 10, is not enough on a modern, well-designed web site.
>>>>>>
>>>>>> An easy way to think of a usability test is as a treasure hunt. If the
>>>>>> treasure is very obvious then you will need fewer people; if it is less
>>>>>> obvious then you will need more people. If you increase the area of the
>>>>>> hunt then you will need more people. Most of the advocates of testing only
>>>>>> 5 to 10 users draw their experience from one country. Behaviour changes
>>>>>> significantly country by country, even in Western Europe. See my blog post
>>>>>> here:
>>>>>> http://blog.feralabs.com/2009/01/does-culture-effect-online-behaviour/
>>>>>>
>>>>>> If your client website has 1 million visitors a year, a usability issue
>>>>>> that affects 10% of the users would be unlikely to be discovered on a test
>>>>>> of only 5 to 10 users, but would give 100,000 people a bad experience when
>>>>>> they visit the site.
>>>>>>
>>>>>> Can you find treasure with only five or ten users? Of course you can.
>>>>>> But how sure can you be that you have found even the significant issues?
>>>>>>
>>>>>> A very good argument for why 10 is not enough is Woolrych and Cockton
>>>>>> 2001. They point out an issue in Nielsen's formula: he does not take into
>>>>>> account the visibility of an issue. They show that using only 5 users can
>>>>>> significantly undercount even significant usability issues.
>>>>>>
>>>>>> The following PowerPoint from an eye-tracking study demonstrates the
>>>>>> issue with using only a few users:
>>>>>> http://docs.realeyes.it/why50.ppt
>>>>>>
>>>>>> You may also want to look at the margin of error for the test that you
>>>>>> are doing.
>>>>>>
>>>>>> All the best
>>>>>>
>>>>>> James
>>>>>> blog.feralabs.com
>>>>>>
>>>>>> 2009/10/1 Will Hacker <[email protected]>
>>>>>>
>>>>>> > Chris,
>>>>>> >
>>>>>> > There is no statistical formula or method that will tell you the
>>>>>> > correct number of people to test. In my experience it depends on the
>>>>>> > functions you are testing, how many test scenarios you want to run
>>>>>> > and how many of those can be done by one participant in one session,
>>>>>> > and how many different levels of expertise you need (e.g. novice,
>>>>>> > intermediate, and/or expert) to really exercise your application.
>>>>>> >
>>>>>> > I have gotten valuable insight from testing 6-10 people for ecommerce
>>>>>> > sites with fairly common functionality that people are generally
>>>>>> > familiar with, but have used more participants for more complex
>>>>>> > applications where there are different levels of features that some
>>>>>> > users rely on heavily and others never use.
>>>>>> >
>>>>>> > I do believe that any testing is better than none, and realize you
>>>>>> > are likely limited by time and budget. I think you can usually get
>>>>>> > fairly effective results with 10 or fewer people.
>>>>>> >
>>>>>> > Will
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 |
>>>>> E: [email protected] | Twitter: docbaty | Skype: steve_baty |
>>>>> LinkedIn: www.linkedin.com/in/stevebaty
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E:
>>> [email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn:
>>> www.linkedin.com/in/stevebaty
>>>
>>
>>
>
>
> --
> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E:
> [email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn:
> www.linkedin.com/in/stevebaty
>
________________________________________________________________
Welcome to the Interaction Design Association (IxDA)!
To post to this list ....... [email protected]
Unsubscribe ................ http://www.ixda.org/unsubscribe
List Guidelines ............ http://www.ixda.org/guidelines
List Help .................. http://www.ixda.org/help
