James,

More good points. I did some calculations a while back on the confidence
intervals for pass/fail user tests -
http://www.meld.com.au/2006/05/when-100-isnt-really-100-updated - the more
interesting part being the link to a paper on estimators of expected values.
Worth a read if you haven't seen it.
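
(For the shape of that calculation - a quick Python sketch using the
adjusted-Wald interval, one common choice for pass/fail rates at small
sample sizes; the 8-of-10 pass figure is made up for illustration:)

    import math

    def adjusted_wald(successes, n, z=1.96):
        # Agresti-Coull adjustment: add z^2/2 successes and z^2 trials,
        # which behaves far better than the plain Wald interval at small n.
        n_adj = n + z ** 2
        p_adj = (successes + z ** 2 / 2) / n_adj
        hw = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
        return max(0.0, p_adj - hw), min(1.0, p_adj + hw)

    # 8 of 10 participants pass: the 95% interval is roughly 48%-95%,
    # so "80% success" is a much fuzzier claim than it sounds.
    print(adjusted_wald(8, 10))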

I'll try to dig up the more recent paper - working from memory on that one.

Regarding the anthropology & sociology references - I was referring more to
the notion of uncovering societal norms rather than the specific 'supporting
a sample size of x'.

Coming back to your first point: yeah, the use of the .31 is a
simplification for the sake of one of his free articles; it's a modal figure
based on (his words) a large number of projects. So, looking at a range of
figures, you would have some projects where more users were needed (to your
earlier point), and in some cases - few - where you could get away with fewer
(although I admit that the use of fewer than 5 participants causes me some
concern).
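
To make that concrete - a quick sketch of how the 'issues found by 5
users' figure moves as the per-user detection rate moves away from that
modal .31 (the other rates here are picked purely for illustration):

    # P(at least one of n users hits an issue that a single user hits
    # with probability p) = 1 - (1 - p)^n
    def discovery(p, n):
        return 1 - (1 - p) ** n

    for p in (0.1, 0.2, 0.31, 0.5):
        print(f"p={p:.2f}: 5 users -> {discovery(p, 5):.0%}, "
              f"10 users -> {discovery(p, 10):.0%}")
    # p=0.10 gives 41%/65%; p=0.31 gives 84%/98%; p=0.50 gives 97%/100%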

Anyway, enjoying the discussion, and I still think we're violently in
agreement on the basic point :)

Cheers
Steve

2009/10/2 James Page <[email protected]>

> Steve,
>
> Woolrych and Cockton argue that the discrepancy is Nielsen's constant of
> .31. Nielsen assumes all issues have the same visibility. We have not even
> added the extra dimension of evaluator effect :-)
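>
> To illustrate the point, here's a toy calculation (my own, with made-up
> visibility figures whose average is still .31):
>
>     # Share of issues five users find when visibility varies,
>     # versus the flat-0.31 model:
>     visibilities = [0.8, 0.4, 0.2, 0.1, 0.05]   # mean = 0.31
>     found = [1 - (1 - p) ** 5 for p in visibilities]
>     print(sum(found) / len(found))   # ~0.65: the mixed model finds fewer
>     print(1 - (1 - 0.31) ** 5)       # ~0.84: the flat model's prediction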
>
> Do you have a reference for the more recent paper? I would be interested in
> reading it.
>
> On the manufacturing side most of the metrics use a margin of error. With
> just 10 users your margin of error will be about +/-35% (very rough
> calculation). That is far better than no test, but still would be considered
> extremely low in a manufacturing process.
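>
> (Back-of-the-envelope, in case anyone wants to check that figure - one
> standard way to get a number in that ballpark is the worst-case binomial
> margin of error at 95% confidence:)
>
>     import math
>     n, z = 10, 1.96   # sample size, 95% z-score
>     # Worst case is p = 0.5: moe = z * sqrt(p * (1 - p) / n)
>     print(z * math.sqrt(0.5 * 0.5 / n))   # ~0.31, i.e. about +/-31%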
>
> In Anthropology most of the papers I have read use far greater sample sizes
> than just a population of 10. Yes, it depends on the subject matter. The
> anthropologist will use techniques such as using informants, which increases
> the number of participants. And the anthropologist is studying the population
> over months if not years, so there are far more observations.
>
> @thomas testing the wireframe will only show up what is already visible.
> But if a feature has an issue, and it is implemented in the wireframe, then
> a test will show it up. Discovering an issue early is surely better than
> later. I think your statement reinforces the idea that testing frequently is
> a good idea.
>
> All the best
>
> James
> blog.feralabs.com
>
>
> 2009/10/2 Steve Baty <[email protected]>
>
>> James,
>>
>> Excellent points.
>>
>> Nielsen argues that 5 users will discover 84% of the issues; not that the
>> likelihood of finding a particular issue is 84% - thus the discrepancy in
>> our figures (41% & 65% respectively).
>>
>> (And I can't believe I'm defending Nielsen's figures, but this is one of
>> his better studies.) The results from '93 were re-evaluated more recently
>> for Web-based systems, with similar results. There's also some good theory
>> on this from sociology and cultural anthropology - but I think we're moving
>> far afield from the original question.
>>
>> Regarding the manufacturing reference - which I introduced, granted -
>> units tend to be tested in batches for the reason you mention. The presence
>> of defects in a batch signals a problem and further testing is carried out.
>>
>> I also like the approach Amazon (and others) take in response to your last
>> point, which is to release new features to small (for them) numbers of users
>> - 1,000, then 5,000 etc - so that these low-incidence problems can surface.
>> When the potential impact is high, this is a really solid approach to take.
>>
>> Regards
>>
>> Steve
>>
>> 2009/10/2 James Page <[email protected]>
>>
>>> Steve,
>>>
>>> The real issue is that the example I have given is over-simplistic. It
>>> depends on sterile lab conditions, on the user population being the same
>>> in the lab and in the real world, and on there only being one issue, which
>>> affects 10% of the user population. One of the great beauties of the world
>>> is the complexity and diversity of people. In the sterile lab people are
>>> tested on the same machine (we have found machine configuration such as
>>> screen size has a bearing on behaviour), and they don't have the
>>> distractions that normally affect the user in the real world.
>>>
>>>> Actually, that's not true. You'd be fairly likely to discover it with
>>>> only 5-10 users - in the 65%+ range of 'likely'.
>>>>
>>> For 5 users that is only 41% (1-(1-0.1)^5), and for 10 it is 65%. This is
>>> far from Nielsen's figure that 5 users will find 84% of the issues
>>> (1-(1-0.31)^5).
>>>
>>> If I were a car manufacturer and there was a 45% chance that 10% of my
>>> cars leave the production line with a fault, there is a high chance that
>>> consumers would stop buying my product, the company would go bust, and I
>>> would be out of a job. From my experience of production lines, a sample
>>> size of 10 for a production run of one million units would be considered
>>> extremely low.
>>>
>>> We have moved a long way since 1993, when Nielsen and Landauer's paper was
>>> published. The web was not around, and the profile of users was very
>>> different. The web has changed that. We will need to test with more people
>>> as website traffic increases and we get better at web site design. For
>>> example, if we assume that the designers of a web site have been using
>>> good design principles, and therefore an issue only affects 2.5% of users,
>>> then 10 users in a test will only discover that issue 22% of the time. But
>>> using our 1 million visitors a year example, that issue will mean that
>>> 25,000 people will experience problems.
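>>>
>>> (To put a number on what it would take to catch that issue reliably -
>>> the same 1-(1-p)^n model, just inverted to solve for n:)
>>>
>>>     import math
>>>     p, target = 0.025, 0.85   # issue visibility, desired discovery odds
>>>     # n >= ln(1 - target) / ln(1 - p)
>>>     print(math.ceil(math.log(1 - target) / math.log(1 - p)))   # 75 users
>>>     print(1 - (1 - p) ** 10)   # ~0.22 with only 10, as above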
>>>
>>> But we do agree that each population needs its own test. And I totally
>>> agree that testing iteratively is a good idea.
>>>
>>> @William -- Woolrych and Cockton's 2001 argument applies to simple
>>> task-based tests. See
>>> http://osiris.sunderland.ac.uk/~cs0awo/hci%202001%20short.pdf
>>>
>>> All the best
>>>
>>> James
>>> blog.feralabs.com
>>>
>>> PS (*Disclaimer*) Because I believe that usability testing needs not just
>>> to be more statistically sound, but also to be able to test a wide range
>>> of users from different cultures, I co-founded www.webnographer.com, a
>>> remote usability testing tool. So I am an advocate for testing with more
>>> geographically diverse users than in normal lab tests.
>>>
>>> 2009/10/2 Steve Baty <[email protected]>
>>>
>>> "If your client website has 1 million visitors a year, a usability issue
>>>> that
>>>> effects 10% of the users would be unlikely to be discovered on a test of
>>>> only 5 to 10 users, but would give 100,000 people a bad experience when
>>>> they
>>>> visit the site."
>>>>
>>>> Actually, that's not true. You'd be fairly likely to discover it with
>>>> only 5-10 users - in the 65%+ range of 'likely'. Manufacturing quality
>>>> control systems and product quality testing have been using such
>>>> statistical methods since the 1920s, and they went through heavy
>>>> refinement and sophistication in the '60s, '70s and '80s.
>>>>
>>>> It's also worth repeating the message both Jakob & Jared Spool are
>>>> constantly talking about: test iteratively with a group of 5-10
>>>> participants. You'll find that 65%+ figure above rises to 99%+ in that 
>>>> case.
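>>>>
>>>> (Under the simplest assumption - fresh participants each round and a
>>>> 10%-visibility issue - the cumulative odds look like this. It's a naive
>>>> model, since fixing issues between rounds only improves matters:)
>>>>
>>>>     p, n = 0.1, 10   # per-user visibility, participants per round
>>>>     for rounds in (1, 3, 5):
>>>>         print(rounds, round(1 - (1 - p) ** (n * rounds), 3))
>>>>     # 1 round: 0.651, 3 rounds: 0.958, 5 rounds: 0.995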
>>>>
>>>> Again, doesn't change your basic points about cultural diversity and
>>>> behaviour affecting the test parameters, but your above point is not
>>>> entirely accurate.
>>>>
>>>> Cheers
>>>> Steve
>>>>
>>>> 2009/10/2 James Page <[email protected]>
>>>>
>>>>> It depends on how many issues there are, the cultural variance of your
>>>>> user base, and the margin of error you are happy with. Five users, or
>>>>> even 10, are not enough on a modern, well-designed web site.
>>>>>
>>>>> The easy way to think of a usability test is as a treasure hunt. If the
>>>>> treasure is very obvious then you will need fewer people; if it is less
>>>>> obvious then you will need more people. If you increase the area of the
>>>>> hunt then you will need more people. Most of the advocates of testing
>>>>> only 5 to 10 users draw their experience from one country. Behaviour
>>>>> changes significantly country by country, even in Western Europe. See my
>>>>> blog post here:
>>>>> http://blog.feralabs.com/2009/01/does-culture-effect-online-behaviour/
>>>>>
>>>>> If your client website has 1 million visitors a year, a usability issue
>>>>> that affects 10% of the users would be unlikely to be discovered in a
>>>>> test of only 5 to 10 users, but would give 100,000 people a bad
>>>>> experience when they visit the site.
>>>>>
>>>>> Can you find treasure with only five or ten users? Of course you can.
>>>>> But how sure can you be that you have found even the significant issues?
>>>>>
>>>>> A very good argument for why 10 is not enough is Woolrych and Cockton
>>>>> 2001. They point out an issue with Nielsen's formula: he does not take
>>>>> into account the visibility of an issue. They show that using only 5
>>>>> users can significantly undercount even significant usability issues.
>>>>>
>>>>> The following PowerPoint from an eye-tracking study demonstrates the
>>>>> issue with using only a few users:
>>>>> http://docs.realeyes.it/why50.ppt
>>>>>
>>>>> You may also want to look at the margin of error for the test that you
>>>>> are doing.
>>>>>
>>>>> All the best
>>>>>
>>>>> James
>>>>> blog.feralabs.com
>>>>>
>>>>> 2009/10/1 Will Hacker <[email protected]>
>>>>>
>>>>> > Chris,
>>>>> >
>>>>> > There is not any statistical formula or method that will tell you the
>>>>> > correct number of people to test. In my experience it depends on the
>>>>> > functions you are testing, how many test scenarios you want to run
>>>>> > and how many of those can be done by one participant in one session,
>>>>> > and how many different levels of expertise you need (e.g. novice,
>>>>> > intermediate, and/or expert) to really exercise your application.
>>>>> >
>>>>> > I have gotten valuable insight from testing 6-10 people for ecommerce
>>>>> > sites with fairly common functionality that people are generally
>>>>> > familiar with, but have used more participants for more complex
>>>>> > applications where there are different levels of features that some
>>>>> > users rely on heavily and others never use.
>>>>> >
>>>>> > I do believe that any testing is better than none, and realize you
>>>>> > are likely limited by time and budget. I think you can usually get
>>>>> > fairly effective results with 10 or fewer people.
>>>>> >
>>>>> > Will
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E:
>>>> [email protected] | Twitter: docbaty | Skype: steve_baty |
>>>> LinkedIn: www.linkedin.com/in/stevebaty
>>>>
>>>
>>>
>>
>>
>> --
>> Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E:
>> [email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn:
>> www.linkedin.com/in/stevebaty
>>
>
>


-- 
Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E:
[email protected] | Twitter: docbaty | Skype: steve_baty | LinkedIn:
www.linkedin.com/in/stevebaty
________________________________________________________________
Welcome to the Interaction Design Association (IxDA)!
To post to this list ....... [email protected]
Unsubscribe ................ http://www.ixda.org/unsubscribe
List Guidelines ............ http://www.ixda.org/guidelines
List Help .................. http://www.ixda.org/help
