On 5 Aug 2003 05:53:15 -0700, [EMAIL PROTECTED] (Louis T) wrote:

[ snip, previous]

> I have more than 3000 bug reports. Understanding them all is as
> complex as the system is. This is why I want to sample it. Let's say
> that I will pick one report every 10 reports. This number is not fixed
> yet. My problem is here.

I still have great doubts about whether a statistical
approach is useful. It is going to depend, VERY strongly,
on whether the 3000 errors can be treated as "independent"
of each other. They would be pretty much independent of
each other if they arose from 3000 programs written by
3000 different people, whether all in the same computer
language or each in a different one ... and maybe a few
other conditions would occur to me if I saw some numbers.

Or, they could be pretty much independent if all 3000
arose from the same computer program; that is the other
way to aim for relative independence. Again, you have to
decide whether you are assuming all-from-one-programmer
or 3000 programmers if you want to assert independence.

> 
> What I have read about the chi square seems quite interesting. I would
> like to say that the conclusion deducted from my sample is :
> "I have a 90% probability that the real proportion of structural bug
> (category 4) is inside the interval 10% to 20%." This interval will be
> given by my sampling.
> I hope that my knowledge of english does not make this sentence to
> foggy (???).
> 

Okay. I think you are asking about describing a small fraction
and putting a confidence interval (CI) around it. That part of
the problem, the numeric part, is not too hard.

For a small proportion, we can consider the "counts" to be
numbers that are distributed as Poisson. In that case,
the square root of the count is very close to being "normal"
with a standard error of 1/2, whatever the size of the count
(so long as it is not tiny).

Next step: the usual 95% CI is built by taking
the mean, +/- twice the SE. Here that means taking
the square root of the count, +/- 1.0, and then squaring
the two endpoints to get back to counts.

Example: If the Poisson count (something under 20% of
what was sampled) is 25, then its square root is 5.0, and
the 95% CI on counts is (16, 36), since 5.0 +/- 1.0 gives
(4.0, 6.0), and squaring those gives 16 and 36.
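
Here is a rough sketch of that arithmetic in Python, for anyone
who wants to try other counts. The function name poisson_sqrt_ci
and the parameter z are made up for this illustration; z=2.0 is
the "twice the SE" rule of thumb used above, not an exact normal
quantile.

```python
import math

def poisson_sqrt_ci(count, z=2.0):
    """Approximate CI for a Poisson count via the square-root transform.

    Assumes sqrt(count) is roughly normal with standard error 1/2,
    which is only a fair approximation when the count is not tiny.
    """
    root = math.sqrt(count)
    half_width = z * 0.5              # z times the SE of 1/2
    lower = max(root - half_width, 0.0) ** 2   # square back to counts
    upper = (root + half_width) ** 2
    return lower, upper

print(poisson_sqrt_ci(25))   # (16.0, 36.0)
```

Note that the interval is not symmetric around 25; that comes
from squaring the endpoints.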

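You write the CI most readily on "counts" but it translates
directly to fractions: divide both limits by the number
sampled. It works the same for 25 out of 500, or out of
5000, or whatever. With 25 out of 500, for instance, the
limits (16, 36) become 3.2% and 7.2%.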
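
As a small illustration of the translation to fractions, the
sketch below divides the count limits by the sample size. The
name fraction_ci and its parameters are hypothetical, and it
leans on the same assumptions as above: a small proportion,
Poisson-like counts, and independent reports.

```python
import math

def fraction_ci(count, n_sampled, z=2.0):
    """Approximate CI for a small proportion, count / n_sampled.

    Builds the interval on sqrt(count) with SE 1/2, squares the
    endpoints, then divides by the number of reports sampled.
    """
    root = math.sqrt(count)
    lower = max(root - z / 2, 0.0) ** 2 / n_sampled
    upper = min((root + z / 2) ** 2 / n_sampled, 1.0)
    return lower, upper

for n in (500, 5000):
    lo, hi = fraction_ci(25, n)
    print(f"25 of {n}: {lo:.4f} to {hi:.4f}")
# 25 of 500: 0.0320 to 0.0720
# 25 of 5000: 0.0032 to 0.0072
```

For the 90% interval asked about, one could pass z=1.645 (the
usual normal quantile for 90%) in place of 2.0; the interval
gets a bit narrower.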


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."  Justice Holmes.
