On 15 Jun 2003 17:43:17 -0700, [EMAIL PROTECTED] (philipau) wrote:

> Thanks for your response Jay
>
> To clarify, the samples are from the same population. I am comparing
> the mean value and the standard deviation of the number of suppliers
> that tendered for a contract. The samples are contracts from
> different worktypes. This means that the contracts in the population
> are grouped arbitrarily by their department. However when I calculated
> the standard deviations, the confidence interval is non-sensical.

Slow down, please. You mention Mean and Standard Deviation. The next
step is left out, which would be the Standard Error (sometimes called
the s.d. of the mean, though that tends to generate confusion). One
does not usually put a Confidence Interval on the raw data -- which is
what using the SD would do. On the other hand, you do go on to describe
data where the 'mean' seems like a non-representative and misleading
quantity, for *many* purposes.

> As I know the population mean and standard deviation, I want to
> compare them to the mean and standard deviation of the samples,
> however I can't make sense of the data. The mean and standard
> deviation of the population was 12 and 6.9 so a value 2 standard
> deviations away would be non-sensical (negative value).

Well, the data don't have to be extremely skewed for the SD range to
reach the negative. But an SD that is greater than the mean of a
non-negative quantity does suggest that your 'random variation' or
inherent error is really, really likely to be bigger at the bigger
numbers. So, if you want to *test* for magnitude-differences between
groups, rather than give numerical estimates, you are apt to want some
transformation.

> Also lets say the mean was 3 suppliers, the standard deviation was 5.
> Hence the confidence interval would go into negative numbers, which
> doesn't make sense, as contracts can't have negative suppliers
> tendering for the contract.
>
> Another factor that exacerbated the problem was that the range of the
> values varied greatly. One group of contracts had one contract with
> 300+ suppliers, and 100+ contracts, whilst another group had only 2
> contracts with a range of 10 suppliers. Can I compare these groups?

From here, I wonder if I'm parsing right. Does that say: One group had
100+ contracts, including *one* with 300+ suppliers; whilst another
group had 2 contracts that altogether comprised only 10 suppliers.
Is that an important difference, the number of *contracts*?
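
To make the SD-versus-Standard-Error point concrete, here is a rough
sketch in Python. The mean of 12 and SD of 6.9 are the figures quoted
above; the sample size n = 100 and the list of counts are made up purely
for illustration, and the 1.96 multiplier is the usual normal
approximation, which is crude for small or skewed samples. The log
transformation is only one possible choice of transformation, and it
needs every count to be at least 1.

```python
import math
import statistics

Z95 = 1.96  # normal-approximation multiplier for a 95% interval


def standard_error(sd, n):
    """Standard error of the mean: SD divided by sqrt(sample size)."""
    return sd / math.sqrt(n)


# Mean and SD as quoted in the thread.
mean, sd = 12.0, 6.9
n = 100  # ASSUMPTION: hypothetical sample size, not from the thread

# Mean +/- 2 SD describes the spread of the raw data, not the
# uncertainty of the mean. This is the range that dips below zero.
spread_of_data = (mean - 2 * sd, mean + 2 * sd)

# A confidence interval for the mean uses the standard error instead.
se = standard_error(sd, n)
ci_for_mean = (mean - Z95 * se, mean + Z95 * se)

# ASSUMPTION: hypothetical right-skewed suppliers-per-contract counts.
counts = [1, 2, 2, 3, 3, 4, 5, 8, 12, 40]

# Work on the log scale, then back-transform. The back-transformed
# interval is for the geometric mean and can never go negative.
logs = [math.log(c) for c in counts]
log_mean = statistics.mean(logs)
log_se = standard_error(statistics.stdev(logs), len(logs))
geo_mean = math.exp(log_mean)
geo_ci = (math.exp(log_mean - Z95 * log_se),
          math.exp(log_mean + Z95 * log_se))

print("mean +/- 2 SD (raw data):", spread_of_data)
print("95% CI for the mean:     ", ci_for_mean)
print("geometric mean and CI:   ", geo_mean, geo_ci)
```

With these numbers the "mean +/- 2 SD" range starts below zero while the
interval for the mean stays well above it, which is the confusion in the
original question. The back-transformed interval describes the geometric
mean rather than the arithmetic mean, so it answers a slightly different
question about "typical" size.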

If there are places of overlap - same sets of suppliers showing up on
multiple contracts - you might not have the sort of 'independence' that
robust testing would assume. I think you are looking at
suppliers-per-contract? If group A has '100' contracts, and group B has
'2' contracts, I should be happy to say, informally, that A has more
than B.

> I doubt I can use the empirical rule to standardize the distributions,
> as I don't know if they are approximately normal.
>
> My guess is that because the population and some samples are greatly
> skewed to the right, it causes these problems.

Do you really care about "averages" or the totals that they imply, or
are you interested in "central tendency"? Or, would it be interesting
to say that the number of contractors is split between "just 1; 2;
3 thru 10; 11 thru 100; 101 thru hi" ... ?

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization." Justice Holmes.
