Donald, thank you for your helpful post. I am now (since my second post) doing what I think you are suggesting I should do, rather than what I think you think I am doing (if that makes sense!). I'm NOT going to calculate the overall percentages from some overall mean and SD estimate. I'm taking each individual mean and SD, transforming it into a theoretical population, then adding all these populations together. (Though I'm working with the calculated psychometric functions, rather than the Gaussian distributions.)
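As a rough illustration of the group-by-group combination described above, here is a minimal sketch. It assumes each group's psychometric function is a cumulative Gaussian defined by that group's mean and SD, and that the groups are combined as a weighted average at each intensity. The function names, the dB units, and every number in `groups` are made up for illustration; they are not the actual data or necessarily the actual method.

```python
import math


def psychometric(level_db, mean_db, sd_db):
    """Cumulative Gaussian: fraction of one group expected to hear a sound at level_db.

    Assumes the group's thresholds are approximately normal (an assumption).
    """
    z = (level_db - mean_db) / (sd_db * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def combined_psychometric(level_db, groups):
    """Weighted average of the per-group psychometric functions at one level.

    groups is a list of (mean_db, sd_db, weight) tuples; the weights could be
    subject counts or demographic proportions and need not sum to 1.
    """
    total_weight = sum(weight for _, _, weight in groups)
    weighted_sum = sum(
        weight * psychometric(level_db, mean_db, sd_db)
        for mean_db, sd_db, weight in groups
    )
    return weighted_sum / total_weight


# Hypothetical age bands: (mean threshold, SD, weight). Invented numbers.
groups = [(10.0, 5.0, 0.3), (20.0, 5.0, 0.4), (30.0, 5.0, 0.3)]

if __name__ == "__main__":
    for level in (0.0, 10.0, 20.0, 30.0, 40.0):
        percent = 100.0 * combined_psychometric(level, groups)
        print(f"{level:5.1f} dB: {percent:5.1f}% of the combined population")
```

The result is the cumulative distribution of the mixture itself, so nothing here assumes that the combined population is normal.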
The part I hadn't noticed (though it is staring me in the face), which you kindly point out, is that, given several assumptions, when I've added together the data from several experiments at different ages, I get a flat(ish!) top with Gaussian-like tails. None of the assumptions really holds (perfectly) for my data, but they're close.

Finally, it's a very useful exercise for me! I have to answer this question: "What percentage of people from the general population will be able to hear this sound?" The data from individual age ranges are interesting, but the bottom line is an overall percentage. I'm not being asked what your 12 year old daughter can hear. I'm being asked about several million consumers.

I must still apologise for where I'm using incorrect terminology (e.g. where I said the distribution WAS Gaussian, as opposed to approximating a Gaussian distribution). I will now go and read about the difference between percentile and percentage.

Thanks for your help,
David.

http://www.David.Robinson.org/

[EMAIL PROTECTED] (Donald Burrill) wrote in message news:<[EMAIL PROTECTED]>...
> A few comments, sprinkled within your post:
>
> On 2 Dec 2002, David Robinson wrote in part:
>
> > Each group for which I have all the data consists of a normal
> > distribution.
>
> Strictly speaking, this is not true, although the distribution may
> indeed be *approximately* normal.  (Any normal distribution is
> continuous and unbounded; but any real (that is, observed) distribution
> is bounded above and below -- in a k-item test, one cannot get fewer
> than 0 nor more than k items right -- and is discrete in so far as data
> are reported to a finite number of digits.)  It *may* be reasonable to
> assume that the *latent* distribution underlying your data is
> approximately normal, to within the precision of the data.
>
> > I was assuming that, [in combining data sets] I could get another
> > normal distribution.  I required the psychometric function of this
> > combined data, [...]
> > assuming normal distribution.
> >
> > However, even if all the individual groups exhibit a normal
> > distribution (and I don't have data to prove this!), the combination
> > certainly doesn't have to - and may not even be close if I choose to
> > combine two groups with very different means (I believe this is one
> > issue hinted at in the final post).
>
> If you are dealing with a measure suitable for students in elementary
> and/or secondary schools, and if the measure is expected to change
> (increase?) systematically with age, you should not expect a normal
> distribution over a range of ages.  If the true relationship is linear
> with age (over some useful range of ages), the distribution of scores
> would be expected to be approximately uniform (or rectangular) in the
> middle of the range of scores, with tails at top and bottom that might
> resemble normal tails (cut off from their parent distribution).
> The shape I have described would follow from several assumptions that
> are (or at any rate used to be) commonly made:
> (a) the conditional distribution of test scores at a given age is
>     normal (aka Gaussian);
> (b) the (conditional) mean score increases linearly with age;
> (c) the population has been stable for some years, implying that the
>     subpopulations at each age are of (approximately) equal size.
>
> We could otherwise phrase that as expecting conditional normal
> distributions when the population distribution by age is rectangular and
> the mean score is linear with age.
>
> > I have chosen to calculate the psychometric functions on a group by
> > group basis.  Thus, I have the %age of people in each group who would
> > respond to a sound of a given intensity.  Then, it is trivial to add
> > the percentages across groups (at each intensity) weighting by the
> > number of subjects in each group, or some other factor.  I hope this
> > isn't a terrible thing to do?
>
> Not so far; but THEN what are you going to do?
> Your sequel suggests
> that what you really want is the cumulative frequency distribution for
> all groups (ages?) combined, and that you intend to calculate
> percentiles (NOT "%ages", although that's what you write) from the
> overall mean and SD, *assuming normality*; rather than from the
> empirical distribution that you get when you combine several putatively
> normal distributions each from different ages (and therefore having
> different means).
>
> I may add that it is not at all clear to me what utility may reside in
> knowing the percentiles for the combined data.  If we're dealing with
> school children, nearly EVERYTHING that one does with children is
> conditional on age.  That is treated (by nearly everyone) as the most
> important single datum about any child:  if a child's letter to an
> editor is published, it is ALWAYS accompanied by the child's age, for
> example.  If I am a parent, I don't much care that 80% of kids in school
> grades 1 to 12 get scores at or below X:  I want to know what scores are
> reasonable for my 10-year-old son and my 12-year-old daughter, and as a
> person with some background in measurement & statistics I do not want
> meaningless information (that X is the 80th percentile in a distribution
> that purports to represent all kids from ages 6 to 18) to be presented
> as though it meant something useful.
>
> > I want to combine the data from different groups because:
> > a) I have data from experiments that should be the same, but aren't!
> > b) I have data from different age ranges, which should be different
> > (and are!), but I wish to calculate %age values for the whole
> > population (with a given demographic, which I can create using correct
> > weightings of each age-band data-set).
>
> If you do combine the data, are you not "sweeping under the rug" the
> facts you have just mentioned (that some differences one would expect
> were *not* observed, and that other differences one would expect *were*
> observed)?
>
> > I sincerely apologise for asking the wrong question.  If my chosen
> > method is totally indefensible, I hope you will correct me.
>
> Well, as remarked above, much depends on just *what* this exercise is
> *for*.  So far it looks to me much like an academic exercise (in more
> than one sense of the adjective!).  But maybe I'm just being
> curmudgeonly today.
> -- DFB.
> -----------------------------------------------------------------------
> Donald F. Burrill [EMAIL PROTECTED]
> 56 Sebbins Pond Drive, Bedford, NH 03110 (603) 626-0816
> [was: 184 Nashua Road, Bedford, NH 03110 (603) 471-7128]
>
> .
> .
> =================================================================
> Instructions for joining and leaving this list, remarks about the
> problem of INAPPROPRIATE MESSAGES, and archives are available at:
> . http://jse.stat.ncsu.edu/ .
> =================================================================
