On 4 Mar 2003, James wrote (edited):

> I have a dataset (positive count data) with bimodal shape.
> My theory is that the sample is composed of samples drawn from two
> different populations. I have another large dataset that I assume is
> a sample of one of the subpopulations. The data again is counts of
> occurrences (all positive and positively skewed, I wouldn't want to
> assume any distributions at this time).

By "all positive", do you mean only "no negative counts", or "no values of zero in the data"? If the latter, is that also true of your bimodal data set? Is it reasonable that no zero values were observed, or might such values have been excised from the data (with or without malice aforethought)?

> My question is: can I derive some information about the other
> subpopulation like mean and variance without further assumptions on
> distributions of the populations or weights of how the bimodal
> sample drawn from the two populations?

I do not see how to do this easily.

1) Your data are counts. Unless you have reduced them to, say, proportions, the mean and variance will depend on the sample size. Some such reduction would be necessary, I suppose, even to compare values from your "other large dataset" with those from your bimodal sample.

2) Do you have any idea what might induce bimodality (e.g., the sample contains data from males and females, and it is reasonable to observe a systematic difference in mean number (or proportion) of counts for this variable), and can you segregate the data on the basis of such an identifier? (One suspects not, or you'd have mentioned it: the problem would be much easier from that approach.)

3) From (mean, variance, sample size) for two subsamples one can readily find (mean, variance) for the data set obtained by combining the two. To do it the other way round, you still need the two sample sizes: so you'd need to guess the ratio of sample sizes (which is what I suppose you meant by referring to "weights").

4) It would be possible to make a start if you thought you knew the means of both subpopulations (and the variance of one of them, as you claim to have). But so far as I can see, your only information about different means arises from observing different modes; and you cannot estimate a mean from a mode (quite apart from the unreliability of modes as measures, in general!)
without assuming something about the shape(s) of the distribution(s). This *might* be possible, given your second dataset and assuming that the two distributions you posit were similar in this regard. But then there's the problem that the modes you can observe are influenced by the presence of the other subsample: in particular, given the skewness you mention, the upper mode may have been shifted somewhat to the left (toward the lower mode), because its distribution has been superimposed (so to speak) on the downward-sloping right tail of the other. [As you know, everyone is always looking for a level playing field...]

I don't know if these thoughts will have been helpful. Good luck!

-----------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
56 Sebbins Pond Drive, Bedford, NH 03110 (603) 626-0816
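
P.S. Point 3) can be made concrete with a little arithmetic. What follows is only a minimal Python sketch, not a recommended procedure. It assumes divide-by-n ("population") variances, and the function names combine and recover_other are made up for this illustration. The weight w1 (the fraction of the bimodal sample that belongs to the known subpopulation) is exactly the quantity you would have to guess.

```python
def combine(m1, v1, n1, m2, v2, n2):
    """Mean and variance of the data set formed by pooling two subsamples.

    Assumption of this sketch: v1 and v2 are divide-by-n variances.
    """
    n = n1 + n2
    w1, w2 = n1 / n, n2 / n
    m = w1 * m1 + w2 * m2
    # Pooled variance = weighted within-group variance
    #                 + weighted squared distance of each group mean from m.
    v = w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)
    return m, v


def recover_other(m, v, m1, v1, w1):
    """Solve the two equations in combine() for (m2, v2).

    m, v   : mean and variance of the pooled (bimodal) sample
    m1, v1 : mean and variance of the known subpopulation
    w1     : GUESSED fraction of the pooled sample from that subpopulation
    """
    if not 0.0 < w1 < 1.0:
        raise ValueError("w1 must lie strictly between 0 and 1")
    w2 = 1.0 - w1
    m2 = (m - w1 * m1) / w2
    v2 = (v - w1 * (v1 + (m1 - m) ** 2)) / w2 - (m2 - m) ** 2
    return m2, v2


if __name__ == "__main__":
    # Made-up numbers, purely to show how sensitive the answer is to w1.
    m, v = combine(2.0, 1.0, 30, 10.0, 4.0, 70)
    for w1 in (0.2, 0.3, 0.4, 0.5):
        print(w1, recover_other(m, v, 2.0, 1.0, w1))
```

Trying several values of w1 shows how strongly the recovered mean and variance depend on the guess. A negative recovered variance means the guessed weight is incompatible with the other numbers.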
