Re: derive subpopulation mean and variance in bimodal data

Donald Burrill Wed, 05 Mar 2003 06:55:05 -0800

On 4 Mar 2003, James wrote (edited):

> I have a dataset (positive count data) with bimodal shape.
>  My theory is that the sample is composed of samples drawn from two
> different populations.  I have another large dataset that I assume is
> a sample of one of the subpopulation.  The data again is counts of
> occurrences (all positive and positively skewed, I wouldn't want to
> assume any distributions at this time).


By "all positive", do you mean only "no negative counts", or "no values of
zero in the data"?  If the latter, is that also true of your bimodal
data set?  Is it reasonable that no zero values had been observed, or
may such values have been excised from the data (with or without malice
aforethought)?

> My question is: can I derive some information about the other
> subpopulation like mean and variance without further assumptions on
> distributions of the populations or weights of how the bimodal
> sample drawn from the two populations?

I do not see how to do this easily.
 1)  Your data are counts.  Unless you have reduced them to, say,
proportions, the mean and variance will depend on the sample size.
Some such reduction would be necessary, I suppose, even to compare
values from your "other large dataset" with those from your bimodal
sample.
 2)  Do you have any idea what might induce bimodality (e.g., the sample
contains data from males and females, and it is reasonable to observe a
systematic difference in mean number (or proportion) of counts for this
variable), and can you segregate the data on the basis of such an
identifier?  (One suspects not, or you'd have mentioned it:  the problem
would be much easier from that approach.)
 3)  From (mean, variance, sample size) for two subsamples one can
readily find (mean, variance) for the data set obtained by combining the
two.  To do it the other way round, you still need the two sample sizes:
so you'd need to guess the ratio of sample sizes (which is what I
suppose you meant by referring to "weights").
 4)  It would be possible to make a start if you thought you knew the
means of both subpopulations (and the variance of one of them, as you
claim to have).  But so far as I can see, your only information about
different means arises from observing different modes;  and you cannot
estimate a mean from a mode (quite apart from the unreliability of modes
as measures, in general!) without assuming something about the shape(s)
of the distribution(s).  This *might* be possible, given your second
dataset and assuming that the two distributions you posit were similar
in this regard.  But then there's the problem that the modes you can
observe are influenced by the presence of the other subsample:  in
particular, given the skewness you mention, the upper mode may have been
shifted somewhat to the right, due to its distribution's having been
superimposed (so to speak) on a sloping surface.
 [As you know, everyone is always looking for a level playing field...]
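
To make point 3) concrete, here is a minimal sketch in Python.  It is only
an illustration under stated assumptions:  the function names are
hypothetical, variances are the "divide by n" kind, and w1 is a *guessed*
proportion of the combined sample that came from the first subpopulation
(nothing in the data supplies it).

```python
def combine(w1, m1, v1, m2, v2):
    """Mean and variance of a pooled data set.

    w1 is the fraction of observations from subsample 1 (assumed known);
    v1 and v2 are divide-by-n variances, not n-1 sample variances.
    """
    w2 = 1.0 - w1
    m = w1 * m1 + w2 * m2
    # Within-group variance plus the spread of the group means about m.
    v = w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)
    return m, v


def recover_second(w1, m, v, m1, v1):
    """Invert combine(): mean and variance of subsample 2.

    Requires the combined (m, v), subsample 1's (m1, v1), and a guess w1.
    """
    if not 0.0 < w1 < 1.0:
        raise ValueError("w1 must lie strictly between 0 and 1")
    w2 = 1.0 - w1
    m2 = (m - w1 * m1) / w2
    v2 = (v - w1 * (v1 + (m1 - m) ** 2)) / w2 - (m2 - m) ** 2
    return m2, v2


if __name__ == "__main__":
    # Made-up numbers, purely to show the round trip.
    m, v = combine(0.4, 3.0, 4.0, 10.0, 9.0)
    print(m, v)
    print(recover_second(0.4, m, v, 3.0, 4.0))
```

The answer moves with every change in the guessed w1, and a guess that
yields a negative variance for the second subsample cannot be right:
which is another way of saying that the weights have to come from
somewhere.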

I don't know if these thoughts will have been helpful.  Good luck!
 -----------------------------------------------------------------------
 Donald F. Burrill                                            [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816
