On Wed, 8 Aug 2007, Martin Maechler wrote:

>>>>>> "BertG" == Bert Gunter <[EMAIL PROTECTED]>
>>>>>>     on Tue, 7 Aug 2007 16:18:18 -0700 writes:
>
>      TV> Have you considered the situation of wanting to
>      TV> characterize probability densities of prevalence
>      TV> estimates based on a complex random sample of some
>      TV> large population?
>
>    BertG> No -- and I stand by my statement. The empirical
>    BertG> distribution of the data themselves is the best
>    BertG> "characterization" of the density. You and others are
>    BertG> free to disagree.
>
> I do agree with you Bert.
> From a practical point of view, however, you'd still want to use an
> approximation to the data ECDF, since the full ecdf is just too
> large an object to handle conveniently.
>
> One simple, quite small, and probably sufficient such
> approximation may be to use the equivalent of
>     quantile(x, probs = (0:1000)/1000),
> which is closely related to just working with a binned version of
> the original data; something others have proposed as well.
>
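
A minimal sketch of Martin's suggestion (the data vector x is
hypothetical; the 1001 quantiles stand in for the full ECDF):

    x  <- rlnorm(1e6)                        # hypothetical large sample
    qx <- quantile(x, probs = (0:1000)/1000) # 1001-number summary

    ## Reconstruct an approximate ECDF by interpolating through the
    ## (quantile, probability) pairs; ties are collapsed with max().
    Fhat <- approxfun(qx, (0:1000)/1000, ties = max, rule = 2)

    ## Compare with the exact ECDF at one point:
    Fn <- ecdf(x)
    c(approx = Fhat(1), exact = Fn(1))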

I have done Normal (actually logNormal) mixture fitting to pretty large data
(particle counts by size) for summary purposes.  In that case quantiles would
not have done just as well: I had many sets of data (one every three hours for
several months), and the locations of the mixture components drift around over
time.  The location, scale, and mass of the four mixture components really
were the best summaries.  This was the application that constrOptim() was
written for.
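
A minimal sketch of that kind of fit -- not Thomas's actual code, and with
only two components and made-up data for brevity: a lognormal mixture fit
by maximum likelihood, with constrOptim() enforcing linear bounds on the
mixing weight and the scale parameters.

    ## made-up "particle size" data: a mixture of two lognormals
    set.seed(1)
    x <- c(rlnorm(3000, meanlog = 0.0, sdlog = 0.3),
           rlnorm(1000, meanlog = 1.5, sdlog = 0.4))

    ## negative log-likelihood; theta = (p, mu1, s1, mu2, s2)
    nll <- function(theta, x) {
        p <- theta[1]
        -sum(log(p       * dlnorm(x, theta[2], theta[3]) +
                 (1 - p) * dlnorm(x, theta[4], theta[5])))
    }

    ## linear constraints ui %*% theta - ci >= 0:
    ## 0.01 <= p <= 0.99, s1 >= 0.01, s2 >= 0.01
    ui <- rbind(c( 1, 0, 0, 0, 0),
                c(-1, 0, 0, 0, 0),
                c( 0, 0, 1, 0, 0),
                c( 0, 0, 0, 0, 1))
    ci <- c(0.01, -0.99, 0.01, 0.01)

    fit <- constrOptim(theta = c(0.5, 0.2, 0.5, 1.2, 0.5),
                       f = nll, grad = NULL,  # grad = NULL => Nelder-Mead
                       ui = ui, ci = ci, x = x)
    round(fit$par, 3)  # estimated (p, mu1, s1, mu2, s2)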

      -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]       University of Washington, Seattle

