Well, I think we should step back and ask a few design questions about the objects that will use these Sample/Population variances. The answers will assist us in designing the variances themselves.

1.) Could a covariance matrix be built from either Sample or Population Variances?

2.) Are there other applications of Sample/Pop Variances that we want to implement? If so, what are they, and are the two interchangeable in those cases?

3.) Do we want to add methods to the Descriptive/Summary/StatUtils stats to capture both cases?

What this and the Remedian case are somewhat convincing me of is that in the SummaryStatistics case you need to know what you want before you start adding values to the statistic, which constitutes a sort of configuration environment, while in the DescriptiveStatistics case one can choose these aspects afterward, since the statistic is calculated after all the values are known.

This means that in the SummaryStatistics case you either have to calculate both the PopulationVariance and the SampleVariance or configure it to use one or the other, while in the DescriptiveStatistics case you can just call the appropriate method to return that statistic.
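
To make the "calculate both" option concrete, here is a rough sketch of a streaming accumulator. The class and method names (RunningVariance, addValue, sampleVariance, populationVariance) are made up for illustration and are not the existing SummaryStatistics API. It assumes a Welford-style update that keeps only the count, the running mean, and the sum of squared deviations, so either variance can be read off at the end without storing the values.

```java
/** Hypothetical streaming accumulator; names are illustrative, not the Commons Math API. */
final class RunningVariance {
    private long n;       // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Welford-style update: one pass, no stored values. */
    void addValue(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Bias-corrected estimate: divides by n - 1. */
    double sampleVariance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }

    /** Treats the data as the whole population: divides by n. */
    double populationVariance() {
        return n > 0 ? m2 / n : Double.NaN;
    }
}
```

Under these assumptions, the only state shared by the two statistics is the sum of squared deviations, so carrying both costs one extra division at read time. Whether that fits the actual design is a separate question.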

-Mark

Hi Mark,

I think we have to think very carefully about this,
especially when we start including covariances. My old
textbooks give the formula as population estimates,
just like Excel (no choice, only population).
However, covariance matrices include the sample
covariances....

Cheers,

Kim

Phil Steitz wrote:
Mark R. Diggory wrote:

Yes, at the UnivariateStatistic level, these would need to be new classes. My question as well is "Does it apply as well to higher order moments?"


In theory, yes, though I have never seen non-bias-corrected versions of Skewness and Kurtosis used. The current formulas are all defined for the most common use case where the data represent a sample from a population whose true distribution and associated parameters are unknown. The formulas that we use provide unbiased estimators for population parameters in this case. This is explained fairly well for the Variance here:
http://mathworld.wolfram.com/Variance.html
and for Skewness and Kurtosis here:
http://mathworld.wolfram.com/k-Statistic.html


The "Population Variance" is useful when the data *are* the population (i.e. the distribution is discrete and there is no sampling going on). I am not aware of use cases where Skewness and Kurtosis are useful in analyzing full population data or other uses for the non-bias-corrected versions of these. These could exist, I am just not aware of them.
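
For reference, the difference between the two variances is only the divisor: the sum of squared deviations is divided by n - 1 for the bias-corrected sample variance and by n for the population variance. A minimal sketch for stored values follows. The class name StoredVariance and the biasCorrected flag are hypothetical, chosen only for this example, and are not taken from the existing code.

```java
/** Hypothetical helper for fully known data; not the Commons Math API. */
final class StoredVariance {
    /**
     * Two-pass variance of a stored data set.
     * biasCorrected = true divides by n - 1 (sample variance);
     * biasCorrected = false divides by n (population variance).
     */
    static double variance(double[] values, boolean biasCorrected) {
        int n = values.length;
        if (n == 0 || (biasCorrected && n == 1)) {
            return Double.NaN; // undefined for these sizes
        }
        // First pass: mean.
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        // Second pass: sum of squared deviations from the mean.
        double ss = 0.0;
        for (double v : values) {
            double d = v - mean;
            ss += d * d;
        }
        return ss / (biasCorrected ? n - 1 : n);
    }
}
```

For the values {2, 4, 4, 4, 5, 5, 7, 9} the mean is 5 and the sum of squared deviations is 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, about 4.571.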


Maybe we should place everything into the following packages:


I don't think we need yet another subpackage.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-- Mark Diggory Software Developer Harvard MIT Data Center http://www.hmdc.harvard.edu
