On Thu, Mar 6, 2014 at 2:51 PM, Nathaniel Smith <[email protected]> wrote:
> On Wed, Mar 5, 2014 at 4:45 PM, Sebastian Berg
> <[email protected]> wrote:
>>
>> Hi all,
>>
>> in Pull Request https://github.com/numpy/numpy/pull/3864 Noel Dawe
>> suggested adding new parameters to our `cov` and `corrcoef` functions to
>> implement weights, which already exist for `average` (the PR still
>> needs to be adapted).
>>
>> The idea right now would be to add `weights` and `frequencies`
>> keyword arguments to these functions.
>>
>> In more detail: The situation is a bit more complex for `cov` and
>> `corrcoef` than `average`, because there are different types of weights.
>> The current plan would be to add two new keyword arguments:
>>   * weights: Uncertainty weights, which cause `N` to be recalculated
>>     accordingly (this is R's `cov.wt` default, I believe).
>>   * frequencies: When given, `N = sum(frequencies)` and the values
>>     are weighted by their frequency.
>
> I don't understand this description at all. One of them recalculates N,
> and the other sets N according to some calculation?
>
> Is there a standard reference on how these are supposed to be
> interpreted? When you talk about per-value uncertainties, I start
> imagining that we're trying to estimate a population covariance given
> a set of samples each corrupted by independent measurement noise, and
> then there's some natural hierarchical Bayesian model one could write
> down and get an ML estimate of the latent covariance via empirical
> Bayes or something. But this requires a bunch of assumptions and is
> that really what we want to do? (Or maybe it collapses down into
> something simpler if the measurement noise is Gaussian or something?)

I think the idea is that if you write formulas involving correlation or
covariance using matrix notation, then these formulas can be generalized
in several different ways by inserting non-negative or positive diagonal
matrices into the formulas in various places. The diagonal entries could
be called 'weights'. If they are further restricted to sum to 1, then
they could be called 'frequencies'. Or maybe this is too cynical and the
jargon has a more standard meaning in this context.
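
To make the "insert a diagonal matrix" idea concrete, here is a minimal
sketch of one such generalization. The function name `diag_weighted_cov`
and its `ddof` argument are made up for illustration; this is not the code
in the pull request, and it assumes observations are stored in rows.

```python
import numpy as np

def diag_weighted_cov(x, w, ddof=1):
    """Covariance with W = diag(w) inserted into the usual matrix formula.

    Hypothetical helper for illustration only.
    x: array of shape (n_observations, n_variables)
    w: one non-negative weight per observation
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    W = np.diag(w)                 # the diagonal matrix being inserted
    n = w.sum()                    # plays the role of N
    mean = w @ x / n               # weighted mean of each variable
    xc = x - mean                  # centered observations
    return xc.T @ W @ xc / (n - ddof)

x = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0]])
counts = np.array([1, 3, 2])       # integer counts, so N = sum(counts)

# Integer counts behave like literally repeating each observation.
by_weights = diag_weighted_cov(x, counts)
by_repeats = np.cov(np.repeat(x, counts, axis=0), rowvar=False)
print(np.allclose(by_weights, by_repeats))

# Rescaling the same weights to sum to 1 gives the same estimate
# once no degrees-of-freedom correction is applied (ddof=0).
p = counts / counts.sum()
print(np.allclose(diag_weighted_cov(x, p, ddof=0),
                  diag_weighted_cov(x, counts, ddof=0)))
```

The only place the two readings differ in this sketch is the normalizer
`n - ddof`: with counts, `n` is a genuine sample size, while with weights
that sum to 1 it is not, which is where the choice of how to "recalculate
N" comes in.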
