On Thu, 2014-03-06 at 19:51 +0000, Nathaniel Smith wrote:
> On Wed, Mar 5, 2014 at 4:45 PM, Sebastian Berg
> <[email protected]> wrote:
> >
> > Hi all,
> >
> > in Pull Request https://github.com/numpy/numpy/pull/3864 Noel Dawe
> > suggested adding new parameters to our `cov` and `corrcoef` functions to
> > implement weights, which already exist for `average` (the PR still
> > needs to be adapted).
> >
> > The idea right now would be to add `weights` and `frequencies`
> > keyword arguments to these functions.
> >
> > In more detail: The situation is a bit more complex for `cov` and
> > `corrcoef` than `average`, because there are different types of weights.
> > The current plan would be to add two new keyword arguments:
> >   * weights: Uncertainty weights which cause `N` to be recalculated
> >     accordingly (This is R's `cov.wt` default I believe).
> >   * frequencies: When given, `N = sum(frequencies)` and the values
> >     are weighted by their frequency.
>
> I don't understand this description at all. One of them recalculates N,
> and the other sets N according to some calculation?
>
> Is there a standard reference on how these are supposed to be
> interpreted? When you talk about per-value uncertainties, I start
> imagining that we're trying to estimate a population covariance given
> a set of samples each corrupted by independent measurement noise, and
> then there's some natural hierarchical Bayesian model one could write
> down and get an ML estimate of the latent covariance via empirical
> Bayes or something. But this requires a bunch of assumptions and is
> that really what we want to do? (Or maybe it collapses down into
> something simpler if the measurement noise is Gaussian or something?)
>
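To make the two kinds of weights in the proposal concrete, here is a minimal
sketch of one possible reading. The function name `weighted_cov` is
hypothetical, and the normalization used for `weights` (dividing by
`V1 - ddof * V2 / V1`) is an assumption based on one common convention; it is
not necessarily what the PR implements. Only the `frequencies` case is
unambiguous: it should match repeating each observation that many times.

```python
import numpy as np


def weighted_cov(x, frequencies=None, weights=None, ddof=1):
    """Hypothetical sketch: covariance with variables in rows, observations in columns.

    frequencies: integer repeat counts, so N = sum(frequencies).
    weights: relative (uncertainty-style) weights; their overall scale
             cancels out, which is why N has to be "recalculated".
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n_obs = x.shape[1]
    f = np.ones(n_obs) if frequencies is None else np.asarray(frequencies, dtype=float)
    a = np.ones(n_obs) if weights is None else np.asarray(weights, dtype=float)

    w = f * a                    # combined per-observation weight
    v1 = w.sum()                 # sum of weights (plays the role of N)
    v2 = (w * a).sum()           # equals v1 when only frequencies are given

    mean = (x * w).sum(axis=1, keepdims=True) / v1
    centered = x - mean

    # Assumed normalization: reduces to sum(frequencies) - ddof without
    # `weights`, and is invariant to rescaling `weights` by a constant.
    norm = v1 - ddof * v2 / v1
    return np.dot(centered * w, centered.T) / norm


x = np.array([[1.0, 2.0, 4.0],
              [1.0, 3.0, 2.0]])
f = np.array([1, 2, 3])
a = np.array([0.5, 1.0, 2.0])

# Frequencies behave like repeated observations.
print(np.allclose(weighted_cov(x, frequencies=f),
                  np.cov(np.repeat(x, f, axis=1))))
# Rescaling the uncertainty-style weights does not change the result.
print(np.allclose(weighted_cov(x, weights=a),
                  weighted_cov(x, weights=10 * a)))
```

The two checks at the end show the practical difference: `frequencies` changes
the effective sample size, while only the relative sizes of `weights` matter.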
I had really hoped someone who knows this stuff very well would show
up ;). I think these weights were meant as uncertainties under a Gaussian
assumption, and the other types of weights are different; see `aweights`
here:
http://www.stata.com/support/faqs/statistics/weights-and-summary-statistics/
However, I did not check a statistics book, nor do I have one here right
now (Wikipedia, for example, is less than helpful).

Frankly, unless there is some "obviously right" thing (for a
statistician), I would be careful about adding such new features. And
while I thought before that this might be the case, it is no longer
clear to me.

- Sebastian

> -n
