On Fri, Sep 10, 2010 at 1:58 PM, Christopher Barrington-Leigh <[email protected]> wrote:
> Interesting. Thanks Erin, Josef and Keith.

thanks to the stata page at least I figured out that WLS is aweights
with assumption mu_i = mu

import numpy as np
from scikits.statsmodels import WLS
w0 = np.arange(20) % 4
w = 1.*w0/w0.sum()
y = 2 + np.random.randn(20)
nobs = len(y)  # needed for s2u below

>>> res = WLS(y, np.ones(20), weights=w).fit()
>>> print res.params, res.bse
[ 2.29083069] [ 0.17562867]
>>> m = np.dot(w, y)
>>> m
2.2908306865128401
>>> s2u = 1/(nobs-1.) * np.dot(w, (y - m)**2)
>>> s2u
0.030845429945278956
>>> np.sqrt(s2u)
0.17562867062435722

>
> There is a nice article on this at
> http://www.stata.com/support/faqs/stat/supweight.html. In my case, the
> model I've in mind is to assume that the expected value (mean) is the same
> for each sample, and that the weights are/should be normalised, whence a
> consistent estimator for sem is straightforward (if second moments can
> be assumed to be
> well behaved?). I suspect that this (survey-like) case is also one of
> the two most standard/most common
> expressions that people want when they ask for an s.e. of the mean for
> a weighted dataset. The other would be when the weights are not to be
> normalised, but represent standard errors on the individual
> measurements.
>
> Surely what one wants, in the end, is a single function (or whatever)
> called mean or sem which calculates different values for different
> specified choices of model (assumptions)? And where possible that it has a
> default model in mind for when none is specified?

I still find aweights and pweights confusing, along with the auxiliary
assumptions they require. I don't find the Stata docs very helpful; I
almost never find a clear description of the formulas (and I don't have
any Stata books).

If you have, or can write, some examples that show how the formulas
apply in the different cases, that would be very helpful for getting
some structure into this area: weighting and survey sampling, and
population versus clustered or stratified sample statistics.

I'm still pretty lost with the literature on surveys.
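
To make the two cases in the quoted text concrete, here is a minimal sketch of what such a single function might look like. The name `weighted_mean_sem` and its `model` argument are hypothetical, not part of numpy or statsmodels. The "aweights" branch repeats the WLS-on-a-constant calculation above (common mean, normalised weights); the "known_sigma" branch assumes the weights are w_i = 1/sigma_i**2 for known per-observation standard errors and are not normalised.

```python
import numpy as np


def weighted_mean_sem(y, w, model="aweights"):
    """Return (weighted mean, standard error) under a named model.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    if model == "aweights":
        # Assumption: common mean mu_i = mu, weights normalised to sum to 1.
        # nobs counts every observation, including zero-weight ones,
        # as in the calculation above.
        wn = w / w.sum()
        mean = np.dot(wn, y)
        nobs = len(y)
        var_mean = np.dot(wn, (y - mean) ** 2) / (nobs - 1.0)
    elif model == "known_sigma":
        # Assumption: w_i = 1 / sigma_i**2 with known sigma_i,
        # weights left unnormalised.
        mean = np.dot(w, y) / w.sum()
        var_mean = 1.0 / w.sum()
    else:
        raise ValueError("unknown model: %r" % (model,))
    return mean, np.sqrt(var_mean)


# Usage: same setup as above, plus a small known-sigma example.
y = 2 + np.random.randn(20)
w0 = np.arange(20) % 4
print(weighted_mean_sem(y, w0, model="aweights"))

sigma = np.array([0.5, 1.0, 2.0])
print(weighted_mean_sem([1.9, 2.2, 2.6], 1.0 / sigma ** 2, model="known_sigma"))
```

With equal weights the "aweights" branch reduces to the usual std(ddof=1)/sqrt(n), which is a quick sanity check on the formula. Which branch should be the default is exactly the open question in this thread.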

Josef

>
> thanks,
> Chris
>
> On Thu, Sep 9, 2010 at 9:13 PM, Keith Goodman <[email protected]> wrote:
>>
>> >>>> ma.std()
>> >> 3.2548815339711115
>> >
>> > or maybe `w` reflects an underlying sampling scheme and you should
>> > sample in the bootstrap according to w ?
>>
>> Yes....
>>
>> > if weighted average is a sum of linear functions of (normal)
>> > distributed random variables, it still depends on whether the
>> > individual observations have the same or different variances, e.g.
>> > http://en.wikipedia.org/wiki/Weighted_mean#Statistical_properties
>>
>> ...lots of possibilities. As you have shown the problem is not yet
>> well defined. Not much specification needed for the weighted mean,
>> lots needed for the standard error of the weighted mean.
>>
>> > What I can't figure out is whether if you assume sigma_i = sigma for
>> > all observation i, do we use the weighted or the unweighted variance
>> > to get an estimate of sigma. And I'm not able to replicate with simple
>> > calculations what statsmodels.WLS gives me.
>>
>> My guess: if all you want is sigma of the individual i and you know
>> sigma is the same for all i, then I suppose you don't care about the
>> weight.
>>
>> >
>> > ???
>> >
>> > Josef
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
