On Fri, Sep 10, 2010 at 3:01 PM, <[email protected]> wrote:
> On Fri, Sep 10, 2010 at 1:58 PM, Christopher Barrington-Leigh
> <[email protected]> wrote:
>> Interesting. Thanks Erin, Josef and Keith.
>
> thanks to the stata page at least I figured out that WLS is aweights
> with assumption mu_i = mu
>
> import numpy as np
> from scikits.statsmodels import WLS
> w0 = np.arange(20) % 4
> w = 1.*w0/w0.sum()
> y = 2 + np.random.randn(20)
> nobs = len(y)
>
> >>> res = WLS(y, np.ones(20), weights=w).fit()
> >>> print res.params, res.bse
> [ 2.29083069] [ 0.17562867]
> >>> m = np.dot(w, y)
> >>> m
> 2.2908306865128401
> >>> s2u = 1/(nobs-1.) * np.dot(w, (y - m)**2)
> >>> s2u
> 0.030845429945278956
> >>> np.sqrt(s2u)
> 0.17562867062435722
>
>
>>
>> There is a nice article on this at
>> http://www.stata.com/support/faqs/stat/supweight.html. In my case, the
>> model I've in mind is to assume that the expected value (mean) is the same
>> for each sample, and that the weights are/should be normalised, whence a
>> consistent estimator for sem is straightforward (if second moments can
>> be assumed to be well behaved?). I suspect that this (survey-like) case
>> is also one of the two most standard/most common expressions that people
>> want when they ask for an s.e. of the mean for a weighted dataset. The
>> other would be when the weights are not to be normalised, but represent
>> standard errors on the individual measurements.
>>
>> Surely what one wants, in the end, is a single function (or whatever)
>> called mean or sem which calculates different values for different
>> specified choices of model (assumptions)? And where possible that it has a
>> default model in mind for when none is specified?
>
> I find aweights and pweights still confusing, plus necessary auxiliary
> assumptions.
>
> I don't find Stata docs very helpful, I almost never find a clear
> description of the formulas (and I don't have any Stata books).
>
> If you have or write some examples that show or apply in the different
> cases, then this would be very helpful to get a structure into this
> area, weighting and survey sampling, and population versus clustered
> or stratified sample statistics.
>
> I'm still pretty lost with the literature on surveys.

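For anyone who wants to check the quoted aweights numbers without statsmodels, here is a minimal NumPy-only sketch of the same calculation. The function name weighted_mean_sem is hypothetical (not part of numpy or statsmodels). It assumes a common mean (mu_i = mu), normalises the weights to sum to one, and, as the quoted session appears to, lets nobs count every observation, including those with zero weight.

```python
import numpy as np


def weighted_mean_sem(y, w):
    """Weighted mean and its standard error under the aweights-style model.

    Hypothetical helper, assuming a common mean for all observations.
    Weights are rescaled to sum to one, so only relative weights matter.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                 # normalise the weights
    nobs = y.size                   # counts zero-weight observations too
    m = np.dot(w, y)                # weighted mean
    # squared standard error: weighted sum of squared deviations / (nobs - 1)
    s2 = np.dot(w, (y - m) ** 2) / (nobs - 1.0)
    return m, np.sqrt(s2)


# Same setup as the quoted session, with a fixed seed for repeatability
rng = np.random.RandomState(0)
w0 = np.arange(20) % 4
y = 2 + rng.randn(20)

m, se = weighted_mean_sem(y, w0)
print(m, se)
```

With equal weights this reduces to the usual y.std(ddof=1) / sqrt(nobs), which is a quick sanity check on the formula. Whether nobs - 1 is the right divisor when some weights are zero is one of the auxiliary assumptions that still has to be chosen.
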
I found the formula collection for SPSS:
http://support.spss.com/productsext/statistics/documentation/19/clientindex.html#Manuals
(the pdf file for algorithms)

Not much explanation, and sometimes it's not really clear what a
variable stands for exactly, but a useful summary of formulas. Also,
the formulas might not always be for the general case, e.g. the formulas
for non-parametric tests seem to be missing tie-handling (from a quick
look).

More compressed than the detailed descriptions in SAS, but much more
explicit than Stata and R without buying the books.

The chapter on the T Test Algorithm carries population frequencies
throughout; this should work for weighted statistics, but maybe not
for different complex sampling schemes (a-weights, p-weights, ...).

Josef

>
> Josef
>
>
>>
>> thanks,
>> Chris
>>
>> On Thu, Sep 9, 2010 at 9:13 PM, Keith Goodman <[email protected]> wrote:
>>>
>>> >>>> ma.std()
>>> >> 3.2548815339711115
>>> >
>>> > or maybe `w` reflects an underlying sampling scheme and you should
>>> > sample in the bootstrap according to w ?
>>>
>>> Yes....
>>>
>>> > if weighted average is a sum of linear functions of (normal)
>>> > distributed random variables, it still depends on whether the
>>> > individual observations have the same or different variances, e.g.
>>> > http://en.wikipedia.org/wiki/Weighted_mean#Statistical_properties
>>>
>>> ...lots of possibilities. As you have shown the problem is not yet
>>> well defined. Not much specification needed for the weighted mean,
>>> lots needed for the standard error of the weighted mean.
>>>
>>> > What I can't figure out is whether if you assume sigma_i = sigma for
>>> > all observation i, do we use the weighted or the unweighted variance
>>> > to get an estimate of sigma. And I'm not able to replicate with simple
>>> > calculations what statsmodels.WLS gives me.
>>>
>>> My guess: if all you want is sigma of the individual i and you know
>>> sigma is the same for all i, then I suppose you don't care about the
>>> weight.
>>>
>>> >
>>> > ???
>>> >
>>> > Josef
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
