On Fri, Sep 17, 2010 at 1:19 PM, <[email protected]> wrote: > On Fri, Sep 10, 2010 at 3:01 PM, <[email protected]> wrote: >> On Fri, Sep 10, 2010 at 1:58 PM, Christopher Barrington-Leigh >> <[email protected]> wrote: >>> Interesting. Thanks Erin, Josef and Keith. >> >> thanks to the stata page at least I figured out that WLS is aweights >> with asumption mu_i = mu >> >> import numpy as np >> from scikits.statsmodels import WLS >> w0 = np.arange(20) % 4 >> w = 1.*w0/w0.sum() >> y = 2 + np.random.randn(20) >> >>>>> res = WLS(y, np.ones(20), weights=w).fit() >>>>> print res.params, res.bse >> [ 2.29083069] [ 0.17562867] >>>>> m = np.dot(w, y) >>>>> m >> 2.2908306865128401 >>>>> s2u = 1/(nobs-1.) * np.dot(w, (y - m)**2) >>>>> s2u >> 0.030845429945278956 >>>>> np.sqrt(s2u) >> 0.17562867062435722 >> >> >>> >>> There is a nice article on this at >>> http://www.stata.com/support/faqs/stat/supweight.html. In my case, the >>> model I've in mind is to assume that the expected value (mean) is the same >>> for each sample, and that the weights are/should be normalised, whence a >>> consistent estimator for sem is straightforward (if second moments can >>> be assumed to be >>> well behaved?). I suspect that this (survey-like) case is also one of >>> the two most standard/most common >>> expression that people want when they ask for an s.e. of the mean for >>> a weighted dataset. The other would be when the weights are not to be >>> normalised, but represent standard errors on the individual >>> measurements. >>> >>> Surely what one wants, in the end, is a single function (or whatever) >>> called mean or sem which calculates different values for different >>> specified choices of model (assumptions)? And where possible that it has a >>> default model in mind for when none is specified? >> >> I find aweights and pweights still confusing, plus necessary auxillary >> assumptions. >> >> I don't find Stata docs very helpful, I almost never find a clear >> description of the formulas (and I don't have any Stata books). >> >> If you have or write some examples that show or apply in the different >> cases, then this would be very helpful to get a structure into this >> area, weighting and survey sampling, and population versus clustered >> or stratified sample statistics. >> >> I'm still pretty lost with the literature on surveys. > > I found the formula collection for SPSS > http://support.spss.com/productsext/statistics/documentation/19/clientindex.html#Manuals > pdf file for algorithms > > Not much explanation, and sometimes it's not really clear what a > variable stands for exactly, but a useful summary of formulas. Also > the formulas might not always be for a general case, e.g. formulas for > non-parametric tests seem to be missing tie-handling (from a quick > look). > > More compressed than the details descriptions in SAS, but much more > explicit than Stata and R without buying the books. > > The chapter on T Test Algorithm carries population frequencies > throughout, this should work for weighted statistics, but maybe not > for different complex sampling schemes (a-weights, p-weights,...).
here is the corresponding SAS documentation http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#ttest_toc.htm and here is my first draft of it in statsmodels, http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental-gsoc/annotate/head%3A/scikits/statsmodels/sandbox/stats/weightstats.py A few hours of translating the SPSS formula collection into sci-python, still buggy, insufficiently tested and insufficiently documented, and it doesn't have the extras yet that SAS has. It uses my currently preferred pattern of lazily evaluated classes (although I might switch decorators). Josef > > Josef > > >> >> Josef >> >> >>> >>> thanks, >>> Chris >>> >>> On Thu, Sep 9, 2010 at 9:13 PM, Keith Goodman <[email protected]> wrote: >>>> >>>> ma.std() >>>> >> 3.2548815339711115 >>>> > >>>> > or maybe `w` reflects an underlying sampling scheme and you should >>>> > sample in the bootstrap according to w ? >>>> >>>> Yes.... >>>> >>>> > if weighted average is a sum of linear functions of (normal) >>>> > distributed random variables, it still depends on whether the >>>> > individual observations have the same or different variances, e.g. >>>> > http://en.wikipedia.org/wiki/Weighted_mean#Statistical_properties >>>> >>>> ...lots of possibilities. As you have shown the problem is not yet >>>> well defined. Not much specification needed for the weighted mean, >>>> lots needed for the standard error of the weighted mean. >>>> >>>> > What I can't figure out is whether if you assume simga_i = sigma for >>>> > all observation i, do we use the weighted or the unweighted variance >>>> > to get an estimate of sigma. And I'm not able to replicate with simple >>>> > calculations what statsmodels.WLS gives me. >>>> >>>> My guess: if all you want is sigma of the individual i and you know >>>> sigma is the same for all i, then I suppose you don't care about the >>>> weight. >>>> >>>> > >>>> > ??? >>>> > >>>> > Josef >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> [email protected] >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> > _______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
