Excerpts from cpblpublic's message of Thu Sep 09 22:22:05 -0400 2010: > I am looking for some reaally basic statistical tools. I have some > sample data, some sample weights for those measurements, and I want to > calculate a mean and a standard error of the mean. > > Here are obvious places to look: > > numpy > scipy.stats > statsmodels > > It seems to me that numpy's "mean" and "average" functions have their > names backwards. That is, often a mean is defined more generally than > average, and includes the possibility of weighting, but in this case > it is "average" that has a weights argument. Can these functions be > merged/renamed/deprecated in the future? It's clear to me that "mean" > should allow for weights. > > None of these modules, above, offer standard error of the mean which > incorporates weights. scipy.stats.sem() doesn't, and that's the closest > thing. numpy's "var" doesn't allow weights. > There aren't any weighted variances in the above modules. > > Again, are there favoured codes for these functions? Can they be > incorporated appropriately in the future? > > Most immediately, I'd love to get code for weighted sem. I'll write it > otherwise, but it might be crude and dumb... > Thanks! > Chris Barrington-Leigh > UBC
This code below should do what you want. It is part of the stat sub-package of esutil http://code.google.com/p/esutil/ Hope this helps, Erin Scott Sheldon Brookhaven National Laboratory def wmom(arrin, weights_in, inputmean=None, calcerr=False, sdev=False): """ NAME: wmom() PURPOSE: Calculate the weighted mean, error, and optionally standard deviation of an input array. By default error is calculated assuming the weights are 1/err^2, but if you send calcerr=True this assumption is dropped and the error is determined from the weighted scatter. CALLING SEQUENCE: wmean,werr = wmom(arr, weights, inputmean=None, calcerr=False, sdev=False) INPUTS: arr: A numpy array or a sequence that can be converted. weights: A set of weights for each elements in array. OPTIONAL INPUTS: inputmean: An input mean value, around which them mean is calculated. calcerr=False: Calculate the weighted error. By default the error is calculated as 1/sqrt( weights.sum() ). If calcerr=True it is calculated as sqrt( (w**2 * (arr-mean)**2).sum() )/weights.sum() sdev=False: If True, also return the weighted standard deviation as a third element in the tuple. OUTPUTS: wmean, werr: A tuple of the weighted mean and error. If sdev=True the tuple will also contain sdev: wmean,werr,wsdev REVISION HISTORY: Converted from IDL: 2006-10-23. Erin Sheldon, NYU """ # no copy made if they are already arrays arr = numpy.array(arrin, ndmin=1, copy=False) # Weights is forced to be type double. All resulting calculations # will also be double weights = numpy.array(weights_in, ndmin=1, dtype='f8', copy=False) wtot = weights.sum() # user has input a mean value if inputmean is None: wmean = ( weights*arr ).sum()/wtot else: wmean=float(inputmean) # how should error be calculated? if calcerr: werr2 = ( weights**2 * (arr-wmean)**2 ).sum() werr = numpy.sqrt( werr2 )/wtot else: werr = 1.0/numpy.sqrt(wtot) # should output include the weighted standard deviation? if sdev: wvar = ( weights*(arr-wmean)**2 ).sum()/wtot wsdev = numpy.sqrt(wvar) return wmean,werr,wsdev else: return wmean,werr _______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
