Re: [Numpy-discussion] weighted mean; weighted standard error of the mean (sem)

Keith Goodman Thu, 09 Sep 2010 21:13:52 -0700

On Thu, Sep 9, 2010 at 8:44 PM,  <[email protected]> wrote:
> On Thu, Sep 9, 2010 at 11:32 PM, Keith Goodman <[email protected]> wrote:
>> On Thu, Sep 9, 2010 at 8:07 PM, Keith Goodman <[email protected]> wrote:
>>> On Thu, Sep 9, 2010 at 7:22 PM, cpblpublic <[email protected]> 
>>> wrote:
>>>> I am looking for some really basic statistical tools. I have some
>>>> sample data, some sample weights for those measurements, and I want to
>>>> calculate a mean and a standard error of the mean.
>>>
>>> How about using a bootstrap?
>>>
>>> Array and weights:
>>>
>>>>> a = np.arange(100)
>>>>> w = np.random.rand(100)
>>>>> w = w / w.sum()
>>>
>>> Initialize:
>>>
>>>>> n = 1000
>>>>> ma = np.zeros(n)
>>>
>>> Save mean of each bootstrap sample:
>>>
>>>>> for i in range(n):
>>>   ....:     idx = np.random.randint(0, 100, 100)
>>>   ....:     ma[i] = np.dot(a[idx], w[idx])
>>>   ....:
>>>   ....:
>>>
>>> Error in mean:
>>>
>>>>> ma.std()
>>>   3.854023384833674
>>>
>>> Sanity check:
>>>
>>>>> np.dot(w, a)
>>>   49.231127299096954
>>>>> ma.mean()
>>>   49.111478821225127
>>>
>>> Hmm...should w[idx] be renormalized to sum to one in each bootstrap sample?
>>
>> Or perhaps there is no uncertainty about the weights, in which case:
>>
>>>> for i in range(n):
>>   ....:     idx = np.random.randint(0, 100, 100)
>>   ....:     ma[i] = np.dot(a[idx], w)
>>   ....:
>>   ....:
>>>> ma.std()
>>   3.2548815339711115
>
> or maybe `w` reflects an underlying sampling scheme and you should
> sample in the bootstrap according to `w`?
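If `w` does describe the sampling scheme, the resampling itself can carry the weights: draw indices with probability proportional to `w`, then take a plain mean of each resample. A minimal sketch, assuming `w` is nonnegative and that probability-weighted resampling is the right model for the data (the function name `bootstrap_sample_by_weight` and the seeded generator are illustrative, not from the thread):

```python
import numpy as np

def bootstrap_sample_by_weight(a, w, n_boot=1000, seed=0):
    """Bootstrap in which each resample draws observations with probability w.

    Assumption: w encodes the sampling scheme, so each replicate is an
    unweighted mean of the probability-weighted resample.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    p = np.asarray(w, dtype=float)
    p = p / p.sum()  # Generator.choice needs probabilities that sum to 1
    n = a.size
    reps = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.choice(n, size=n, replace=True, p=p)  # sample indices by weight
        reps[i] = a[idx].mean()  # plain mean: the weights are already in idx
    return reps

# Usage with data shaped like the example above
a = np.arange(100)
w = np.random.default_rng(1).random(100)
reps = bootstrap_sample_by_weight(a, w)
print(np.average(a, weights=w), reps.mean(), reps.std())
```

Each replicate has expectation `np.average(a, weights=w)`, so `reps.std()` plays the same role as `ma.std()` in the loops above.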


Yes....
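On the renormalization question in the first bootstrap above: `np.average` divides by the sum of the weights it is given, so calling it on each resample renormalizes `w[idx]` automatically. A minimal sketch, assuming observations and their weights are resampled together with replacement (`bootstrap_weighted_mean` is a hypothetical helper name):

```python
import numpy as np

def bootstrap_weighted_mean(a, w, n_boot=1000, seed=0):
    """Bootstrap the weighted mean, renormalizing the resampled weights.

    Returns the bootstrap replicates; their std is the SE estimate.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    w = np.asarray(w, dtype=float)
    n = a.size
    reps = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample observations with replacement
        # np.average divides by w[idx].sum(), i.e. renormalizes per resample
        reps[i] = np.average(a[idx], weights=w[idx])
    return reps

# Usage with data shaped like the example above
a = np.arange(100)
w = np.random.default_rng(1).random(100)
w = w / w.sum()
reps = bootstrap_weighted_mean(a, w)
print(np.average(a, weights=w), reps.mean(), reps.std())
```

Without renormalization, `np.dot(a[idx], w[idx])` is scaled by `w[idx].sum()`, which itself varies from resample to resample and so adds its own spread to `ma`.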

> If the weighted average is a linear combination of (normally)
> distributed random variables, its standard error still depends on whether
> the individual observations have the same or different variances, e.g.
> http://en.wikipedia.org/wiki/Weighted_mean#Statistical_properties

...lots of possibilities. As you have shown, the problem is not yet
well defined. The weighted mean needs little specification; the
standard error of the weighted mean needs a lot.
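For the case in the Wikipedia link, where each observation's standard deviation sigma_i is known and the observations are independent, no bootstrap is needed: with weights normalized to sum to one, Var(sum w_i x_i) = sum w_i^2 sigma_i^2. A sketch under those assumptions (`weighted_mean_se_known_sigma` is a hypothetical name):

```python
import numpy as np

def weighted_mean_se_known_sigma(x, w, sigma):
    """Weighted mean and its standard error for independent observations
    with known standard deviations sigma (scalar or one per observation).

    Uses Var(sum w_i x_i) = sum w_i**2 * sigma_i**2 with normalized weights.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), x.shape)
    w = w / w.sum()  # normalize so the weights sum to one
    mean = np.dot(w, x)
    se = np.sqrt(np.sum(w**2 * sigma**2))
    return mean, se

# Inverse-variance weights: the SE should equal 1 / sqrt(sum(1 / sigma**2))
x = np.array([1.0, 2.0, 4.0])
sig = np.array([0.5, 1.0, 2.0])
m, se = weighted_mean_se_known_sigma(x, 1 / sig**2, sig)
print(m, se)
```

With inverse-variance weights w_i = 1/sigma_i^2 this reduces to 1/sqrt(sum 1/sigma_i^2), and with equal weights and a common sigma to the familiar sigma/sqrt(n).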

> What I can't figure out is whether, if you assume sigma_i = sigma for
> all observations i, we use the weighted or the unweighted variance
> to get an estimate of sigma. And I'm not able to replicate with simple
> calculations what statsmodels.WLS gives me.

My guess: if all you want is sigma for the individual observations and
you know sigma is the same for all i, then I suppose you don't care
about the weights.
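The weighted-versus-unweighted question depends on what the weights mean. Two models, both assumptions for illustration: (A) every observation has the same variance sigma^2 and the weights are just fixed coefficients; (B) Var(x_i) = sigma^2 / w_i, the way weighted least squares treats its weights. Under (A) sigma can be estimated from the ordinary unweighted variance, in line with the guess above; under (B) the residuals are weighted. A NumPy sketch of both (function names hypothetical); (B) is intended to mirror a WLS fit of `x` on a constant, with scale = sum(w*resid**2)/(n-1) and Var(mean) = scale/sum(w):

```python
import numpy as np

def se_common_sigma(x, w):
    """Model A (assumed): Var(x_i) = sigma**2 for every i, w fixed weights.

    sigma is estimated from the unweighted sample variance (w not used);
    w only enters the SE through sum(wn**2).
    """
    x = np.asarray(x, dtype=float)
    wn = np.asarray(w, dtype=float)
    wn = wn / wn.sum()
    sigma_hat = x.std(ddof=1)
    return np.dot(wn, x), sigma_hat * np.sqrt(np.sum(wn**2))

def se_wls_constant(x, w):
    """Model B (assumed): Var(x_i) = sigma**2 / w_i, w inverse variances
    up to a common scale, as in weighted least squares on a constant.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mean = np.dot(w, x) / w.sum()
    resid = x - mean
    scale = np.sum(w * resid**2) / (x.size - 1)  # weighted residual variance
    return mean, np.sqrt(scale / w.sum())

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
w = np.array([1.0, 2.0, 1.0, 3.0, 1.0])
print(se_common_sigma(x, w), se_wls_constant(x, w))
```

With equal weights both reduce to `x.std(ddof=1) / sqrt(n)`; with unequal weights they generally differ, which may be part of why simple calculations do not match the WLS numbers.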

>
> ???
>
> Josef
>
>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
