Hi Ed
I agree with your result (although you have a slight typo in your formula: the second factor should be (x_i-xbar), not (w_i - xbar)).
To show this, we might as well assume that the weights sum to 1 (otherwise it is easy to normalise them).
Then the unbiased estimator is s2 = (1/(1-sum(w*w))) * sum(w_i * (x_i-xbar)*(x_i-xbar)) (this is the same as yours),
where, as in your formula, sum(w*w) is the sum of the squared weights. Note that if w_i=1/N then this reduces to the usual unbiased estimator of the variance.
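A minimal Python sketch of the estimator above, assuming non-negative weights and at least two samples; the function name `weighted_unbiased_variance` is hypothetical. It normalises the weights first, so rescaling all of them leaves the result unchanged.

```python
def weighted_unbiased_variance(x, w):
    """s2 = sum(v_i*(x_i - xbar)**2) / (1 - sum(v_i**2)), where v_i = w_i / sum(w).

    Assumes independent samples with a common mean and variance, as in the
    derivation, and non-negative weights (illustrative sketch only).
    """
    if len(x) != len(w) or len(x) < 2:
        raise ValueError("need at least two samples, one weight per sample")
    total = sum(w)
    v = [wi / total for wi in w]                  # normalise so sum(v) == 1
    xbar = sum(vi * xi for vi, xi in zip(v, x))   # weighted mean
    sum_v2 = sum(vi * vi for vi in v)             # sum of squared weights
    if sum_v2 >= 1.0:
        # all the weight sits on one sample, so 1 - sum(v*v) is zero
        raise ValueError("variance undefined when one sample carries all the weight")
    ss = sum(vi * (xi - xbar) ** 2 for vi, xi in zip(v, x))
    return ss / (1.0 - sum_v2)
```

With equal weights this gives the usual 1/(N-1) estimate; for example, `weighted_unbiased_variance([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1])` equals 5/3 up to rounding.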
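To demonstrate the above, first note that since the weights are normalised, E(xbar) = E(sum(w_i * x_i)) = sum(w_i) * E(x) = E(x).
Note that we must also assume that the samples are independent and identically distributed (so every x_i has the same E(x) and E(x**2)).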
E( sum(w_i * (x_i-xbar)*(x_i-xbar)) ) = E( sum(w_i * (x_i*x_i - 2*x_i*xbar + xbar*xbar)))
= sum(w_i * E(x**2)) - 2*sum(E(w_i*x_i*(sum(w_j*x_j)))) + sum(E(w_i*w_j*x_i*x_j))
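= sum(w_i * E(x**2)) - sum(E(w_i*w_j*x_i*x_j)) (since sum(w_i*x_i*sum(w_j*x_j)) = xbar*xbar = sum(w_i*w_j*x_i*x_j), the middle term is -2 times the last one)
= E(x**2) - sum(E(w_i*w_j*x_i*x_j)) ....(1) (as sum(w_i) = 1)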
The second term on the right-hand side of (1) splits into the i = j and i != j parts:
sum(E(w_i*w_j*x_i*x_j)) = sum_over_i_of(E(w_i*w_i*x_i*x_i)) + sum_for_i_not_equal_to_j_of(E(w_i*w_j*x_i*x_j))
= sum(w_i*w_i*E(x**2)) + sum_for_i_not_equal_to_j_of((E(x)**2)*w_i*w_j) (as x_i and x_j are independent for i != j)
= E(x**2)*sum(w_i*w_i) + (1-sum(w_i*w_i))*(E(x)**2) (as sum_for_i_not_equal_to_j_of(w_i*w_j) = sum(w_i)*sum(w_j) - sum(w_i*w_i) = 1 - sum(w_i*w_i))
Replacing this in (1) gives
E( sum(w_i * (x_i-xbar)*(x_i-xbar)) ) = E(x**2)*(1 - sum(w_i*w_i)) - (E(x)**2)*(1 - sum(w_i*w_i))
= (1 - sum(w_i*w_i)) * sigma2, where sigma2 = E(x**2) - E(x)**2 is the true variance,
and so s2 = (1/(1-sum(w*w))) * sum(w_i * (x_i-xbar)*(x_i-xbar)) ....(2)
is an unbiased estimator of the variance, as required.
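The result can be checked by simulation. This sketch assumes i.i.d. normal samples (an illustrative choice; the derivation only needs independence and a common mean and variance) and uses a hypothetical helper `mean_weighted_ss`. It averages sum(v_i*(x_i-xbar)**2) over many draws and returns it next to the predicted (1 - sum(v*v))*sigma**2; dividing the simulated value by (1 - sum(v*v)) should give back roughly sigma**2.

```python
import random


def mean_weighted_ss(w, mu=5.0, sigma=2.0, trials=100_000, seed=7):
    """Monte Carlo average of sum(v_i*(x_i - xbar)**2) for i.i.d. normal samples,
    returned with the value (1 - sum(v**2)) * sigma**2 predicted by the derivation."""
    total = sum(w)
    v = [wi / total for wi in w]          # normalised weights
    sum_v2 = sum(vi * vi for vi in v)     # sum of squared weights
    rng = random.Random(seed)             # seeded for reproducibility
    acc = 0.0
    for _ in range(trials):
        x = [rng.gauss(mu, sigma) for _ in v]           # one sample per weight
        xbar = sum(vi * xi for vi, xi in zip(v, x))     # weighted mean
        acc += sum(vi * (xi - xbar) ** 2 for vi, xi in zip(v, x))
    return acc / trials, (1.0 - sum_v2) * sigma ** 2
```

For example, with weights [1, 2, 3, 4] the normalised weights are 0.1 to 0.4, sum(v*v) = 0.3, and with sigma = 2 the prediction is 0.7 * 4 = 2.8; the simulated average should land close to it, whatever the mean mu.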
If you assume that the weights are all equal (w_i = 1/N), then
1/(1-sum(w*w)) = 1/(1-1/N) = N/(N-1), and since each w_i in equation (2) is 1/N, (2) becomes
s2 = 1/(N-1) * sum((x_i-xbar)*(x_i-xbar))
which is the usual unbiased estimate of the variance.
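A quick numerical check of the equal-weight case, using arbitrary example data and Python's `statistics.variance` (which uses the 1/(N-1) denominator):

```python
import statistics

x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]        # arbitrary example data
n = len(x)
v = [1.0 / n] * n                          # equal weights w_i = 1/N
xbar = sum(vi * xi for vi, xi in zip(v, x))
s2_weighted = sum(vi * (xi - xbar) ** 2 for vi, xi in zip(v, x)) / (1.0 - sum(vi * vi for vi in v))
s2_usual = statistics.variance(x)          # usual 1/(N-1) estimator
print(s2_weighted, s2_usual)               # equal up to rounding
```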
JMP have simply stuck with the 1/(N-1) term in the denominator instead of correcting it for the weights...
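One way to see that the formula quoted for JMP cannot be unbiased for arbitrary weights under the i.i.d. model above: it is not invariant to a common rescaling of the weights, while the true variance does not depend on the weights at all. A sketch (hypothetical function name, arbitrary example data) of the formula as quoted in Ed's message, not of JMP's actual implementation:

```python
def jmp_style_variance(x, w):
    """sum(w_i*(x_i - xbar)**2) / (N - 1), with xbar the weighted mean,
    as quoted in the original message."""
    total = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / total
    return sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)) / (len(x) - 1)


x = [1.0, 3.0, 4.0, 8.0]       # arbitrary example data
w = [0.5, 1.0, 2.0, 0.5]       # arbitrary example weights
s_a = jmp_style_variance(x, w)
s_b = jmp_style_variance(x, [10 * wi for wi in w])  # same relative weights, scaled by 10
# s_b is 10 times s_a, although the relative weighting of the samples is unchanged
```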
Best Regards
Colin Daly
-----Original Message-----
From: Edward Isaaks [mailto:[EMAIL PROTECTED]]
Sent: Fri 10/7/2005 1:26 AM
To: AI-GEOSTATS
Subject: [ai-geostats] How to Estimate Variance with Weighted Samples?
Hello List
My inquiry is quite straightforward. I require an unbiased estimate of variance using weighted samples. There are several equations commonly used to calculate an estimate of variance using weighted samples, but they are all slightly different, and thus they can't all be unbiased. Currently, my favorite equation is as follows:
1. Calculate a weighted estimate of the mean as xbar = Sum[(w_i * x_i)] / Sum[w_i], where x_i are sample values and w_i are the corresponding sample weights.
2. Calculate the denominator D = Sum[w] - Sum[w * w] / Sum[w], where Sum[w] is the sum of weights and Sum[w * w] is the sum of squared weights.
3. Now, I believe an unbiased estimate of the variance is given by s2 = Sum[ w_i * (x_i - xbar) * (w_i - xbar)] / D where xbar is the weighted estimate of the mean. Do you agree?
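Steps 1 to 3 can be sketched in Python as follows, with the second factor in step 3 written as (x_i - xbar). The function name is hypothetical and the data are arbitrary; the last lines compute the normalised form s2 = sum(v_i*(x_i-xbar)**2) / (1 - sum(v*v)), with v_i = w_i / Sum[w], for comparison. The two agree because D = Sum[w] * (1 - sum(v*v)), so the factor Sum[w] cancels.

```python
def ed_weighted_variance(x, w):
    """Steps 1-3: weighted mean, denominator D, then the weighted sum of squares over D."""
    sw = sum(w)                                               # Sum[w]
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw          # step 1: weighted mean
    d = sw - sum(wi * wi for wi in w) / sw                    # step 2: D
    return sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)) / d  # step 3


x = [1.0, 3.0, 4.0, 8.0]       # arbitrary example data
w = [0.5, 1.0, 2.0, 0.5]       # arbitrary example weights

# Same estimator after normalising the weights to sum to 1
sw = sum(w)
v = [wi / sw for wi in w]
xbar = sum(vi * xi for vi, xi in zip(v, x))
s2_normalised = sum(vi * (xi - xbar) ** 2 for vi, xi in zip(v, x)) / (1 - sum(vi * vi for vi in v))
# ed_weighted_variance(x, w) and s2_normalised agree up to rounding
```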
The thing that bothers me most is that JMP (an excellent EDA stat tool put out by SAS for those of you not familiar with JMP) calculates the weighted estimate of variance as follows: s2 = Sum[w_i * (x_i - xbar) * (x_i - xbar)] / (N-1) where xbar is the weighted estimate of the mean. JMP Support insists that this equation is correct. However, it doesn't make any sense to me. Can anyone explain the theoretical basis or statistical model that might give some validity to this equation?
Thank you for your response.
Edward Isaaks.
