Hi Tim,

> but for most practical purposes this is an irrelevant technicality

Are you saying that I can treat each of the 4 estimates independently? That is, use sqrt(pq/N) as the std for each? Seems problematic to me :)

Yes, a Bayesian approach would be better, but this probably involves things like contour integration or other horrors. I hoped for something simpler.

-Joseph

On Wed, 13 Nov 2019 at 06:04, Timothy Y. Chow <[email protected]> wrote:

> On Tue, 12 Nov 2019, Joseph Heled wrote:
>
> > Hi Timothy,
> >
> > Here is a stats question I encounter from time to time.
> >
> > Suppose I run N BG games and collect the average win rates and gammon
> > rates. That gives 4 estimates, which are dependent since they sum to 1.
> > How do I determine the confidence intervals for each? This is a 4D
> > vector and it seems like a non-trivial question, but I assume this crops
> > up a lot and must have a standard answer. What is your take?
> >
> > Thanks, Joseph
>
> Joseph,
>
> I'm guessing that what you're really interested in is some measure of the
> variation or dispersion of your sample dataset. In that case, you can
> simply compute the sample standard deviation for each parameter of
> interest. The fact that each sample consists of 4 numbers that satisfy
> the constraint that their sum equals 1 just means that your 4 estimated
> standard deviations aren't independent estimates, but for most practical
> purposes this is an irrelevant technicality.
>
> On the other hand, if you really want to compute a confidence interval for
> the purposes of hypothesis testing, then you need to be explicit about
> what your null and alternative hypotheses are. If you're not sure what
> your null and alternative hypotheses are, then to me that confirms that
> what you're interested in is not hypothesis testing but some sense of how
> good an estimate your averages are.
>
> It's important to realize that a 95% confidence interval does *not* mean
> that there is a 95% probability that the quantity you're trying to
> estimate lies in your interval. This is a common misconception about what
> confidence intervals are.
>
> https://en.wikipedia.org/wiki/Confidence_interval#Misunderstandings
>
> If you really want to make statements of the form "there is a 95%
> probability that the win rate is in such-and-such an interval," then you
> need to adopt a Bayesian rather than a frequentist framework. In
> particular, you'll need to choose some prior probability distribution and
> compute the posterior probability distribution by applying Bayes's rule to
> your data.
>
> Tim
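On the sqrt(pq/N) question: if every game ends in exactly one of 4 mutually exclusive outcomes and games are independent, the outcome counts are multinomial, and each count taken on its own is binomial. So sqrt(p_i(1 - p_i)/N) is the standard error of each proportion considered alone; the sum-to-1 constraint shows up only in the covariances, Cov(p_i, p_j) = -p_i p_j / N for i != j. The sketch below assumes that setup; the function name `multinomial_summary` and the counts in the usage line are hypothetical, and the intervals are the plain normal-approximation (Wald) kind.

```python
import math


def multinomial_summary(counts, z=1.96):
    """Per-outcome proportions, standard errors, normal-approximation (Wald)
    intervals, and the covariance matrix of the estimated proportions.

    Assumes (hypothetically) that `counts` holds the number of games ending
    in each of several mutually exclusive outcomes, so the proportions sum to 1.
    """
    n = sum(counts)
    p = [c / n for c in counts]
    # Each count is marginally Binomial(n, p_i), so the usual binomial
    # standard error applies to each proportion considered on its own.
    se = [math.sqrt(pi * (1 - pi) / n) for pi in p]
    intervals = [(pi - z * s, pi + z * s) for pi, s in zip(p, se)]
    # The sum-to-1 dependence appears only off the diagonal:
    # Var(p_i) = p_i (1 - p_i) / n and Cov(p_i, p_j) = -p_i p_j / n.
    cov = [[(pi * (1 - pi) if i == j else -pi * pj) / n
            for j, pj in enumerate(p)]
           for i, pi in enumerate(p)]
    return p, se, intervals, cov


# Hypothetical counts for 1000 games over 4 outcome categories.
p, se, intervals, cov = multinomial_summary([420, 130, 330, 120])
```

Every row of `cov` sums to zero, which is the covariance-level statement that the 4 proportions always add up to 1. The covariances only matter for joint statements, for example an interval on the sum of two categories. For small counts or proportions near 0 or 1, a Wilson score interval is a common alternative to the Wald interval used here.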

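On the Bayesian route: with a multinomial likelihood, a Dirichlet prior is conjugate, so the posterior is again Dirichlet with parameters prior + counts, and no contour integration is needed. The sketch below assumes a symmetric Dirichlet prior (the default `prior=1.0`, a uniform prior over the 4 probabilities, is a hypothetical choice) and approximates equal-tailed credible intervals by Monte Carlo, using the standard fact that normalised independent Gamma(alpha_i, 1) draws form a Dirichlet draw. The function name and its parameters are illustrative only.

```python
import random


def dirichlet_credible_intervals(counts, prior=1.0, level=0.95,
                                 draws=20000, seed=0):
    """Posterior means and approximate equal-tailed credible intervals for
    each outcome probability under a symmetric Dirichlet(prior) prior.

    The posterior is Dirichlet(prior + counts); intervals are estimated by
    sampling from it, so they carry a small Monte Carlo error.
    """
    rng = random.Random(seed)  # seeded for reproducible intervals
    alphas = [prior + c for c in counts]
    samples = [[] for _ in counts]
    for _ in range(draws):
        # One Dirichlet draw: independent Gamma(alpha_i, 1) variates,
        # normalised so the components sum to 1.
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(g)
        for k, gk in enumerate(g):
            samples[k].append(gk / total)
    lo_q = (1 - level) / 2
    hi_q = 1 - lo_q
    intervals = []
    for s in samples:
        s.sort()
        # Empirical quantiles of the sampled marginal distribution.
        intervals.append((s[int(lo_q * (draws - 1))],
                          s[int(hi_q * (draws - 1))]))
    # Posterior means have a closed form: alpha_i / sum(alphas).
    a_total = sum(alphas)
    means = [a / a_total for a in alphas]
    return means, intervals


# Same hypothetical counts as before: 1000 games over 4 outcome categories.
means, intervals = dirichlet_credible_intervals([420, 130, 330, 120])
```

Here each interval does support a statement of the form "given this prior and these data, the posterior probability that p_i lies in the interval is about 95%." Derived quantities, such as the probability of winning (assuming two of the categories are the winning outcomes), can be handled the same way by summing the relevant sampled components before taking quantiles, which is where the joint posterior rather than the marginals matters.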