On Fri, 19 Oct 2001 11:58:18 +1000, "Glen Barnett" <[EMAIL PROTECTED]> wrote:
>
> Rich Strauss <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > However, the arcsin transformation is for proportions (with fixed
>
> It's also designed for stabilising variance rather than specifically inducing
> symmetry.
> Does it actually produce symmetry as well?
>
> > denominator), not for ratios (with variable denominator). The "proportion
> > of sentences in a number of texts that belong to a certain category" sounds
> > like a problem in ratios, since the total number of sentences undoubtedly
> > vary among texts. Log transformations work well because they linearize
> > such ratios.
>
> Additionally for small proportions logs are close to logits, so logs are
> sometimes helpful even if the data really are proportions. Logs also go
> some way to reducing the skewness and stabilising the variance, though
> they don't stabilise it as well as the arcsin square root that's specifically
> designed for it.

The transformation is okay but not great for proportions
less than (say) 5%. Jez Hill followed up on a reference that
I gave him this summer, and posted further detail --

============== from June 27, 2001, Jez Hill.
Subject: Re: [Q ] transforming binomial proportions
Newsgroups: sci.stat.math

Rich Ulrich wrote in article [EMAIL PROTECTED]:
> The fixed variance was the main appeal of the approximation,
> "arcsin(sqrt(p))".
[snip]
> "A more accurate transformation for small n has been tabulated by
> Mosteller and Youtz."[ Biometrika 48 (1961):433.]

Thanks very much for that - it looks pretty good to me at n=500,
6<=np<=494

FYI:
Following up on your reference, Mosteller and Youtz give the following
formula from Freeman and Tukey [Ann. Math. Statist. 21(1950):607].

    arcsin(sqrt( np/(n+1) ))/2 + arcsin(sqrt( (np+1)/(n+1) ))/2

which gives asymptotic variance 821/(n+0.5) "for a substantial range
of p if n is not too small".

I find that the improvement is quite significant, to the point where I
would be quite happy to use it even for np=1, 2 or 3 at n=500, minor
glitches in that region notwithstanding.
=========== end of post

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
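
For anyone who wants to check the behaviour at small np for themselves, here is a minimal Python sketch of the Freeman and Tukey formula quoted above, next to the plain arcsin(sqrt(p)). The function names are my own and are not from either post. It assumes angles are measured in degrees, which is what the 821 implies, since (180/pi)^2/4 is about 820.7. The variances are computed exactly by summing over the binomial distribution, not by simulation, and each is printed as a ratio to its nominal value (820.7/n for the plain form, 820.7/(n+0.5) for the averaged form), so a ratio near 1 means the variance is well stabilised.

```python
import math

# Nominal variance constant when angles are in degrees: (180/pi)^2 / 4 ~ 820.7,
# which is where the "821" in the quoted formula comes from.
DEG2_QUARTER = (180.0 / math.pi) ** 2 / 4.0


def arcsin_sqrt(x, n):
    """Plain angular transform of the proportion x/n, in degrees."""
    return math.degrees(math.asin(math.sqrt(x / n)))


def freeman_tukey(x, n):
    """Averaged form quoted above, with x = n*p the observed count, in degrees."""
    a = math.asin(math.sqrt(x / (n + 1)))
    b = math.asin(math.sqrt((x + 1) / (n + 1)))
    return math.degrees(a / 2 + b / 2)


def binom_pmf(k, n, p):
    """Binomial probability, computed in log space (needs 0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def exact_variance(transform, n, p):
    """Variance of transform(X, n) for X ~ Binomial(n, p), summed over all counts."""
    probs = [binom_pmf(k, n, p) for k in range(n + 1)]
    vals = [transform(k, n) for k in range(n + 1)]
    mean = sum(w * v for w, v in zip(probs, vals))
    return sum(w * (v - mean) ** 2 for w, v in zip(probs, vals))


if __name__ == "__main__":
    n = 500
    for count in (1, 2, 3, 6, 25, 250):
        p = count / n
        plain = exact_variance(arcsin_sqrt, n, p) * n / DEG2_QUARTER
        ft = exact_variance(freeman_tukey, n, p) * (n + 0.5) / DEG2_QUARTER
        print(f"np={count:3d}  plain/nominal={plain:.3f}  FT/nominal={ft:.3f}")
```

The choice of n=500 and the particular np values simply mirrors the range discussed in the quoted post; change them to suit your own data.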
